Incorrect 'beautification' of URL?

2002-03-05 Thread Philipp Thomas

When requesting a URL like http://tmp.logix.cz/slash.xp , wget shortens
this to http://tmp.logix.cz/slash.xp/. All browsers I tested (Opera 6b1,
Mozilla 0.9.8, Konqueror 2.9.2) pass this URL as given.

So the question is why wget (1.8.1) does this and whether this behaviour
can be switched off.

Philipp

-- 
Philipp Thomas [EMAIL PROTECTED]
SuSE Linux AG, Deutscherrnstr. 15-19, D-90429 Nuremberg, Germany

HPUX and sane have never been uttered in the same sentence
without accompanying negatives.
-- Richard Henderson on gcc ml



Change in behaviour between 1.7 and 1.8.1

2002-03-05 Thread Philipp Thomas

When you issue

 wget --recursive --level=1 --reject=.html www.suse.de

wget 1.7 really omits downloading all the .html files except index.html
(which is needed for --recursive), but wget 1.8.1 also downloads all .html
files that are referenced from index.html and deletes them immediately.

It is clear that the .html files are needed to find the next level of files
when downloading recursively, but they should be omitted when the recursion
depth is limited and the limit has been reached.
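
The behaviour asked for above can be sketched as a small decision function. This is only an illustration under assumptions: `demo_has_suffix`, `demo_should_fetch` and their parameters are hypothetical names and are not taken from wget's source, and the depth of the start page is assumed to be 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if NAME ends with SUFFIX. */
static bool demo_has_suffix(const char *name, const char *suffix)
{
    size_t nlen, slen;

    assert(name != NULL && suffix != NULL);
    nlen = strlen(name);
    slen = strlen(suffix);
    return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

/* Hypothetical decision function, not wget's actual logic.
   DEPTH is the depth of the document itself (0 for the start page),
   MAX_DEPTH is the --level limit, REJECT is a rejected suffix or NULL. */
static bool demo_should_fetch(const char *name, int depth, int max_depth,
                              const char *reject)
{
    bool rejected = reject != NULL && demo_has_suffix(name, reject);

    if (!rejected)
        return true;               /* wanted file: always fetch */
    if (!demo_has_suffix(name, ".html"))
        return false;              /* rejected and no links to harvest */
    /* Rejected HTML is only useful as a source of further links, so
       fetch it only while those links may still be followed. */
    return depth < max_depth;
}
```

With `--level=1` and `--reject=.html`, `demo_should_fetch("index.html", 0, 1, ".html")` is true, while the same call for a page at depth 1 is false, so such a page would never be downloaded and then deleted.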

Philipp

-- 
Philipp Thomas [EMAIL PROTECTED]
SuSE Linux AG, Deutscherrnstr. 15-19, D-90429 Nuremberg, Germany



Re: Wget and i18n

2001-04-27 Thread Philipp Thomas

* Drazen Kacar ([EMAIL PROTECTED]) [20010427 00:51]:

 I wonder. POSIX compilation environment may not do such a thing with POSIX
 headers, unless explicitly allowed by POSIX. I'm too lazy to look into

AFAIK, POSIX is rather useless in this context as it is an ISO issue, with
ctype.h being defined there.

Philipp

-- 
Penguins shall save the dinosaurs
  -- Handelsblatt about Linux on S/390



Re: Wget and i18n

2001-04-26 Thread Philipp Thomas

* Herold Heiko ([EMAIL PROTECTED]) [20010426 18:42]:

 bugfix), now still ctype.h is included in winnt.h, compilation fails.
 (We always knew winnt is an *old* system, but this proves it :).
 
 Any idea what would be a sensible way to cover this ?

Does MS ctype.h have include guards? If yes, one could just define those
guards on the command line. The other alternative would be to disable NLS
support on WIN* and make the inclusion of safe-ctype.h dependent on
this as well.
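
The include-guard idea can be sketched in a single file. This is a minimal illustration under assumptions: `DEMO_CTYPE_H_GUARD` and `demo_isalpha` are hypothetical names, and whether the MS header really has such a guard is exactly the open question above.

```c
#include <assert.h>

/* Stand-in for passing -DDEMO_CTYPE_H_GUARD on the compiler command
   line. The guard name is hypothetical. */
#define DEMO_CTYPE_H_GUARD

/* Stand-in for the body of a guarded system header. */
#ifndef DEMO_CTYPE_H_GUARD
#define DEMO_CTYPE_H_GUARD
#define demo_isalpha(c) \
    (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))
#endif

/* Because the guard was predefined, the header body was skipped and
   the macro above never came into existence. */
#ifdef demo_isalpha
# define DEMO_HEADER_BODY_SEEN 1
#else
# define DEMO_HEADER_BODY_SEEN 0
#endif

static int demo_header_body_seen(void)
{
    return DEMO_HEADER_BODY_SEEN;
}

/* Self-check: the predefined guard must have suppressed the body. */
static void demo_self_check(void)
{
    assert(demo_header_body_seen() == 0);
}
```

The trick only helps if nothing else in the build needs the declarations that the skipped header would have provided.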

Philipp

-- 
Philipp Thomas [EMAIL PROTECTED]
Development, SuSE GmbH, Schanzaecker Str. 10, D-90443 Nuremberg, Germany




Re: Minor fixes in wget 1.6's main.c

2001-04-04 Thread Philipp Thomas

* Nicolás Lichtmaier ([EMAIL PROTECTED]) [20010404 09:07]:

  That's what 1.7 does.
 
  Yup... I guess I'll port that to 1.6. It's easy to do.

Here's my patch against the WGET_1.6 branch as of today:

src/ChangeLog:

2001-04-04  Philipp Thomas  [EMAIL PROTECTED]

	* safe-ctype.h: New file. Locale independent ctype.h
	replacement taken from libiberty.
	* safe-ctype.c: New file. Tables for above.
	* Makefile.in: Add safe-ctype$o to OBJ.
	Add dependencies for safe-ctype$o.
	* cmpt.c: Remove include of ctype.h. Use ISSPACE instead
	of isspace.
	* ftp-basic.c: Don't include ctype.h.
	* ftp-ls.c: Likewise.
	* ftp.c: Likewise.
	* headers.c: Likewise.
	* host.c: Likewise.
	* html-parse.c: Likewise.
	* html-url.c: Likewise.
	* http.c: Likewise.
	* init.c: Likewise.
	* main.c: Likewise. Set LC_CTYPE along with LC_MESSAGES.
	* netrc.c: Likewise.
	* recur.c: Likewise.
	* retr.c: Likewise.
	* snprintf.c: Replace ctype.h with safe-ctype.h. Use
	ISDIGIT instead of isdigit.
	* sysdep.h: Remove defines of ctype macros as they aren't
	needed for safe-ctype.h.
	* url.c: Don't include ctype.h.
	* utils.c: Likewise.
	* wget.h: Include safe-ctype.h.

Index: src/Makefile.in
===
RCS file: /pack/anoncvs/wget/src/Makefile.in,v
retrieving revision 1.2
diff -u -r1.2 Makefile.in
--- src/Makefile.in 2000/11/04 22:49:45 1.2
+++ src/Makefile.in 2001/04/04 13:09:14
@@ -59,7 +59,7 @@
 OBJ = $(ALLOCA) cmpt$o connect$o fnmatch$o ftp$o ftp-basic$o  \
   ftp-ls$o $(OPIE_OBJ) getopt$o headers$o host$o html$o   \
   http$o init$o log$o main$o $(MD5_OBJ) netrc$o rbuf$o\
-  recur$o retr$o snprintf$o url$o utils$o version$o
+  recur$o retr$o snprintf$o url$o utils$o version$o safe-ctype$o
 
 .SUFFIXES:
 .SUFFIXES: .c .o ._c ._o
@@ -154,5 +154,6 @@
 rbuf$o: config.h wget.h sysdep.h options.h rbuf.h connect.h
 recur$o: config.h wget.h sysdep.h options.h url.h recur.h utils.h retr.h rbuf.h ftp.h fnmatch.h host.h
 retr$o: config.h wget.h sysdep.h options.h utils.h retr.h rbuf.h url.h recur.h ftp.h host.h connect.h
+safe-ctype$o: safe-ctype.h
 url$o: config.h wget.h sysdep.h options.h utils.h url.h host.h html.h
 utils$o: config.h wget.h sysdep.h options.h utils.h fnmatch.h
Index: src/cmpt.c
===
RCS file: /pack/anoncvs/wget/src/cmpt.c,v
retrieving revision 1.2
diff -u -r1.2 cmpt.c
--- src/cmpt.c  2000/04/12 13:23:34 1.2
+++ src/cmpt.c  2001/04/04 13:09:14
@@ -26,7 +26,6 @@
 #else
 # include <strings.h>
 #endif /* HAVE_STRING_H */
-#include <ctype.h>
 
 #include <sys/types.h>
 #ifdef HAVE_UNISTD_H
@@ -657,9 +656,9 @@
 {
   /* A white space in the format string matches 0 more or white
 space in the input string.  */
-  if (isspace (*fmt))
+  if (ISSPACE (*fmt))
{
- while (isspace (*rp))
+ while (ISSPACE (*rp))
++rp;
  ++fmt;
  continue;
@@ -851,7 +850,7 @@
case 'n':
case 't':
  /* Match any white space.  */
- while (isspace (*rp))
+ while (ISSPACE (*rp))
++rp;
  break;
case 'p':
Index: src/ftp-basic.c
===
RCS file: /pack/anoncvs/wget/src/ftp-basic.c,v
retrieving revision 1.3.2.1
diff -u -r1.3.2.1 ftp-basic.c
--- src/ftp-basic.c 2000/12/17 18:14:29 1.3.2.1
+++ src/ftp-basic.c 2001/04/04 13:09:14
@@ -26,7 +26,6 @@
 #else
 # include <strings.h>
 #endif
-#include <ctype.h>
 #ifdef HAVE_UNISTD_H
 # include <unistd.h>
 #endif
Index: src/ftp-ls.c
===
RCS file: /pack/anoncvs/wget/src/ftp-ls.c,v
retrieving revision 1.2
diff -u -r1.2 ftp-ls.c
--- src/ftp-ls.c 2000/11/10 18:01:35 1.2
+++ src/ftp-ls.c 2001/04/04 13:09:14
@@ -30,7 +30,6 @@
 # include <unistd.h>
 #endif
 #include <sys/types.h>
-#include <ctype.h>
 #include <errno.h>
 
 #include "wget.h"
Index: src/ftp.c
===
RCS file: /pack/anoncvs/wget/src/ftp.c,v
retrieving revision 1.16.2.5
diff -u -r1.16.2.5 ftp.c
--- src/ftp.c   2000/12/31 03:55:20 1.16.2.5
+++ src/ftp.c   2001/04/04 13:09:14
@@ -26,7 +26,6 @@
 #else
 # include <strings.h>
 #endif
-#include <ctype.h>
 #ifdef HAVE_UNISTD_H
 # include <unistd.h>
 #endif
Index: src/headers.c
===
RCS file: /pack/anoncvs/wget/src/headers.c,v
retrieving revision 1.2
diff -u -r1.2 headers.c
--- src/headers.c   2000/04/12 13:23:34 1.2
+++ src/headers.c   2001/04/04 13:09:14
@@ -26,7 +26,6 @@
 #else
 # include <strings.h>
 #endif
-#include <ctype.h>
 
 #include "wget.h"
 #include "con

Re: [Patch] LC_CTYPE not defined

2001-03-28 Thread Philipp Thomas

* R.I.P. Deaddog ([EMAIL PROTECTED]) [20010328 12:16]:

 I'm not sure if this has been reported before, but the patch is attached
 here just in case.

I did report it. But setting LC_CTYPE is not enough, as this will change the
behaviour of the ctype.h macros. Check out the CVS version, as I submitted a
patch to use a replacement for ctype.h which makes setting LC_CTYPE rather
safe.

Philipp

-- 
Philipp Thomas [EMAIL PROTECTED]
Development, SuSE GmbH, Schanzaecker Str. 10, D-90443 Nuremberg, Germany




Re: Wget and i18n

2001-03-06 Thread Philipp Thomas

* Hrvoje Niksic ([EMAIL PROTECTED]) [20010306 10:35]:

  #ifdef isalpha
   #error "safe-ctype.h and ctype.h may not be used simultaneously"
  #else
 
 Is the error statement actually true, or is this only a warning that
 tries to enforce consistency of the application?

The error statement is true. Remember that ctype.h is locale dependent
whereas safe-ctype is not. So for instance isprint (ctype.h) and ISPRINT
(safe-ctype) could well produce different results. And as the intention
is to get rid of the locale dependency, you have to block the inclusion
of ctype.h.

The caveat with using safe-ctype is that it won't work with multibyte
encodings or wchars. So in the end every use of is... does need to be
checked anyway.
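
To illustrate what locale independence means here, a minimal sketch follows. The `demo_` functions are hypothetical stand-ins and not the libiberty definitions; the only point is that they look at the byte value alone, so the LC_CTYPE setting cannot change the answer.

```c
#include <assert.h>
#include <locale.h>

/* Hypothetical stand-ins for ISDIGIT/ISSPACE-style macros. They
   inspect only the byte value and never consult the locale. */
static int demo_isdigit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

static int demo_isspace(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n'
        || c == '\v' || c == '\f' || c == '\r';
}

/* Switch LC_CTYPE to the environment's locale (the call may fail if
   that locale is unavailable) and confirm the answers do not move. */
static void demo_check_locale_independence(void)
{
    setlocale(LC_CTYPE, "");
    assert(demo_isdigit('7'));
    assert(demo_isspace('\t'));
    assert(!demo_isspace(0xA0));  /* NBSP in Latin-1, never a space here */
}
```

Byte-wise tests like these are also why the multibyte caveat applies: a single byte of a multibyte sequence carries no meaning on its own.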
 
 Also, won't this trigger an error if a system header file, say
 string.h, happens to include ctype.h?  (I know system header files
 should not do that because it pollutes your namespace, but older
 systems sometimes do that.)

Yes, it would trigger in that case. But safe-ctype was developed for GCC
originally and as gcc is used also on old systems (one of them the original
BSD), I guess we would have heard if safe-ctype broke things.

Philipp




Re: Wget and i18n

2001-03-06 Thread Philipp Thomas

* Hrvoje Niksic ([EMAIL PROTECTED]) [20010306 11:21]:

 It is true that old systems use Gcc, but I wonder if anyone tests
 *new* Gcc's on these old systems...

Yes, they do. The patches to make gcc build on the original BSD are only
present in the current CVS GCC.

Philipp




Re: Wget and i18n

2001-03-06 Thread Philipp Thomas

* Hrvoje Niksic ([EMAIL PROTECTED]) [20010306 14:09]:

 OK, then the #error stays.  If no one objects, I'll modify Wget to use
 these files.

I have the patches ready and am about to test them. So if you wait a
bit, you'll get patches ready to apply.

Philipp




Re: Wget and i18n

2001-03-05 Thread Philipp Thomas

* Hrvoje Niksic ([EMAIL PROTECTED]) [20010305 18:44]:

 Yes.  I hate them for making that change, but apparently it's allowed
 (or even required, I forget now) by the applicable standards.

It is required. LC_MESSAGES and LC_CTYPE are two different and independent 
locale categories.
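
A minimal sketch of what independent categories means in practice, under assumptions: `demo_init_locale` and `demo_numeric_is_c` are hypothetical helpers and not wget's code. Each category has to be requested separately, and anything not requested stays in the "C" locale.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: adopt the user's locale for messages and
   character classification only. Either call may fail if the
   environment names an unavailable locale. */
static void demo_init_locale(void)
{
#ifdef LC_MESSAGES              /* POSIX category, absent from ISO C */
    setlocale(LC_MESSAGES, "");
#endif
    setlocale(LC_CTYPE, "");
}

/* Categories that were never set explicitly remain in the "C" locale. */
static int demo_numeric_is_c(void)
{
    const char *name = setlocale(LC_NUMERIC, NULL);

    assert(name != NULL);
    return strcmp(name, "C") == 0;
}
```

Calling `demo_init_locale()` at program start and then `demo_numeric_is_c()` shows that LC_NUMERIC was left untouched, just as setting only LC_MESSAGES leaves LC_CTYPE untouched.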
 
 Philipp, you've nailed it down -- the ctype change is the reason why I
 didn't want to make that change.

That's what I thought :) Seems like very few authors took heed of the
implications that the locale support in the standard C library brought with
it. As a result, nearly every package that supports i18n and uses ctype.h
macros either needs to be audited very closely (there could be places where
locale dependency is wanted) or needs to use a locale independent ctype.h
replacement.

 If the safe-ctype thing is not too ugly, using it would be acceptable.

Look for yourself :) I've attached safe-ctype.h from libiberty.

 If/when you're making the change, don't forget to make the patch
 against the latest CVS sources.

I'm just now checking out the CVS tree and will make the patch against it.

  Thanks for your effort.

Well, as it's in my direct interest to have a correct package for our
distribution, this comes naturally ;-)

Philipp

-- 
Philipp Thomas [EMAIL PROTECTED]
Development, SuSE GmbH, Schanzaecker Str. 10, D-90443 Nuremberg, Germany
