Incorrect 'beautification' of URL?
When requesting a URL like http://tmp.logix.cz/slash.xp , wget shortens this to http://tmp.logix.cz/slash.xp/ . All browsers I tested (Opera 6b1, Mozilla 0.9.8, Konqueror 2.9.2) pass this URL as given. So the question is why wget (1.8.1) does this, and whether the behaviour can be switched off.

Philipp
-- 
Philipp Thomas [EMAIL PROTECTED]
SuSE Linux AG, Deutscherrnstr. 15-19, D-90429 Nuremberg, Germany

HPUX and sane have never been uttered in the same sentence without accompanying negatives. -- Richard Henderson on gcc ml
Change in behaviour between 1.7 and 1.8.1
When you issue

    wget --recursive --level=1 --reject=.html www.suse.de

wget 1.7 really omits downloading all the .html files except index.html (which is needed for --recursive), but wget 1.8.1 also downloads all .html files that are referenced from index.html and deletes them immediately. It is clear that the .html files are needed to find the next level of files when downloading recursively, but they should be omitted when the recursion depth is limited and the limit has been reached.

Philipp
-- 
Philipp Thomas [EMAIL PROTECTED]
SuSE Linux AG, Deutscherrnstr. 15-19, D-90429 Nuremberg, Germany
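The behaviour requested above can be sketched as a small decision function. This is only an illustration of the rule described in the message, not wget's actual code: the name should_fetch, its parameters, and the convention that the start page sits at depth 0 are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, NOT taken from wget.
 *
 * matches_reject: the file name matches a --reject pattern
 * is_html:        the file would be parsed for further links
 * depth:          link distance from the start page (start page = 0, assumed)
 * max_depth:      the --level limit
 */
static bool should_fetch(bool matches_reject, bool is_html,
                         int depth, int max_depth)
{
    if (!matches_reject)
        return true;            /* wanted file: always fetch */

    /* A rejected HTML file is only useful as a source of links, and its
       links are only followed while the depth limit is not yet reached. */
    return is_html && depth < max_depth;
}

/* Returns 1 if the sketch reproduces the case from the message
   (--level=1 --reject=.html), 0 otherwise. */
static int should_fetch_selfcheck(void)
{
    return should_fetch(true, true, 0, 1)      /* index.html: needed for links */
        && !should_fetch(true, true, 1, 1);    /* linked .html: limit reached  */
}
```

With --level=1, index.html (depth 0) is still fetched so its links can be read, while the .html files it references (depth 1) are skipped rather than downloaded and deleted.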
Re: Wget and i18n
* Drazen Kacar ([EMAIL PROTECTED]) [20010427 00:51]:

> I wonder. POSIX compilation environment may not do such a thing with POSIX headers, unless explicitly allowed by POSIX. I'm too lazy to look into

AFAIK, POSIX is rather useless in this context, as this is an ISO C issue: ctype.h is defined there.

Philipp
-- 
Penguins shall save the dinosaurs -- Handelsblatt about Linux on S/390
Re: Wget and i18n
* Herold Heiko ([EMAIL PROTECTED]) [20010426 18:42]:

> bugfix), now still ctype.h is included in winnt.h, compilation fails. (We always knew winnt is an *old* system, but this proves it :). Any idea what would be a sensible way to cover this ?

Does MS ctype.h have include guards? If yes, one could just define those guards on the command line. The other alternative would be to disable NLS support on WIN* and make the inclusion of safe-ctype.h depend on this as well.

Philipp
-- 
Philipp Thomas [EMAIL PROTECTED]
Development, SuSE GmbH, Schanzaecker Str. 10, D-90443 Nuremberg, Germany

Penguins shall save the dinosaurs -- Handelsblatt about Linux on S/390
Re: Minor fixes in wget 1.6's main.c
* Nicolás Lichtmaier ([EMAIL PROTECTED]) [20010404 09:07]:

> That's what 1.7 does.
> Yup... I guess I'll port that to 1.6. It's easy to do.

Here's my patch against the WGET_1.6 branch as of today:

src/ChangeLog:

2001-04-04  Philipp Thomas  [EMAIL PROTECTED]

    * safe-ctype.h: New file. Locale independent ctype.h replacement taken from libiberty.
    * safe-ctype.c: New file. Tables for above.
    * Makefile.in: Add safe-ctype$o to OBJ. Add dependencies for safe-ctype$o.
    * cmpt.c: Remove include of ctype.h. Use ISSPACE instead of isspace.
    * ftp-basic.c: Don't include ctype.h.
    * ftp-ls.c: Likewise.
    * ftp.c: Likewise.
    * headers.c: Likewise.
    * host.c: Likewise.
    * html-parse.c: Likewise.
    * html-url.c: Likewise.
    * http.c: Likewise.
    * init.c: Likewise.
    * main.c: Likewise. Set LC_CTYPE along with LC_MESSAGES.
    * netrc.c: Likewise.
    * recur.c: Likewise.
    * retr.c: Likewise.
    * snprintf.c: Replace ctype.h with safe-ctype.h. Use ISDIGIT instead of isdigit.
    * sysdep.h: Remove defines of ctype macros as they aren't needed for safe-ctype.h.
    * url.c: Don't include ctype.h.
    * utils.c: Likewise.
    * wget.h: Include safe-ctype.h.

Index: src/Makefile.in
===
RCS file: /pack/anoncvs/wget/src/Makefile.in,v
retrieving revision 1.2
diff -u -r1.2 Makefile.in
--- src/Makefile.in 2000/11/04 22:49:45 1.2
+++ src/Makefile.in 2001/04/04 13:09:14
@@ -59,7 +59,7 @@
 OBJ = $(ALLOCA) cmpt$o connect$o fnmatch$o ftp$o ftp-basic$o \
       ftp-ls$o $(OPIE_OBJ) getopt$o headers$o host$o html$o \
       http$o init$o log$o main$o $(MD5_OBJ) netrc$o rbuf$o\
-      recur$o retr$o snprintf$o url$o utils$o version$o
+      recur$o retr$o snprintf$o url$o utils$o version$o safe-ctype$o
 
 .SUFFIXES:
 .SUFFIXES: .c .o ._c ._o
@@ -154,5 +154,6 @@
 rbuf$o: config.h wget.h sysdep.h options.h rbuf.h connect.h
 recur$o: config.h wget.h sysdep.h options.h url.h recur.h utils.h retr.h rbuf.h ftp.h fnmatch.h host.h
 retr$o: config.h wget.h sysdep.h options.h utils.h retr.h rbuf.h url.h recur.h ftp.h host.h connect.h
+safe-ctype$o: safe-ctype.h
 url$o: config.h wget.h sysdep.h options.h utils.h url.h host.h html.h
 utils$o: config.h wget.h sysdep.h options.h utils.h fnmatch.h
Index: src/cmpt.c
===
RCS file: /pack/anoncvs/wget/src/cmpt.c,v
retrieving revision 1.2
diff -u -r1.2 cmpt.c
--- src/cmpt.c 2000/04/12 13:23:34 1.2
+++ src/cmpt.c 2001/04/04 13:09:14
@@ -26,7 +26,6 @@
 #else
 # include <strings.h>
 #endif /* HAVE_STRING_H */
-#include <ctype.h>
 #include <sys/types.h>
 
 #ifdef HAVE_UNISTD_H
@@ -657,9 +656,9 @@
         {
           /* A white space in the format string matches 0 more or white
              space in the input string. */
-          if (isspace (*fmt))
+          if (ISSPACE (*fmt))
             {
-              while (isspace (*rp))
+              while (ISSPACE (*rp))
                 ++rp;
               ++fmt;
               continue;
@@ -851,7 +850,7 @@
         case 'n':
         case 't':
           /* Match any white space. */
-          while (isspace (*rp))
+          while (ISSPACE (*rp))
             ++rp;
           break;
         case 'p':
Index: src/ftp-basic.c
===
RCS file: /pack/anoncvs/wget/src/ftp-basic.c,v
retrieving revision 1.3.2.1
diff -u -r1.3.2.1 ftp-basic.c
--- src/ftp-basic.c 2000/12/17 18:14:29 1.3.2.1
+++ src/ftp-basic.c 2001/04/04 13:09:14
@@ -26,7 +26,6 @@
 #else
 # include <strings.h>
 #endif
-#include <ctype.h>
 #ifdef HAVE_UNISTD_H
 # include <unistd.h>
 #endif
Index: src/ftp-ls.c
===
RCS file: /pack/anoncvs/wget/src/ftp-ls.c,v
retrieving revision 1.2
diff -u -r1.2 ftp-ls.c
--- src/ftp-ls.c 2000/11/10 18:01:35 1.2
+++ src/ftp-ls.c 2001/04/04 13:09:14
@@ -30,7 +30,6 @@
 # include <unistd.h>
 #endif
 #include <sys/types.h>
-#include <ctype.h>
 #include <errno.h>
 
 #include "wget.h"
Index: src/ftp.c
===
RCS file: /pack/anoncvs/wget/src/ftp.c,v
retrieving revision 1.16.2.5
diff -u -r1.16.2.5 ftp.c
--- src/ftp.c 2000/12/31 03:55:20 1.16.2.5
+++ src/ftp.c 2001/04/04 13:09:14
@@ -26,7 +26,6 @@
 #else
 # include <strings.h>
 #endif
-#include <ctype.h>
 #ifdef HAVE_UNISTD_H
 # include <unistd.h>
 #endif
Index: src/headers.c
===
RCS file: /pack/anoncvs/wget/src/headers.c,v
retrieving revision 1.2
diff -u -r1.2 headers.c
--- src/headers.c 2000/04/12 13:23:34 1.2
+++ src/headers.c 2001/04/04 13:09:14
@@ -26,7 +26,6 @@
 #else
 # include <strings.h>
 #endif
-#include <ctype.h>
 
 #include "wget.h"
 #include "con
Re: [Patch] LC_CTYPE not defined
* R.I.P. Deaddog ([EMAIL PROTECTED]) [20010328 12:16]:

> I'm not sure if this has been reported before, but the patch is attached here just in case.

I did report it. But setting LC_CTYPE alone is not enough, as this will change the behaviour of the ctype.h macros. Check out the CVS version, as I submitted a patch to use a replacement for ctype.h which makes setting LC_CTYPE rather safe.

Philipp
-- 
Philipp Thomas [EMAIL PROTECTED]
Development, SuSE GmbH, Schanzaecker Str. 10, D-90443 Nuremberg, Germany

Penguins shall save the dinosaurs -- Handelsblatt about Linux on S/390
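To illustrate why setting LC_CTYPE changes what the ctype.h macros report, here is a minimal sketch. The helper name high_byte_is_alpha is made up for this example, and the byte 0xE4 (a-umlaut in ISO 8859-1) is just a convenient test value. Only the "C" locale result is guaranteed by the C standard; what any other locale returns depends on which locales are installed.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: switch LC_CTYPE to locale_name and report whether
 * byte 0xE4 is classified as alphabetic.
 * Returns 1 or 0, or -1 if the locale could not be selected. */
static int high_byte_is_alpha(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return isalpha(0xE4) != 0;
}

/* In the "C" locale only the 52 ASCII letters are alphabetic, so this
 * returns 1 on any conforming implementation. */
static int c_locale_rejects_high_byte(void)
{
    return high_byte_is_alpha("C") == 0;
}
```

Calling high_byte_is_alpha with a Latin-1 locale name may return 1 where the "C" locale returns 0. Code that parses protocol data with isalpha or isspace is exposed to exactly that difference.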
Re: Wget and i18n
* Hrvoje Niksic ([EMAIL PROTECTED]) [20010306 10:35]:

> > #ifdef isalpha
> > #error "safe-ctype.h and ctype.h may not be used simultaneously"
> > #else
>
> Is the error statement actually true, or is this only a warning that tries to enforce consistency of the application?

The error statement is true. Remember that ctype.h is locale dependent whereas safe-ctype is not. So for instance isprint (ctype.h) and ISPRINT (safe-ctype) could well produce different results. And as the intention is to get rid of the locale dependency, you have to block the inclusion of ctype.h.

The caveat with using safe-ctype is that it won't work with multibyte encodings or wchars. So in the end every use of is... does need to be checked anyway.

> Also, won't this trigger an error if a system header file, say string.h, happens to include ctype.h? (I know system header files should not do that because it pollutes your namespace, but older systems sometimes do that.)

Yes, it would trigger in that case. But safe-ctype was originally developed for GCC, and as GCC is also used on old systems (one of them the original BSD), I guess we would have heard if safe-ctype broke things.

Philipp
-- 
Penguins shall save the dinosaurs -- Handelsblatt about Linux on S/390
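The idea of a locale-independent classifier can be sketched as follows. This is not libiberty's safe-ctype.h: the SK_ names are invented for the example, the real header uses precomputed tables, and the sketch assumes an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <locale.h>

/* Classification bits (hypothetical names, not libiberty's). */
enum { SK_DIGIT = 1, SK_SPACE = 2, SK_ALPHA = 4 };

/* Classify one byte by ASCII rules only; the locale is never consulted. */
static unsigned char sk_classify(unsigned char c)
{
    unsigned char bits = 0;

    if (c >= '0' && c <= '9')
        bits |= SK_DIGIT;
    if (c == ' ' || (c >= '\t' && c <= '\r'))   /* \t \n \v \f \r */
        bits |= SK_SPACE;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        bits |= SK_ALPHA;
    return bits;
}

/* The cast makes the macros safe for plain (possibly signed) char. */
#define SK_ISDIGIT(c) ((sk_classify((unsigned char)(c)) & SK_DIGIT) != 0)
#define SK_ISSPACE(c) ((sk_classify((unsigned char)(c)) & SK_SPACE) != 0)
#define SK_ISALPHA(c) ((sk_classify((unsigned char)(c)) & SK_ALPHA) != 0)

/* Returns 1 if the answers are the ASCII ones even after the program
 * has adopted the user's LC_CTYPE setting. */
static int sk_selfcheck(void)
{
    setlocale(LC_CTYPE, "");    /* whatever the environment asks for */
    return SK_ISDIGIT('7') && SK_ISSPACE('\t') && !SK_ISSPACE('x')
        && SK_ISALPHA('Z') && !SK_ISALPHA(0xE4);
}
```

Because the result never depends on setlocale, such macros can disagree with isalpha and friends in a non-"C" locale. That disagreement is what the #error guard quoted above prevents.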
Re: Wget and i18n
* Hrvoje Niksic ([EMAIL PROTECTED]) [20010306 11:21]:

> It is true that old systems use Gcc, but I wonder if anyone tests *new* Gcc's on these old systems...

Yes, they do. The patches to make gcc build on the original BSD are only present in the current CVS GCC.

Philipp
-- 
Penguins shall save the dinosaurs -- Handelsblatt about Linux on S/390
Re: Wget and i18n
* Hrvoje Niksic ([EMAIL PROTECTED]) [20010306 14:09]:

> OK, then the #error stays. If no one objects, I'll modify Wget to use these files.

I have the patches ready and am about to test them. So if you wait a bit, you'll get patches ready to apply.

Philipp
-- 
Penguins shall save the dinosaurs -- Handelsblatt about Linux on S/390
Re: Wget and i18n
* Hrvoje Niksic ([EMAIL PROTECTED]) [20010305 18:44]:

> Yes. I hate them for making that change, but apparently it's allowed (or even required, I forget now) by the applicable standards.

It is required. LC_MESSAGES and LC_CTYPE are two different and independent locale categories.

> Philipp, you've nailed it down -- the ctype change is the reason why I didn't want to make that change.

That's what I thought :) Seems like very few authors took heed of the implications that the locale support in the standard C library brought with it. As a result, nearly every package that supports i18n and uses ctype.h macros either needs to be audited very closely (there could be places where locale dependency is wanted) or needs to use a locale independent ctype.h replacement.

> If the safe-ctype thing is not too ugly, using it would be acceptable.

Look for yourself :) I've attached safe-ctype.h from libiberty.

> If/when you're making the change, don't forget to make the patch against the latest CVS sources.

I'm just now checking out the CVS tree and will make the patch against it.

> Thanks for your effort.

Well, as it's in my direct interest to have a correct package for our distribution, this comes naturally ;-)

Philipp
-- 
Philipp Thomas [EMAIL PROTECTED]
Development, SuSE GmbH, Schanzaecker Str. 10, D-90443 Nuremberg, Germany

Penguins shall save the dinosaurs -- Handelsblatt about Linux on S/390
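That LC_MESSAGES and LC_CTYPE are independent categories can be seen in a short sketch. The function name messages_only_locale is made up for this example. LC_MESSAGES comes from POSIX rather than ISO C, so it is guarded with #ifdef. The check assumes that the implementation reports the name "C" for the C locale, which is common but not guaranteed everywhere.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: take the message language from the environment but
 * leave character classification in the "C" locale.
 * Returns 1 if LC_CTYPE still reports "C" afterwards, 0 otherwise. */
static int messages_only_locale(void)
{
    const char *ctype_name;

    setlocale(LC_ALL, "C");         /* known starting point */
#ifdef LC_MESSAGES
    /* Only this category follows the user's environment. A NULL return
       (unknown locale) simply leaves the "C" messages in place. */
    setlocale(LC_MESSAGES, "");
#endif
    ctype_name = setlocale(LC_CTYPE, NULL);   /* query, do not change */
    return ctype_name != NULL && strcmp(ctype_name, "C") == 0;
}
```

A program set up this way gets translated messages while the ctype.h macros keep their "C" behaviour. The price is that characters outside ASCII in translated messages may not be handled correctly by the C library, which is the reason for also setting LC_CTYPE and switching to a locale-independent replacement.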