wget -r crash on local file
Hi, I accidentally tried to recursively get files from the local file system rather than the web. This resulted in a segmentation fault, not the "Unsupported scheme" error message I get without -r.

$ echo : > /tmp/test.html
$ wget -r /tmp/test.html
Segmentation fault (core dumped)

The wget version is 1.8.1, running on Debian 3.0 on an x86 machine. A valgrind log (without debug info, sorry) shows a NULL pointer dereference:

==6822== Invalid read of size 4
==6822==    at 0x805CB3C: (within /usr/bin/wget)
==6822==    by 0x805A568: (within /usr/bin/wget)
==6822==    by 0x4025B14E: __libc_start_main (in /lib/libc-2.2.5.so)
==6822==    by 0x8049AC0: (within /usr/bin/wget)
==6822==  Address 0x0 is not stack'd, malloc'd or free'd
Segmentation fault (core dumped)

Harri.
Re: Annoying behaviour with --input-file
Adam Klobukowski [EMAIL PROTECTED] writes:

>> If wget is used with the --input-file option, it gets a directory
>> listing for each file specified in the input file (if using the ftp
>> protocol) before downloading each file.
>
> This is not specific to --input-file; it happens when --timestamping
> is specified. Are you using --timestamping (-N)? If so, can you do
> without it, or replace it with --no-clobber?
>
> The whole command line was: wget -cr -i list

You shouldn't use -r unless you need recursive download.
Re: wget ipv6
By the way, can you please clarify the intention behind AI_V4MAPPED and AI_ALL, which configure tests for, but nothing uses?
Re: can you authenticate to a http proxy with a username that contains a space?
Tony Lewis [EMAIL PROTECTED] writes:

> antonio taylor wrote:
> > http://firstname lastname:[EMAIL PROTECTED]
>
> Have you tried http://firstname%20lastname:[EMAIL PROTECTED] ?

Or simply quotes, as in wget "http://firstname lastname:[EMAIL PROTECTED]".
Re: Recursive ftp broken
> Interestingly, I can't repeat this. Still, to be on the safe side, I
> added some additional restraints to the code to make it behave more
> like the previous code, which worked. Please try again and see if it
> works now. If not, please provide some form of debugging output as
> well.

This ChangeLog entry fixed it:

	* ftp.c: Set con->csock to -1 where rbuf_uninitialize was
	previously used.

Thanks. --gv
Testing on BEOS?
Does anyone have access to a BEOS machine with a compiler? I'd like to verify whether the current CVS works on BEOS, i.e. whether it's still true that BEOS doesn't support MSG_PEEK. Speaking of testing, please be sure to test the latest CVS on Windows as well, where MSG_PEEK is said to be flaky. HTTPS is another thing that might work strangely, because SSL_peek is undocumented (!).
MSG_PEEK (was Re: Testing on BEOS?)
On Wed, 26 Nov 2003, Hrvoje Niksic wrote:

> Speaking of testing, please be sure to test the latest CVS on Windows
> as well, where MSG_PEEK is said to be flaky. HTTPS is another thing
> that might work strangely because SSL_peek is undocumented (!).

Out of curiosity, why are you introducing this peeking? I mean, what's the gain?

-- 
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
 ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Re: MSG_PEEK
Daniel Stenberg [EMAIL PROTECTED] writes:

> Out of curiosity, why are you introducing this peeking? I mean,
> what's the gain?

Simplifying the code: getting rid of the unfinished and undocumented rbuf abstraction layer. Buffering is unnecessary when downloading the body, and is mostly unnecessary when downloading the headers or in line-oriented communication. I got the idea by tracing how fetchmail communicates with the server. Since fetchmail is fairly portable, I believe the idea works well in practice.
RE: Testing on BEOS?
A sample Windows MSVC build was compiled and a basic test performed (download of the same site with http and https produced exactly the same files). The binary is at the usual place; unfortunately my crappy ISP webserver seems to be in Guru Meditation just now and refuses access (not the first problem after the recent merger-induced changes), so here are the direct links for the binary and the sources:

ftp://ftp.sunsite.dk/projects/wget/windows/wget20031126b.zip
ftp://ftp.sunsite.dk/projects/wget/windows/wget20031126s.zip

Whenever that webserver decides to return to earth, the usual description (stating nothing special in this case) will again be available at http://xoomer.virgilio.it/hherold

Heiko
-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax
Re: correct processing of redirections
Peter Kohts [EMAIL PROTECTED] writes:

> 4) When I'm doing a straightforward "wget -m -nH http://www.gnu.org"
> everything is excellent, except the redirections: the files which we
> get because of the redirections overwrite any currently existing
> files with the same filenames.

I see your point. Redirections to other hosts are indeed somewhat evil, and I'm becoming convinced that the way Wget handles them now is suboptimal. Fixing this correctly will require some thinking, but a short-term workaround might be to provide an option to ignore redirections to other hosts.
Re: problem with LF/CR etc.
Peter GILMAN [EMAIL PROTECTED] writes:

> first of all, thanks for taking the time and energy to consider this
> issue. i was only hoping to pick up a pointer or two; i never
> realized this could turn out to be such a big deal!

Neither did we. :-)

> 1) Jens' observation that the user will think wget is broken is
> correct. the immediate reaction is, "it works in my browser; why
> does wget say '404'?" [...] (and, after all, what is the purpose of
> wget? is it an html verifier, or is it a Web-GET tool? i submit
> that evaluation of the correctness of web code is outside the
> purview of wget.)

It's true that the point of Wget is not to evaluate the correctness of web pages. But its purpose is not handling every piece of badly written HTML on the web, either! Just as badly written pages work in some browsers but not in others, some pages that work in IE will not work in Wget. This is nothing new. As I said, Wget tries to handle badly written code if the mistakes are either easy to handle or frequent enough to hamper the usefulness of the program. Strict comments fall into the second category, and these embedded newlines fall into the first one.

> conclusion: if it doesn't break anything, and if it makes wget more
> useful, i can think of no reason this capability shouldn't be added.

Agreed. This patch should fix your case. It applies to the latest CVS sources, but it can be easily retrofitted to earlier versions as well.

2003-11-26  Hrvoje Niksic  [EMAIL PROTECTED]

	* html-parse.c (convert_and_copy): Remove embedded newlines when
	AP_TRIM_BLANKS is specified.

Index: src/html-parse.c
===================================================================
RCS file: /pack/anoncvs/wget/src/html-parse.c,v
retrieving revision 1.21
diff -u -r1.21 html-parse.c
--- src/html-parse.c	2003/11/02 16:48:40	1.21
+++ src/html-parse.c	2003/11/26 16:28:29
@@ -360,17 +360,16 @@
      the ASCII range when copying the string.

    * AP_TRIM_BLANKS -- ignore blanks at the beginning and at the end
-     of text.  */
+     of text, as well as embedded newlines.  */

 static void
 convert_and_copy (struct pool *pool, const char *beg, const char *end, int flags)
 {
   int old_tail = pool->tail;
-  int size;

-  /* First, skip blanks if required.  We must do this before entities
-     are processed, so that blanks can still be inserted as, for
-     instance, `&#32;'.  */
+  /* Skip blanks if required.  We must do this before entities are
+     processed, so that blanks can still be inserted as, for instance,
+     `&#32;'.  */
   if (flags & AP_TRIM_BLANKS)
     {
       while (beg < end && ISSPACE (*beg))
@@ -378,7 +377,6 @@
       while (end > beg && ISSPACE (end[-1]))
         --end;
     }
-  size = end - beg;

   if (flags & AP_DECODE_ENTITIES)
     {
@@ -391,15 +389,14 @@
          never lengthen it.  */
       const char *from = beg;
       char *to;
+      int squash_newlines = flags & AP_TRIM_BLANKS;

       POOL_GROW (pool, end - beg);
       to = pool->contents + pool->tail;

       while (from < end)
         {
-          if (*from != '&')
-            *to++ = *from++;
-          else
+          if (*from == '&')
             {
               int entity = decode_entity (&from, end);
               if (entity != -1)
@@ -407,6 +404,10 @@
               else
                 *to++ = *from++;
             }
+          else if ((*from == '\n' || *from == '\r') && squash_newlines)
+            ++from;
+          else
+            *to++ = *from++;
         }

       /* Verify that we haven't exceeded the original size.  (It
          shouldn't happen, hence the assert.)  */
Index: src/html-url.c
===================================================================
RCS file: /pack/anoncvs/wget/src/html-url.c,v
retrieving revision 1.40
diff -u -r1.40 html-url.c
--- src/html-url.c	2003/11/09 01:33:33	1.40
+++ src/html-url.c	2003/11/26 16:28:29
@@ -612,9 +612,12 @@
   init_interesting ();

   /* Specify MHT_TRIM_VALUES because of buggy HTML generators that
-     generate <a href=" foo"> instead of <a href="foo"> (Netscape
-     ignores spaces as well.)  If you really mean space, use &#32; or
-     %20.  */
+     generate <a href=" foo"> instead of <a href="foo"> (browsers
+     ignore spaces as well.)  If you really mean space, use &#32; or
+     %20.  MHT_TRIM_VALUES also causes squashing of embedded newlines,
+     e.g. in <img src="foo.[newline]html">.  Such newlines are also
+     ignored by IE and Mozilla and are presumably introduced by
+     writing HTML with editors that force word wrap.  */

   flags = MHT_TRIM_VALUES;
   if (opt.strict_comments)
     flags |= MHT_STRICT_COMMENTS;