wget -r crash on local file

2003-11-26 Thread Harri Porten
Hi,

I accidentally tried to recursively get files from the local file system
rather than the web. This resulted in a segmentation fault instead of the
`Unsupported scheme' error message I get without -r.

  $ echo : > /tmp/test.html
  $ wget -r /tmp/test.html
  Segmentation fault (core dumped)

The wget version is 1.8.1, running on Debian 3.0 on an x86 machine.

The valgrind log (without debug info, sorry) shows a NULL pointer
dereference:

==6822== Invalid read of size 4
==6822==    at 0x805CB3C: (within /usr/bin/wget)
==6822==    by 0x805A568: (within /usr/bin/wget)
==6822==    by 0x4025B14E: __libc_start_main (in /lib/libc-2.2.5.so)
==6822==    by 0x8049AC0: (within /usr/bin/wget)
==6822==  Address 0x0 is not stack'd, malloc'd or free'd
Segmentation fault (core dumped)
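
The trace is consistent with a missing NULL check in the recursive code
path.  Assuming the crash comes from dereferencing the result of
url_parse without checking it (an assumption, since the trace has no
symbols; the names below follow wget's url.c), a guard of roughly this
shape would restore the normal diagnostic:

  int url_err;
  struct url *u = url_parse (url, &url_err);
  if (!u)
    {
      /* Report the same `Unsupported scheme' style error the
         non-recursive path prints, instead of dereferencing NULL.  */
      logprintf (LOG_NOTQUIET, "%s: %s.\n", url, url_error (url_err));
      return;
    }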

Harri.



Re: Annoying behaviour with --input-file

2003-11-26 Thread Hrvoje Niksic
Adam Klobukowski [EMAIL PROTECTED] writes:

>> Adam Klobukowski [EMAIL PROTECTED] writes:
>>
>>> If wget is used with the --input-file option, it gets a directory
>>> listing for each file specified in the input file (if ftp protocol)
>>> before downloading each file.
>>
>> This is not specific to --input-file; it happens when --timestamping
>> is specified.
>>
>> Are you using --timestamping (-N)?  If so, can you do without it, or
>> replace it with --no-clobber?
>
> The whole command line was:
>
>   wget -cr -i list

You shouldn't use -r unless you need recursive download.
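
For example, assuming the URLs in list point directly at the files to
fetch, the same download works without recursion:

  $ wget -c -i list        # resume partial downloads, no recursion
  $ wget -c -nc -i list    # additionally skip files already present locally

Without -r, Wget should issue a plain RETR for each FTP URL instead of
fetching directory listings first.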


Re: wget ipv6

2003-11-26 Thread Hrvoje Niksic
By the way, can you please clarify the intention behind AI_V4MAPPED
and AI_ALL, which configure tests for, but nothing uses?
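
For context: both flags affect getaddrinfo() lookups on dual-stack
hosts.  A minimal sketch of their conventional use (standard POSIX API,
not wget code):

  #include <string.h>
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <netdb.h>

  int
  lookup (const char *host, struct addrinfo **res)
  {
    struct addrinfo hints;
    memset (&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;
    hints.ai_socktype = SOCK_STREAM;
  #if defined(AI_V4MAPPED) && defined(AI_ALL)
    /* AI_V4MAPPED: if a name has no IPv6 addresses, report its IPv4
       addresses as v4-mapped IPv6 (::ffff:a.b.c.d).  AI_ALL: report
       both IPv6 and v4-mapped IPv4 addresses, not just one kind.  */
    hints.ai_flags = AI_V4MAPPED | AI_ALL;
  #endif
    return getaddrinfo (host, "http", &hints, res);
  }

So if the configure checks are there, one would expect an AF_INET6
getaddrinfo call along these lines somewhere in the lookup path.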


Re: can you authenticate to a http proxy with a username that contains a space?

2003-11-26 Thread Hrvoje Niksic
Tony Lewis [EMAIL PROTECTED] writes:

> antonio taylor wrote:
>
>> http://firstname lastname:[EMAIL PROTECTED]
>
> Have you tried http://firstname%20lastname:[EMAIL PROTECTED] ?

Or simply use quotes, as in wget "http://firstname lastname:[EMAIL PROTECTED]".


Re: Recursive ftp broken

2003-11-26 Thread Gisle Vanem
> Interestingly, I can't repeat this.  Still, to be on the safe side, I
> added some additional restraints to the code that make it behave more
> like the previous code, that worked.  Please try again and see if it
> works now.  If not, please provide some form of debugging output as
> well.

This ChangeLog entry fixed it:

* ftp.c: Set con->csock to -1 where rbuf_uninitialize was
  previously used.

Thanks.

--gv



Testing on BEOS?

2003-11-26 Thread Hrvoje Niksic
Does someone have access to a BEOS machine with a compiler?  I'd like
to verify whether the current CVS works on BEOS, i.e. whether it's
still true that BEOS doesn't support MSG_PEEK.

Speaking of testing, please be sure to test the latest CVS on Windows
as well, where MSG_PEEK is said to be flaky.  HTTPS is another thing
that might work strangely because SSL_peek is undocumented (!).


MSG_PEEK (was Re: Testing on BEOS?)

2003-11-26 Thread Daniel Stenberg
On Wed, 26 Nov 2003, Hrvoje Niksic wrote:

> Speaking of testing, please be sure to test the latest CVS on Windows as
> well, where MSG_PEEK is said to be flaky.  HTTPS is another thing that might
> work strangely because SSL_peek is undocumented (!).

Out of curiosity, why are you introducing this peeking? I mean, what's the
gain?

-- 
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
  ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol


Re: MSG_PEEK

2003-11-26 Thread Hrvoje Niksic
Daniel Stenberg [EMAIL PROTECTED] writes:

> Out of curiosity, why are you introducing this peeking?  I mean,
> what's the gain?

Simplifying the code.  Getting rid of the unfinished and undocumented
rbuf abstraction layer.  Buffering is unnecessary when downloading
the body, and is mostly unnecessary when downloading the headers or in
line-oriented communication.

I got the idea by tracing how fetchmail communicates with the server.
Since fetchmail is fairly portable, I believe the idea works well in
practice.
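
To make the approach concrete, here is a minimal sketch of peek-based
line reading (illustrative only, not the actual wget code): peek at the
pending bytes, locate the newline, then consume exactly the bytes used,
so no user-space buffering layer has to carry leftover data between
calls.

  #include <string.h>
  #include <sys/types.h>
  #include <sys/socket.h>

  /* Read one LF-terminated line from FD into BUF (at most BUFSIZE
     bytes) without keeping leftovers in a user buffer.  A real
     implementation would loop until the newline actually arrives.  */
  static int
  read_line_peek (int fd, char *buf, int bufsize)
  {
    char *nl;
    int want;
    int n = recv (fd, buf, bufsize, MSG_PEEK);
    if (n <= 0)
      return n;
    nl = memchr (buf, '\n', n);
    want = nl ? (int) (nl - buf) + 1 : n;
    return recv (fd, buf, want, 0);  /* consume only the line */
  }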


RE: Testing on BEOS?

2003-11-26 Thread Herold Heiko
Sample Windows MSVC binary compiled and a basic test performed (downloaded
the same site with HTTP and HTTPS; got exactly the same files).
The binary is at the usual place.  Unfortunately, my crappy ISP webserver
seems to be in Guru Meditation just now and refuses access (not the first
problem after the recent merger-induced changes), so here are the direct
links for the binary and sources:
ftp://ftp.sunsite.dk/projects/wget/windows/wget20031126b.zip
ftp://ftp.sunsite.dk/projects/wget/windows/wget20031126s.zip

Whenever that webserver decides to return to earth, the usual description
(stating nothing special in this case) will again be available at
http://xoomer.virgilio.it/hherold

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

> -----Original Message-----
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, November 26, 2003 2:38 PM
> To: [EMAIL PROTECTED]
> Subject: Testing on BEOS?
>
>
> Does someone have access to a BEOS machine with a compiler?  I'd like
> to verify whether the current CVS works on BEOS, i.e. whether it's
> still true that BEOS doesn't support MSG_PEEK.
>
> Speaking of testing, please be sure to test the latest CVS on Windows
> as well, where MSG_PEEK is said to be flaky.  HTTPS is another thing
> that might work strangely because SSL_peek is undocumented (!).
 


Re: correct processing of redirections

2003-11-26 Thread Hrvoje Niksic
Peter Kohts [EMAIL PROTECTED] writes:

> 4) When I'm doing a straightforward "wget -m -nH http://www.gnu.org",
> everything is excellent, except for the redirections: the files which
> we get because of the redirections overwrite any currently existing
> files with the same filenames.

I see your point.  Redirections to other hosts are indeed somewhat
evil, and I'm becoming convinced that the way that Wget handles them
now is suboptimal.  Fixing this correctly will require some thinking,
but a short-term workaround might be to provide an option to ignore
redirections to other hosts.
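
A sketch of the check such an option would hinge on (the option name
--reject-cross-host-redirects and this helper are hypothetical, not
existing wget code):

  #include <strings.h>

  /* Return nonzero if a redirect from ORIG_HOST to REDIR_HOST stays
     on the same host; host names compare case-insensitively.  A
     hypothetical --reject-cross-host-redirects option would refuse
     to follow the redirect when this returns zero.  */
  static int
  same_host (const char *orig_host, const char *redir_host)
  {
    return strcasecmp (orig_host, redir_host) == 0;
  }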


Re: problem with LF/CR etc.

2003-11-26 Thread Hrvoje Niksic
Peter GILMAN [EMAIL PROTECTED] writes:

> first of all, thanks for taking the time and energy to consider this
> issue.  i was only hoping to pick up a pointer or two; i never
> realized this could turn out to be such a big deal!

Neither did we.  :-)

> 1) Jens' observation that the user will think wget is broken is
> correct.  the immediate reaction is, "it works in my browser; why
> does wget say '404'?"
[...]
> (and, after all, what is the purpose of wget?  is it an html
> verifier, or is it a Web-GET tool?  i submit that evaluation of the
> correctness of web code is outside the purview of wget.)

It's true that the point of Wget is not to evaluate the correctness of
web pages.  But its purpose is not to handle every piece of badly
written HTML on the web, either!  Just as badly written pages work in
some browsers but not in others, some pages that work in IE will not
work in Wget.  This is nothing new.

As I said, Wget tries to handle badly written code if the mistakes are
either easy to handle or frequent enough to hamper the usefulness of
the program.  Strict comments fall into the second category, and these
embedded newlines fall into the first one.

> conclusion: if it doesn't break anything, and if it makes wget more
> useful, i can think of no reason this capability shouldn't be added.

Agreed.  This patch should fix your case.  It applies to the latest
CVS sources, but it can be easily retrofitted to earlier versions as
well.


2003-11-26  Hrvoje Niksic  [EMAIL PROTECTED]

* html-parse.c (convert_and_copy): Remove embedded newlines when
AP_TRIM_BLANKS is specified.

Index: src/html-parse.c
===================================================================
RCS file: /pack/anoncvs/wget/src/html-parse.c,v
retrieving revision 1.21
diff -u -r1.21 html-parse.c
--- src/html-parse.c	2003/11/02 16:48:40	1.21
+++ src/html-parse.c	2003/11/26 16:28:29
@@ -360,17 +360,16 @@
      the ASCII range when copying the string.

    * AP_TRIM_BLANKS -- ignore blanks at the beginning and at the end
-     of text.  */
+     of text, as well as embedded newlines.  */

 static void
 convert_and_copy (struct pool *pool, const char *beg, const char *end, int flags)
 {
   int old_tail = pool->tail;
-  int size;

-  /* First, skip blanks if required.  We must do this before entities
-     are processed, so that blanks can still be inserted as, for
-     instance, `&#32;'.  */
+  /* Skip blanks if required.  We must do this before entities are
+     processed, so that blanks can still be inserted as, for instance,
+     `&#32;'.  */
   if (flags & AP_TRIM_BLANKS)
     {
       while (beg < end && ISSPACE (*beg))
@@ -378,7 +377,6 @@
       while (end > beg && ISSPACE (end[-1]))
 	--end;
     }
-  size = end - beg;

   if (flags & AP_DECODE_ENTITIES)
     {
@@ -391,15 +389,14 @@
 	 never lengthen it.  */
       const char *from = beg;
       char *to;
+      int squash_newlines = flags & AP_TRIM_BLANKS;

       POOL_GROW (pool, end - beg);
       to = pool->contents + pool->tail;

       while (from < end)
 	{
-	  if (*from != '&')
-	    *to++ = *from++;
-	  else
+	  if (*from == '&')
 	    {
 	      int entity = decode_entity (&from, end);
 	      if (entity != -1)
@@ -407,6 +404,10 @@
 	      else
 		*to++ = *from++;
 	    }
+	  else if ((*from == '\n' || *from == '\r') && squash_newlines)
+	    ++from;
+	  else
+	    *to++ = *from++;
 	}
       /* Verify that we haven't exceeded the original size.  (It
 	 shouldn't happen, hence the assert.)  */
Index: src/html-url.c
===================================================================
RCS file: /pack/anoncvs/wget/src/html-url.c,v
retrieving revision 1.40
diff -u -r1.40 html-url.c
--- src/html-url.c	2003/11/09 01:33:33	1.40
+++ src/html-url.c	2003/11/26 16:28:29
@@ -612,9 +612,12 @@
     init_interesting ();

   /* Specify MHT_TRIM_VALUES because of buggy HTML generators that
-     generate <a href=" foo"> instead of <a href="foo"> (Netscape
-     ignores spaces as well.)  If you really mean space, use &#32; or
-     %20.  */
+     generate <a href=" foo"> instead of <a href="foo"> (browsers
+     ignore spaces as well.)  If you really mean space, use &#32; or
+     %20.  MHT_TRIM_VALUES also causes squashing of embedded newlines,
+     e.g. in <img src="foo.[newline]html">.  Such newlines are also
+     ignored by IE and Mozilla and are presumably introduced by
+     writing HTML with editors that force word wrap.  */
   flags = MHT_TRIM_VALUES;
   if (opt.strict_comments)
     flags |= MHT_STRICT_COMMENTS;