Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-02 Thread Micah Cowan

Yes, that's what it means.

I'm not yet committed to doing this. I'd like to see first how many
mainstream servers will respect If-Modified-Since when given as part of
an HTTP/1.0 request (in comparison to how they respond when it's part of
an HTTP/1.1 request). If common servers ignore it in HTTP/1.0, but not
in HTTP/1.1, that'd be an excellent case for holding off until we're
doing HTTP/1.1 requests.

Also, I don't think "removing the previous HEAD request code" is
entirely accurate: we probably would want to detect when a server is
feeding us non-new content in response to If-Modified-Since, and adjust
to use the current HEAD method instead as a fallback.
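
For illustration, a rough sketch of what that fallback check could look
like (the names and types here are hypothetical, not actual Wget code):

#include <stdbool.h>
#include <time.h>

/* Hypothetical sketch of detecting whether the server honored
   If-Modified-Since on a conditional GET. */
struct cond_get_result
{
  int status_code;      /* HTTP status of the conditional GET */
  time_t remote_mtime;  /* parsed Last-Modified, or 0 if absent */
};

static bool
server_honored_if_modified_since (const struct cond_get_result *res,
                                  time_t local_mtime)
{
  /* A 304 means the header was honored.  A 200 whose Last-Modified is
     no newer than our local copy suggests the header was ignored and
     unchanged content was resent -- a cue to fall back to HEAD. */
  if (res->status_code == 304)
    return true;
  if (res->status_code == 200 && res->remote_mtime != 0
      && res->remote_mtime <= local_mtime)
    return false;
  return true;
}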

-Micah

vinothkumar raman wrote:
> This means we should remove the previous HEAD request code, use
> If-Modified-Since by default, have it handle all the requests, and
> store pages when the response is not a 304.
> 
> Is that so?
> 
> 
> On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
>> Follow-up Comment #4, bug #20329 (project wget):
>>
>> verbatim-mode's not all that readable.
>>
>> The gist is, we should go ahead and use If-Modified-Since, perhaps even now
>> before there's true HTTP/1.1 support (provided it works in a reasonable
>> percentage of cases); and just ensure that any Last-Modified header is sane.


Re: [BUG:#20329] If-Modified-Since support

2008-09-02 Thread Micah Cowan

vinothkumar raman wrote:
> We need to send the local file's timestamp in the request header; for
> that, we need to pass the timestamp from http_loop() to gethttp(). The
> only way to pass it on without altering the function's signature is to
> add a field to struct url in url.h.
> 
> Could we go for it?

That is acceptable.
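
For reference, a rough sketch of the idea (the new field name here is
hypothetical; the real struct url in url.h has many more members,
elided below):

#include <time.h>

/* Sketch: a hypothetical extra field on struct url in url.h. */
struct url
{
  char *url;                /* entire URL text (existing member) */
  /* ... other existing members ... */
  time_t local_file_mtime;  /* NEW: local file's timestamp, set in
                               http_loop() and read in gethttp() when
                               building the If-Modified-Since header. */
};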

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-01 Thread vinothkumar raman
This means we should remove the previous HEAD request code, use
If-Modified-Since by default, have it handle all the requests, and
store pages when the response is not a 304.

Is that so?


On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
>
> Follow-up Comment #4, bug #20329 (project wget):
>
> verbatim-mode's not all that readable.
>
> The gist is, we should go ahead and use If-Modified-Since, perhaps even now
> before there's true HTTP/1.1 support (provided it works in a reasonable
> percentage of cases); and just ensure that any Last-Modified header is sane.
>
>___
>
> Reply to this item at:
>
>  
>
> ___
>  Message sent via/by Savannah
>  http://savannah.gnu.org/
>
>


Re: bug in wget

2008-06-14 Thread Micah Cowan

Sir Vision wrote:
> Hello,
> 
> entering the following command results in an error:
> 
> --- command start ---
> c:\Downloads\wget_v1.11.3b>wget
> "ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8-l10n/";
> -P c:\Downloads\
> --- command end ---
> 
> wget can't convert the ".listing" file into an "html" file

As this seems to work fine on Unix for me, I'll have to leave it to the
Windows porting guy (hi Chris!) to find out what might be going wrong.

...however, it would really help if you would supply the full output you
got from wget that leads you to believe Wget couldn't do this
conversion. In fact, it wouldn't hurt to supply the -d flag as well, for
maximum debugging messages.

--
Cheers,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/


Re: Bug

2008-03-03 Thread Mark Pors
OK, thanks for your reply.
We have a work-around in place now, but it doesn't scale very well.
Anyway, I'll start looking for another solution.

Thanks!
Mark


On Sat, Mar 1, 2008 at 10:15 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
>
>  Mark Pors wrote:
>  > Hi,
>  >
>  > I posted this bug over two years ago:
>  > http://marc.info/?l=wget&m=113252747105716&w=4
>  > From the release notes I see that this is still not resolved. Are
>  > there any plans to fix this any time soon?
>
>  I'm not sure that's a bug. It's more of an architectural choice.
>
>  Wget currently works by downloading a file, then, if it needs to look
>  for links in that file, it will open it and scan through it. Obviously,
>  it can't do that when you use -O -.
>
>  There are plans to move Wget to a more stream-like process, where it
>  scans links during download. At such time, it's very possible that -p
>  will work the way you want it to. In the meantime, though, it doesn't.
>
>  --
>  Micah J. Cowan
>  Programmer, musician, typesetting enthusiast, gamer...
>  http://micah.cowan.name/
>


Re: Bug

2008-03-01 Thread Micah Cowan

Mark Pors wrote:
> Hi,
> 
> I posted this bug over two years ago:
> http://marc.info/?l=wget&m=113252747105716&w=4
> From the release notes I see that this is still not resolved. Are
> there any plans to fix this any time soon?

I'm not sure that's a bug. It's more of an architectural choice.

Wget currently works by downloading a file, then, if it needs to look
for links in that file, it will open it and scan through it. Obviously,
it can't do that when you use -O -.

There are plans to move Wget to a more stream-like process, where it
scans links during download. At such time, it's very possible that -p
will work the way you want it to. In the meantime, though, it doesn't.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug on wget

2007-11-21 Thread Micah Cowan

Hrvoje Niksic wrote:
> Generally, if Wget considers a header to be in error (and hence
> ignores it), the user probably needs to know about that.  After all,
> it could be the symptom of a Wget bug, or of an unimplemented
> extension the server generates.  In both cases I as a user would want
> to know.  Of course, Wget should continue to be lenient towards syntax
> violations widely recognized by popular browsers.
> 
> Note that I'm not arguing that Wget should warn in this particular
> case.  It is perfectly fine to not consider an empty `Set-Cookie' to
> be a syntax error and to simply ignore it (and maybe only print a
> warning in debug mode).

That was my thought. I agree with both of your points above: if Wget's
not handling something properly, I want to know about it; but at the
same time, silently ignoring (erroneous) empty headers doesn't seem like
a problem.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug on wget

2007-11-21 Thread Hrvoje Niksic
Micah Cowan <[EMAIL PROTECTED]> writes:

>> The new Wget flags empty Set-Cookie as a syntax error (but only
>> displays it in -d mode; possibly a bug).
>
> I'm not clear on exactly what's possibly a bug: do you mean the fact
> that Wget only calls attention to it in -d mode?

That's what I meant.

> I probably agree with that behavior... most people probably aren't
> interested in being informed that a server breaks RFC 2616 mildly;

Generally, if Wget considers a header to be in error (and hence
ignores it), the user probably needs to know about that.  After all,
it could be the symptom of a Wget bug, or of an unimplemented
extension the server generates.  In both cases I as a user would want
to know.  Of course, Wget should continue to be lenient towards syntax
violations widely recognized by popular browsers.

Note that I'm not arguing that Wget should warn in this particular
case.  It is perfectly fine to not consider an empty `Set-Cookie' to
be a syntax error and to simply ignore it (and maybe only print a
warning in debug mode).
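
A rough sketch of that policy (simplified; parse_set_cookie stands in
for the real parser in cookies.c, and DEBUGP for Wget's debug-print
macro):

/* Sketch: treat an empty Set-Cookie value as ignorable rather than a
   hard syntax error, noting it only in debug output. */

struct cookie;                                   /* opaque here */
struct cookie *parse_set_cookie (const char *);  /* the real parser */

static struct cookie *
parse_set_cookie_lenient (const char *set_cookie)
{
  if (set_cookie == NULL || *set_cookie == '\0')
    {
      DEBUGP (("Ignoring empty Set-Cookie header.\n"));
      return NULL;            /* not an error; nothing to store */
    }
  return parse_set_cookie (set_cookie);
}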


Re: bug on wget

2007-11-20 Thread Micah Cowan

Hrvoje Niksic wrote:
> Micah Cowan <[EMAIL PROTECTED]> writes:
> 
>> I was able to reproduce the problem above in the release version of
>> Wget; however, it appears to be working fine in the current
>> development version of Wget, which is expected to release soon as
>> version 1.11.*
> 
> I think the old Wget crashed on empty Set-Cookie headers.  That got
> fixed when I converted the Set-Cookie parser to use extract_param.
> The new Wget flags empty Set-Cookie as a syntax error (but only
> displays it in -d mode; possibly a bug).

I'm not clear on exactly what's possibly a bug: do you mean the fact
that Wget only calls attention to it in -d mode?

I probably agree with that behavior... most people probably aren't
interested in being informed that a server breaks RFC 2616 mildly,
especially if it's not apt to affect the results. Unless of course the
user was expecting the server to send a real cookie; but I'm guessing
that this only happens when the server doesn't have one to send (or
something). But a user in that situation should be using -d (or at least
-S) to find out what the server is sending.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug on wget

2007-11-20 Thread Hrvoje Niksic
Micah Cowan <[EMAIL PROTECTED]> writes:

> I was able to reproduce the problem above in the release version of
> Wget; however, it appears to be working fine in the current
> development version of Wget, which is expected to release soon as
> version 1.11.*

I think the old Wget crashed on empty Set-Cookie headers.  That got
fixed when I converted the Set-Cookie parser to use extract_param.
The new Wget flags empty Set-Cookie as a syntax error (but only
displays it in -d mode; possibly a bug).


Re: bug on wget

2007-11-20 Thread Micah Cowan

Diego Campo wrote:
> Hi,
> I got a bug on wget when executing:
> 
> wget -a log -x -O search/search-1.html --verbose --wait 3
> --limit-rate=20K --tries=3
> http://www.nepremicnine.net/nepremicninske_agencije.html?id_regije=1
> 
> Segmentation fault (core dumped)

Hi Diego,

I was able to reproduce the problem above in the release version of
Wget; however, it appears to be working fine in the current development
version of Wget, which is expected to release soon as version 1.11.*

* Unfortunately, it has been "expected to release soon" for a few months
now; we got hung up with some legal/licensing issues that are yet to be
resolved. It will almost certainly be released in the next few weeks,
though.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

Hrvoje Niksic wrote:

> In the long run, supporting something like IRL is surely the right
> thing to go for, but I have a feeling that we'll be stuck with the
> current messy URLs for quite some time to come.  So Wget simply needs
> to adapt to the current circumstances.  If the locale includes "UTF-8"
> in any shape or form, it is perfectly safe to assume that it's valid
> to create UTF-8 file names.  Of course, we don't know if a particular
> URL path sequence is really meant to be UTF-8, but there should be no
> harm in allowing valid UTF-8 sequences to pass through.  In other
> words, the default "quote control" policy could simply be smarter
> about what "control" means.

That's true. I had been thinking I'd just deal with it all together, but
there's no reason why we couldn't adjust what "control characters" are
based on the locale today. Still, I think it's a low-priority enough
issue (given that there are workarounds) that I may save it to address
all in one lump.

BTW, there's a related discussion at
https://savannah.gnu.org/bugs/index.php?20863, though that one is
regarding translating between the current locale and Unicode (for
command-line arguments) and back again (for file names).

> One consequence would be that Wget creates differently-named files in
> different locales, but it's probably a reasonable price to pay for not
> breaking an important expectation.  Another consequence would be
> making users open to IDN homograph attacks, but I don't know if that's
> a problem in the context of creating file names (IDN is normally
> defined as a misrepresentation of who you communicate with).

Aren't we already open to this? That is, if someone directs us to
www.microsoft.com, where the "o" of "soft" is replaced by its look-alike
in Cyrillic, and our DNS server happens to respect IDNs represented
literally (instead of translated into the ASCII "punycode" format, as
they will be when we support IDNs properly), that "o" in UTF-8 would be
0xD0 0xBE, and so wouldn't get percent-encoded on the way in.

One way of dealing with this when we _do_ translate to punycode, would
be to keep the punycode version for creation of the "hostname"
directory. Though that could be ugly in practice, at least for
especially non-latin domain names.

The best way of dealing with homographs, though, is to only use IRIs
from trusted sources (usually: type them in).

> It could be made to recognize UTF-8 character
> sequences in UTF-8 locales and exempt valid UTF-8 chars from being
> treated as "control" characters.  Invalid UTF-8 chars would still pass
> all the checks, and non-canonical UTF-8 sequences would be "rejected"
> (by condemning their byte values to being escaped as %..).  This is
> not much work for someone who understands the basics of UTF-8.

Right. If the high-bit isn't set, it's ASCII; if it is set, then you can
tell by context which high-bits ought to be set in its neighbors.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug in escaped filename calculation?

2007-10-04 Thread Hrvoje Niksic
Micah Cowan <[EMAIL PROTECTED]> writes:

> It is actually illegal to specify byte values outside the range of
> ASCII characters in a URL, but it has long been historical practice
> to do so anyway. In most cases, the intended meaning was one of the
> latin character sets (usually latin1), so Wget was right to do as it
> does, at that time.

Your explanation is spot-on.  I would only add that Wget's
interpretation of what is a "control" character is not so much geared
toward Latin 1 as it is geared toward maximum safety.  Originally I
planned to simply encode *all* file name characters outside the 32-127
range, but in practice it was very annoying (not to mention
US-centric) to encode perfectly valid Latin 1/2/3/... as %xx.  Since
the codes 128-159 *are* control characters (in those charsets) that
can mess up your screen and that you wouldn't want seen by default, I
decided to encode them by default, but allow for a way to turn it off,
in case someone used a different charset.

In the long run, supporting something like IRL is surely the right
thing to go for, but I have a feeling that we'll be stuck with the
current messy URLs for quite some time to come.  So Wget simply needs
to adapt to the current circumstances.  If the locale includes "UTF-8"
in any shape or form, it is perfectly safe to assume that it's valid
to create UTF-8 file names.  Of course, we don't know if a particular
URL path sequence is really meant to be UTF-8, but there should be no
harm in allowing valid UTF-8 sequences to pass through.  In other
words, the default "quote control" policy could simply be smarter
about what "control" means.

One consequence would be that Wget creates differently-named files in
different locales, but it's probably a reasonable price to pay for not
breaking an important expectation.  Another consequence would be
making users open to IDN homograph attacks, but I don't know if that's
a problem in the context of creating file names (IDN is normally
defined as a misrepresentation of who you communicate with).

For those who want to hack on this, the place to look at is
url.c:append_uri_pathel; that strangely-named function takes a path
element (a directory name or file name component of the URL) and
appends it to the file name.  It takes care not to ever use ".." as a
path component and to respect the --restrict-file-names setting as
specified by the user.  It could be made to recognize UTF-8 character
sequences in UTF-8 locales and exempt valid UTF-8 chars from being
treated as "control" characters.  Invalid UTF-8 chars would still pass
all the checks, and non-canonical UTF-8 sequences would be "rejected"
(by condemning their byte values to being escaped as %..).  This is
not much work for someone who understands the basics of UTF-8.
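
For instance, a rough sketch of such a check (standalone, not the
actual append_uri_pathel code):

#include <stddef.h>

/* Sketch: return the length (2-4) of a valid, canonical UTF-8 sequence
   starting at p (with `avail` bytes available), or 0 if there isn't
   one.  Bytes failing this test would keep being escaped as %XX. */
static size_t
utf8_sequence_length (const unsigned char *p, size_t avail)
{
  size_t len, i;
  unsigned int cp;

  if (p[0] < 0xc2)        return 0;  /* ASCII, continuation, or overlong lead */
  else if (p[0] < 0xe0)   { len = 2; cp = p[0] & 0x1f; }
  else if (p[0] < 0xf0)   { len = 3; cp = p[0] & 0x0f; }
  else if (p[0] < 0xf5)   { len = 4; cp = p[0] & 0x07; }
  else                    return 0;  /* invalid lead byte */

  if (avail < len)
    return 0;
  for (i = 1; i < len; i++)
    {
      if ((p[i] & 0xc0) != 0x80)     /* must be a continuation byte */
        return 0;
      cp = (cp << 6) | (p[i] & 0x3f);
    }
  /* Reject overlong forms, surrogates, and out-of-range values. */
  if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
    return 0;
  if ((cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
    return 0;
  return len;
}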


Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

Brian Keck wrote:
> Hello,
> 
> I'm wondering if I've found a bug in the excellent wget.
> I'm not asking for help, because it turned out not to be the reason
> one of my scripts was failing.
> 
> The possible bug is in the derivation of the filename from a URL which
> contains UTF-8.
> 
> The case is:
> 
>   wget http://en.wikipedia.org/wiki/%C3%87atalh%C3%B6y%C3%BCk
> 
> Of course these are all ascii characters, but underlying it are
> 3 nonascii characters, whose UTF-8 encoding is:
> 
>   hex   octal    name
>   ----  -------  ---------
>   C387  303 207  C-cedilla
>   C3B6  303 266  o-umlaut
>   C3BC  303 274  u-umlaut
> 
> The file created has a name that's almost, but not quite, a valid UTF-8
> bytestring ... 
> 
>   ls *y*k | od -tc
>   000 303   %   8   7   a   t   a   l   h 303 266   y 303 274   k  \n
> 
> Ie the o-umlaut & u-umlaut UTF-8 encodings occur in the bytestring,
> but the UTF-8 encoding of C-cedilla has its 2nd byte replaced by the
> 3-byte string "%87".

Using --restrict-file-names=nocontrol will do what you want it to, in
this instance.

> I'm guessing this is not intended.  

Actually, it is (more-or-less).

Realize that Wget really has no idea how to tell whether you're trying
to give it UTF-8, or one of the ISO latin charsets. It tends to assume
the latter. It also, by default, will not create filenames with control
characters in them. In ISO latin, characters in the range 0x80-0x9f are
control characters, which is why Wget left %87 escaped, which falls into
that range, but not the others, which don't.
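
In other words, roughly (a simplified sketch, not Wget's exact code):

/* Simplified sketch of the default "quote control" test: escape ASCII
   controls and the 0x80-0x9f range, which are control characters in
   the ISO 8859 charsets.  0x87 -- the second byte of C-cedilla's UTF-8
   encoding -- lands in that range, hence the lone %87. */
static int
quoted_as_control (unsigned char c)
{
  return c < 0x20 || c == 0x7f || (c >= 0x80 && c <= 0x9f);
}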

It is actually illegal to specify byte values outside the range of ASCII
characters in a URL, but it has long been historical practice to do so
anyway. In most cases, the intended meaning was one of the latin
character sets (usually latin1), so Wget was right to do as it does, at
that time.

There is now a standard for representing Unicode values in URLs, whose
result is then called IRLs (Internationalized Resource Locators).
Conforming correctly to this standard would require that Wget be
sensitive to the context and encoding of documents in which it finds
URLs; in the case of filenames and command arguments, it would probably
also require sensitivity to the current locale as determined by
environment variables. Wget is simply not equipped to handle IRLs or
encoding issues at the moment, so until it is, a proper fix will not be
in place. Addressing these are considered a "Wget 2.0" (next-generation
Wget functionality) priority, and probably won't be done for a year or
two, given that the number of developers involved with Wget, if you add
up all the part-time helpers (including me), is probably still less than
one full-time dev. :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug in escaped filename calculation?

2007-10-04 Thread Micah Cowan

Josh Williams wrote:
> On 10/4/07, Brian Keck <[EMAIL PROTECTED]> wrote:
>> I would have sent a fix too, but after finding my way through http.c &
>> retr.c I got lost in url.c.
> 
> You and me both. A lot of the code needs to be rewritten... there's a
> lot of spaghetti code in there. I hope Micah chooses to do a complete
> re-write for version 2 so I can get my hands dirty and understand the
> code better.

Currently, I'm planning on refactoring what exists, as needed, rather
than going for a complete rewrite. This will be driven by unit-tests, to
try to ensure that we do not lose functionality along the way. This
involves more work overall, but IMO has these key advantages:

 * as mentioned, it's easier to prevent functionality loss,
 * we will be able to use the work as it's written, instead of waiting
many months for everything to be finished (especially with the current
number of developers), and
 * AIUI, the wording of employer copyright assignment releases may not
apply to new works that are not _preexisting_ as GPL works. This means
that, if a rewrite ended up using no code whatsoever from the original
work (not likely, but...), there could be legal issues.

After 1.11 is released (or possibly before), one of my top priorities is
to clean up the gethttp and http_loop functions to a degree where they
can be much more readily read and understood (and modified!). This is
important to me because so far (in my
probably-not-statistically-significant 3 months as maintainer) a
majority of the trickier fixes have been in those two functions. Some of
these fixes seem to frequently introduce bugs of their own, and I spend
more time than seems right in trying to understand the code there, which
is why these particular functions are prime targets for refactoring. :)

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug in escaped filename calculation?

2007-10-04 Thread Josh Williams
On 10/4/07, Brian Keck <[EMAIL PROTECTED]> wrote:
> I would have sent a fix too, but after finding my way through http.c &
> retr.c I got lost in url.c.

You and me both. A lot of the code needs to be rewritten... there's a
lot of spaghetti code in there. I hope Micah chooses to do a complete
re-write for version 2 so I can get my hands dirty and understand the
code better.


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-15 Thread Micah Cowan

Rich Cook wrote:
> 
> On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:
> 
>>
>>> sprintf(filecopy, "\"%.2047s\"", file);
>>
>> This fix breaks the FTP protocol, making wget instantly stop working
>> with many conforming servers, but apparently start working with yours;
>> the RFCs are very clear that the file name argument starts right after
>> the string "RETR "; the very next character is part of the file name,
>> including if the next character is a space (or a quote). The file name
>> is terminated by the CR LF sequence (which implies that the sequence CR
>> LF may not occcur in the filename). Therefore, if you ask for a file
>> "file.txt", a conforming server will attempt to find and deliver a file
>> whose name begins and ends with double-quotes.
>>
>> Therefore, this seems like a server problem.
> 
> I think you may well be correct.  I am now unable to reproduce the
> problem where the server does not recognize a filename unless I give it
> quotes.  In fact, as you say, the server ONLY recognizes filenames
> WITHOUT quotes and quoting breaks it.  I had to revert to the non-quoted
> code to get proper behavior.  I am very confused now.  I apologize
> profusely for wasting your time.  How embarrassing!

No worries, it happens! Sometimes the tests we run go differently than
we think they did. :)
> 
> I'll save this email, and if I see the behavior again, I will provide
> you with the details you requested below.

That would be terrific, thanks.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-15 Thread Josh Williams

On 7/15/07, Rich Cook <[EMAIL PROTECTED]> wrote:
> I think you may well be correct.  I am now unable to reproduce the
> problem where the server does not recognize a filename unless I give
> it quotes.  In fact, as you say, the server ONLY recognizes filenames
> WITHOUT quotes and quoting breaks it.  I had to revert to the non-
> quoted code to get proper behavior.  I am very confused now.  I
> apologize profusely for wasting your time.  How embarrassing!
>
> I'll save this email, and if I see the behavior again, I will provide
> you with the details you requested below.


I wouldn't say it was a waste of time. Actually, I think it's good for
us to know that this problem exists on some servers. We're considering
writing a patch to recognise servers that do not support spaces. If
the standard method fails, then it will retry with the characters escaped.

Nothing has been written for this yet, but it has been discussed, and
may be implemented in the future.


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-15 Thread Rich Cook


On Jul 13, 2007, at 12:29 PM, Micah Cowan wrote:

>> sprintf(filecopy, "\"%.2047s\"", file);
>
> This fix breaks the FTP protocol, making wget instantly stop working
> with many conforming servers, but apparently start working with yours;
> the RFCs are very clear that the file name argument starts right after
> the string "RETR "; the very next character is part of the file name,
> including if the next character is a space (or a quote). The file name
> is terminated by the CR LF sequence (which implies that the sequence CR
> LF may not occur in the filename). Therefore, if you ask for a file
> "file.txt", a conforming server will attempt to find and deliver a file
> whose name begins and ends with double-quotes.
>
> Therefore, this seems like a server problem.

I think you may well be correct.  I am now unable to reproduce the
problem where the server does not recognize a filename unless I give it
quotes.  In fact, as you say, the server ONLY recognizes filenames
WITHOUT quotes and quoting breaks it.  I had to revert to the non-quoted
code to get proper behavior.  I am very confused now.  I apologize
profusely for wasting your time.  How embarrassing!

I'll save this email, and if I see the behavior again, I will provide
you with the details you requested below.

> Could you please provide the following:
>   1. The version of wget you are running (wget --version)
>   2. The exact command line you are using to invoke wget
>   3. The output of that same command line, run with --debug

--
Rich "wealthychef" Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets
better all the time.





Re: bug and "patch": blank spaces in filenames causes looping

2007-07-13 Thread Micah Cowan

Rich Cook wrote:
> On OS X, if a filename on the FTP server contains spaces, and the remote
> copy of the file is newer than the local, then wget gets thrown into a
> loop of "No such file or directory" endlessly.   I have changed the
> following in ftp-simple.c, and this fixes the error.
> Sorry, I don't know how to use the proper patch formatting, but it
> should be clear.

I and another developer could not reproduce this problem, either in the
current trunk or in wget 1.10.2.

> sprintf(filecopy, "\"%.2047s\"", file);

This fix breaks the FTP protocol, making wget instantly stop working
with many conforming servers, but apparently start working with yours;
the RFCs are very clear that the file name argument starts right after
the string "RETR "; the very next character is part of the file name,
including if the next character is a space (or a quote). The file name
is terminated by the CR LF sequence (which implies that the sequence CR
LF may not occur in the filename). Therefore, if you ask for a file
"file.txt", a conforming server will attempt to find and deliver a file
whose name begins and ends with double-quotes.

Therefore, this seems like a server problem.

Could you please provide the following:
  1. The version of wget you are running (wget --version)
  2. The exact command line you are using to invoke wget
  3. The output of that same command line, run with --debug

Thank you very much.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: [bug #20323] Wget issues HEAD before GET, even when the file doesn't exist locally.

2007-07-12 Thread Micah Cowan

Mauro Tortonesi wrote:
> Micah Cowan wrote:
>> Update of bug #20323 (project wget):
>>
>>   Status:  Ready For Test => In Progress
>> ___
>>
>> Follow-up Comment #3:
>>
>> Moving back to In Progress until some questions about the logic are
>> answered:
>>
>> http://addictivecode.org/pipermail/wget-notify/2007-July/75.html
>> http://addictivecode.org/pipermail/wget-notify/2007-July/77.html
> 
> Thanks, Micah.
> 
> I have partly misunderstood the logic behind the preliminary HEAD request.
> In my code, HEAD is skipped if -O or --no-content-disposition are given,
> but if -N is given HEAD is always sent. This is wrong, as HEAD should be
> skipped even if -N and --no-content-disposition are given (no need to
> care about the deprecated -N -O combination). I can't think of any other
> case in which HEAD should be skipped, though.

Cc'ing wget ML, as it's probably important to open up discussion of the
current logic.

What about the case when nothing is given on the command line except
--no-content-disposition? What do we need HEAD for then?

Also: I don't believe HEAD should be sent if no options are given on the
command line. What purpose would that serve? If it's to find a possible
Content-Disposition header, we can get that (and more reliably) at GET
time (though I believe we may currently be requiring the file name
before we fetch, which, if true, should definitely be changed, but not
for 1.11; in that case the HEAD will be allowed for the time being); and
since we're not matching against potential accept/reject lists, we don't
really need it.

I think it really makes much more sense to enumerate those few cases
where we need to issue a HEAD, rather than try to determine all the
cases where we don't: if I have to choose a side to err on, I'd rather
not send HEAD in a case or two where we needed it, rather than send it
in a few where we didn't, as any request-response cycle eats up time. I
also believe that the cases where we want a HEAD are/should be fewer
than the cases where we don't want them.
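
To illustrate, a rough sketch of that shape of logic (option names
mirror this discussion; this is not the actual Wget code):

#include <stdbool.h>

/* Sketch of enumerating when HEAD *is* needed, rather than trying to
   list every case where it isn't. */
struct head_decision_opts
{
  bool timestamping;            /* -N */
  bool output_document;         /* -O */
  bool no_content_disposition;  /* --no-content-disposition */
  bool have_accept_reject;      /* -A / -R lists */
};

static bool
should_send_head (const struct head_decision_opts *o)
{
  /* Accept/reject matching may need the Content-Disposition file name,
     but only if we honor that header at all. */
  if (o->have_accept_reject && !o->no_content_disposition)
    return true;
  /* -N needs the remote timestamp before deciding whether to GET
     (at least until If-Modified-Since is used instead). */
  if (o->timestamping && !o->output_document)
    return true;
  return false;   /* default: skip the extra round trip */
}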

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: Bug update notifications

2007-07-09 Thread Matthew Woehlke

Micah Cowan wrote:
> Matthew Woehlke wrote:
>> ...any reason to not CC bug updates here also/instead? That's how e.g.
>> kwrite does things (also several other lists AFAIK), and seems to make
>> sense. This is 'bug-wget' after all :-).
>
> It is; but it's also 'wget'.

Hmm, so it is; my bad :-).

> While I agree that it probably makes sense
> to send it to a bugs discussion list, this list is a combination
> bugs/development/support/general discussion list, and I'm not certain
> it's appropriate to bump up the traffic level for this.
>
> Still, if there are enough folks that would like to get these updates
> (without also seeing commit notifications), perhaps we could craft a
> second list for this (or, alternatively, split off the "main
> discussion/support list" from the "bugs" list)?

I guess a common pattern is:
foo-help
foo-devel
foo-commits

...but of course you're the maintainer, it's your call :-).
(The above aren't necessarily "actual names" of course, just the
categories it seems like I'm most used to seeing. e.g. the GNU
convention is of course bug-foo, not foo-devel.)

--
Matthew
This .sig is false




Re: Bug update notifications

2007-07-09 Thread Micah Cowan

Matthew Woehlke wrote:
> Micah Cowan wrote:
>> The wget-notify mailing list
>> (http://addictivecode.org/mailman/listinfo/wget-notify) will now also be
>> receiving notifications of bug updates from GNU Savannah, in addition to
>>  subversion commits.
> 
> ...any reason to not CC bug updates here also/instead? That's how e.g.
> kwrite does things (also several other lists AFAIK), and seems to make
> sense. This is 'bug-wget' after all :-).

It is; but it's also 'wget'. While I agree that it probably makes sense
to send it to a bugs discussion list, this list is a combination
bugs/development/support/general discussion list, and I'm not certain
it's appropriate to bump up the traffic level for this.

Still, if there are enough folks that would like to get these updates
(without also seeing commit notifications), perhaps we could craft a
second list for this (or, alternatively, split off the "main
discussion/support list" from the "bugs" list)?

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: Bug update notifications

2007-07-09 Thread Matthew Woehlke

Micah Cowan wrote:
> The wget-notify mailing list
> (http://addictivecode.org/mailman/listinfo/wget-notify) will now also be
> receiving notifications of bug updates from GNU Savannah, in addition to
> subversion commits.

...any reason to not CC bug updates here also/instead? That's how e.g.
kwrite does things (also several other lists AFAIK), and seems to make
sense. This is 'bug-wget' after all :-).


--
Matthew
This .sig is false



Re: bug and "patch": blank spaces in filenames causes looping

2007-07-06 Thread Micah Cowan

Steven M. Schweda wrote:
> From :
> 
>> [...]
>>char filecopy[2048];
>>if (file[0] != '"') {
>>  sprintf(filecopy, "\"%.2047s\"", file);
>>} else {
>>  strncpy(filecopy, file, 2047);
>>}
>> [...]
>> It should be:
>>
>>  sprintf(filecopy, "\"%.2045s\"", file);
>> [...]
> 
>I'll admit to being old and grumpy, but am I the only one who
> shudders when one small code segment contains "2048", "2047", and "2045"
> as separate, independent literal constants, instead of using a macro, or
> "sizeof", or something which would let the next fellow change one buffer
> size in one place, instead of hunting all over the code looking for
> every "20xx" which might be related?

Well, as already mentioned, aprintf() would be much more appropriate, as
it eliminates the need for constants like these.

And yes, "magic numbers" drive me crazy, too. Of course, when used with
printf's 's' specifier, it needs special handling (crafting a STR()
macro or somesuch).
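
For instance, a rough sketch of the RETR quoting from this thread done
with aprintf (sketch only; aprintf, xstrdup, and xfree are the Wget
utility functions discussed in this thread):

/* Sketch: quote the FTP file name on the heap with Wget's aprintf, so
   there is no fixed buffer and no 2048/2047/2045 constants to keep in
   sync.  The caller frees the result with xfree. */
static char *
quoted_filename (const char *file)
{
  if (file[0] != '"')
    return aprintf ("\"%s\"", file);   /* heap-allocated, any length */
  return xstrdup (file);
}

/* Usage in ftp_retr would then be roughly:
     char *filecopy = quoted_filename (file);
     request = ftp_request ("RETR", filecopy);
     xfree (filecopy);  */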

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-06 Thread Steven M. Schweda
From :

> [...]
>char filecopy[2048];
>if (file[0] != '"') {
>  sprintf(filecopy, "\"%.2047s\"", file);
>} else {
>  strncpy(filecopy, file, 2047);
>}
> [...]
> It should be:
> 
>  sprintf(filecopy, "\"%.2045s\"", file);
> [...]

   I'll admit to being old and grumpy, but am I the only one who
shudders when one small code segment contains "2048", "2047", and "2045"
as separate, independent literal constants, instead of using a macro, or
"sizeof", or something which would let the next fellow change one buffer
size in one place, instead of hunting all over the code looking for
every "20xx" which might be related?

   Just a thought.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook

Thanks for the follow up.  :-)

On Jul 5, 2007, at 3:52 PM, Micah Cowan wrote:

> Rich Cook wrote:
>> So forgive me for a newbie-never-even-lurked kind of question:  will
>> this fix make it into wget for other users (and for me in the future)?
>> Or do I need to do more to make that happen, or...?  Thanks!
>
> Well, I need a chance to look over the patch, run some tests, etc, to
> see if it really covers everything it should (what about other,
> non-space characters?).
>
> The fix (or one like it) will probably make it into Wget at some point,
> but I wouldn't expect it to come out in the next release (which, itself,
> will not be arriving for a couple months); it will probably go into wget
> 1.12.
>
> --
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer...
> http://micah.cowan.name/

--
✐"There's no time to stop for gas, we're already late"-- Karin Donker
--
Rich "wealthychef" Cook
925-784-3077
--
✐



Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Micah Cowan

Bruso, John wrote:
> Please remove me from this list. thanks,

Nobody on this list has the ability to do this, unfortunately (Wget
maintainership is separate from the maintainers of this list). To
further confuse the issue, [EMAIL PROTECTED] is actually just an alias to
wget@sunsite.dk, which is the one you're actually subscribed to.

To unsubscribe, send an email to [EMAIL PROTECTED]; it will
send you a confirmation email that you'll need to reply to before you'll
actually be unsubscribed.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Micah Cowan

Rich Cook wrote:
> So forgive me for a newbie-never-even-lurked kind of question:  will
> this fix make it into wget for other users (and for me in the future)? 
> Or do I need to do more to make that happen, or...?  Thanks!

Well, I need a chance to look over the patch, run some tests, etc, to
see if it really covers everything it should (what about other,
non-space characters?).

The fix (or one like it) will probably make it into Wget at some point,
but I wouldn't expect it to come out in the next release (which, itself,
will not be arriving for a couple months); it will probably go into wget
1.12.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
So forgive me for a newbie-never-even-lurked kind of question:  will
this fix make it into wget for other users (and for me in the future)?
Or do I need to do more to make that happen, or...?  Thanks!

On Jul 5, 2007, at 12:52 PM, Hrvoje Niksic wrote:

> Rich Cook <[EMAIL PROTECTED]> writes:
>
>> On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:
>>
>>> Rich Cook <[EMAIL PROTECTED]> writes:
>>>
>>>> Trouble is, it's undocumented as to how to free the resulting
>>>> string.  Do I call free on it?
>>>
>>> Yes.  "Freshly allocated with malloc" in the function documentation
>>> was supposed to indicate how to free the string.
>>
>> Oh, I looked in the source and there was this xmalloc thing that
>> didn't show up in my man pages, so I punted.  Sorry.
>
> No problem.  Note that xmalloc isn't entirely specific to Wget, it's a
> fairly standard GNU name for a malloc-or-die function.
>
> Now I remembered that Wget also has xfree, so the above advice is not
> entirely correct -- you should call xfree instead.  However, in the
> normal case xfree is a simple wrapper around free, so even if you used
> free, it would have worked just as well.  (The point of xfree is that
> if you compile with DEBUG_MALLOC, you get a version that checks for
> leaks, although it should be removed now that there is valgrind, which
> does the same job much better.  There is also the business of barfing
> on NULL pointers, which should also be removed.)
>
> I'd have implemented a portable asprintf, but I liked the aprintf
> interface better (I first saw it in libcurl).

--
✐"There's no time to stop for gas, we're already late"-- Karin Donker
--
Rich "wealthychef" Cook
925-784-3077
--
✐



Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Rich Cook <[EMAIL PROTECTED]> writes:

> On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:
>
>> Rich Cook <[EMAIL PROTECTED]> writes:
>>
>>> Trouble is, it's undocumented as to how to free the resulting
>>> string.  Do I call free on it?
>>
>> Yes.  "Freshly allocated with malloc" in the function documentation
>> was supposed to indicate how to free the string.
>
> Oh, I looked in the source and there was this xmalloc thing that
> didn't show up in my man pages, so I punted.  Sorry.

No problem.  Note that xmalloc isn't entirely specific to Wget, it's a
fairly standard GNU name for a malloc-or-die function.

Now I remembered that Wget also has xfree, so the above advice is not
entirely correct -- you should call xfree instead.  However, in the
normal case xfree is a simple wrapper around free, so even if you used
free, it would have worked just as well.  (The point of xfree is that
if you compile with DEBUG_MALLOC, you get a version that checks for
leaks, although it should be removed now that there is valgrind, which
does the same job much better.  There is also the business of barfing
on NULL pointers, which should also be removed.)

I'd have implemented a portable asprintf, but I liked the aprintf
interface better (I first saw it in libcurl).


RE: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Bruso, John
Please remove me from this list. thanks,
 
John Bruso



From: Rich Cook [mailto:[EMAIL PROTECTED]
Sent: Thu 7/5/2007 12:30 PM
To: Hrvoje Niksic
Cc: Tony Lewis; [EMAIL PROTECTED]
Subject: Re: bug and "patch": blank spaces in filenames causes looping




On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:

> Rich Cook <[EMAIL PROTECTED]> writes:
>
>> Trouble is, it's undocumented as to how to free the resulting
>> string.  Do I call free on it?
>
> Yes.  "Freshly allocated with malloc" in the function documentation
> was supposed to indicate how to free the string.

Oh, I looked in the source and there was this xmalloc thing that 
didn't show up in my man pages, so I punted.  Sorry.

--
?"There's no time to stop for gas, we're already late"-- Karin Donker
--
Rich "wealthychef" Cook
<http://5pmharmony.com <http://5pmharmony.com/> >
925-784-3077
--
?





Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook


On Jul 5, 2007, at 11:08 AM, Hrvoje Niksic wrote:

> Rich Cook <[EMAIL PROTECTED]> writes:
>
>> Trouble is, it's undocumented as to how to free the resulting
>> string.  Do I call free on it?
>
> Yes.  "Freshly allocated with malloc" in the function documentation
> was supposed to indicate how to free the string.

Oh, I looked in the source and there was this xmalloc thing that
didn't show up in my man pages, so I punted.  Sorry.


--
✐"There's no time to stop for gas, we're already late"-- Karin Donker
--
Rich "wealthychef" Cook

925-784-3077
--
✐



Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
Rich Cook <[EMAIL PROTECTED]> writes:

> Trouble is, it's undocumented as to how to free the resulting
> string.  Do I call free on it?

Yes.  "Freshly allocated with malloc" in the function documentation
was supposed to indicate how to free the string.


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
"Virden, Larry W." <[EMAIL PROTECTED]> writes:

> "Tony Lewis" <[EMAIL PROTECTED]> writes:
>
>> Wget has an `aprintf' utility function that allocates the result on
>> the heap.  Avoids both buffer overruns and
>> arbitrary limits on file name length.
>
> If it uses the heap, then doesn't that open a hole where a particularly
> long file name would overflow the heap?

No, aprintf tries to allocate as much memory as necessary.  If the
memory is unavailable, malloc returns NULL and Wget exits.


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
Trouble is, it's undocumented as to how to free the resulting
string.  Do I call free on it?  I'd use asprintf, but I'm afraid to
suggest that here as it may not be portable.

On Jul 5, 2007, at 10:45 AM, Hrvoje Niksic wrote:

> "Tony Lewis" <[EMAIL PROTECTED]> writes:
>
>> There is a buffer overflow in the following line of the proposed code:
>>
>>  sprintf(filecopy, "\"%.2047s\"", file);
>
> Wget has an `aprintf' utility function that allocates the result on
> the heap.  Avoids both buffer overruns and arbitrary limits on file
> name length.

--
Rich "wealthychef" Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets
better all the time.





RE: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Virden, Larry W.
 


-Original Message-
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] 

"Tony Lewis" <[EMAIL PROTECTED]> writes:

> Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and 
> arbitrary limits on file name length.

If it uses the heap, then doesn't that open a hole where a particularly
long file name would overflow the heap?

--
http://wiki.tcl.tk/
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
mailto:[EMAIL PROTECTED]  http://www.purl.org/NET/lvirden/
 


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Hrvoje Niksic
"Tony Lewis" <[EMAIL PROTECTED]> writes:

> There is a buffer overflow in the following line of the proposed code:
>
>  sprintf(filecopy, "\"%.2047s\"", file);

Wget has an `aprintf' utility function that allocates the result on
the heap.  Avoids both buffer overruns and arbitrary limits on file
name length.


Re: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Rich Cook
Good point, although it's only a POTENTIAL buffer overflow, and it's
limited to 2 bytes, so at least it's not exploitable.  :-)



On Jul 5, 2007, at 9:05 AM, Tony Lewis wrote:


There is a buffer overflow in the following line of the proposed code:

 sprintf(filecopy, "\"%.2047s\"", file);

It should be:

 sprintf(filecopy, "\"%.2045s\"", file);

in order to leave room for the two quotes.

Tony
-Original Message-
From: Rich Cook [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 04, 2007 10:18 AM
To: [EMAIL PROTECTED]
Subject: bug and "patch": blank spaces in filenames causes looping

On OS X, if a filename on the FTP server contains spaces, and the
remote copy of the file is newer than the local, then wget gets
thrown into a loop of "No such file or directory" endlessly.   I have
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it
should be clear.

==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;

   /* Send RETR request.  */
   request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;
   char filecopy[2048];
   if (file[0] != '"') {
 sprintf(filecopy, "\"%.2047s\"", file);
   } else {
 strncpy(filecopy, file, 2047);
   }

   /* Send RETR request.  */
   request = ftp_request ("RETR", filecopy);






--
Rich "wealthychef" Cook
925-784-3077
--
  it takes many small steps to climb a mountain, but the view gets
better all the time.


--
Rich "wealthychef" Cook
925-784-3077
--
 it takes many small steps to climb a mountain, but the view gets  
better all the time.





RE: bug and "patch": blank spaces in filenames causes looping

2007-07-05 Thread Tony Lewis
There is a buffer overflow in the following line of the proposed code:

 sprintf(filecopy, "\"%.2047s\"", file);

It should be:

 sprintf(filecopy, "\"%.2045s\"", file);

in order to leave room for the two quotes.

Tony
-Original Message-
From: Rich Cook [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 04, 2007 10:18 AM
To: [EMAIL PROTECTED]
Subject: bug and "patch": blank spaces in filenames causes looping

On OS X, if a filename on the FTP server contains spaces, and the  
remote copy of the file is newer than the local, then wget gets  
thrown into a loop of "No such file or directory" endlessly.   I have  
changed the following in ftp-simple.c, and this fixes the error.
Sorry, I don't know how to use the proper patch formatting, but it  
should be clear.

==
the beginning of ftp_retr:
=
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;

   /* Send RETR request.  */
   request = ftp_request ("RETR", file);

==
becomes:
==
/* Sends RETR command to the FTP server.  */
uerr_t
ftp_retr (int csock, const char *file)
{
   char *request, *respline;
   int nwritten;
   uerr_t err;
   char filecopy[2048];
   if (file[0] != '"') {
 sprintf(filecopy, "\"%.2047s\"", file);
   } else {
 strncpy(filecopy, file, 2047);
   }

   /* Send RETR request.  */
   request = ftp_request ("RETR", filecopy);






--
Rich "wealthychef" Cook
925-784-3077
--
  it takes many small steps to climb a mountain, but the view gets  
better all the time.



Re: bug storing cookies with wget

2007-06-03 Thread Matthias Vill
Matthias Vill wrote:
> Mario Ander wrote:
>> Hi everybody,
>>
>> I think there is a bug storing cookies with wget.
>>
>> See this command line:
>>
>> "C:\Programme\wget\wget" --user-agent="Opera/8.5 (X11;
>> U; en)" --no-check-certificate --keep-session-cookies
>> --save-cookies="cookie.txt" --output-document=-
>> --debug --output-file="debug.txt"
>> --post-data="name=xxx&password=&dummy=Internetkennwort&login.x=0&login.y=0"
>> "https://www.vodafone.de/proxy42/portal/login.po";
> [..]
>> Set-Cookie:
>> JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
>> path=/jsp 
>> Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
>> expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
>> Set-Cookie:
>> JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
>> path=/proxy42
> [..]
>> ---response end---
>> 200 OK
>> Attempt to fake the path: /jsp,
>> /proxy42/portal/login.po
> 
> So the problem seems to be that wget rejects cookies for paths which
> don't "fit" the request url. The script you call is in
> /proxy42/portal/, which is a subdir of /proxy42 and /, so wget accepts
> those cookies; but it is not related to /jsp.
> 
> So it seems that wget is sticking to the strict RFC and the script is
> doing it wrong.
> To get this working you would need to patch wget to accept
> non-RFC-compliant cookies, maybe along with an
> "--accept-malformed-cookies" directive.
> 
> Hope this helps you
> 
> Matthias
> 

So I thought of a second solution: if you have cygwin (or at least
bash+grep) you can run this small script to duplicate and truncate the
cookie.
--- CUT here ---
#!/bin/bash
#Author: Matthias Vill; feel free to change and use

#get the line for proxy42-path in $temp
temp=$(grep proxy42 cookies.txt)

#remove everything after last !
temp=${temp%!*}

#replace proxy42 by jsp
temp=${temp/proxy42/jsp}

#append newline to file
#echo >>cookies.txt

#add new cookie to cookies.txt
echo $temp>>cookies.txt
--- CUT here ---
Maybe you need to remove the "#" in front of "echo >>cookies.txt" to
compensate for a missing trailing newline; otherwise you may end up
changing the value of the previous cookie.

Maybe this helps even more

Matthias


Re: bug storing cookies with wget

2007-06-03 Thread Matthias Vill
Mario Ander wrote:
> Hi everybody,
> 
> I think there is a bug storing cookies with wget.
> 
> See this command line:
> 
> "C:\Programme\wget\wget" --user-agent="Opera/8.5 (X11;
> U; en)" --no-check-certificate --keep-session-cookies
> --save-cookies="cookie.txt" --output-document=-
> --debug --output-file="debug.txt"
> --post-data="name=xxx&password=&dummy=Internetkennwort&login.x=0&login.y=0"
> "https://www.vodafone.de/proxy42/portal/login.po";
[..]
> Set-Cookie:
> JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE;
> path=/jsp 
> Set-Cookie: VODAFONELOGIN=1; domain=.vodafone.de;
> expires=Friday, 01-Jun-2007 15:05:16 GMT; path=/ 
> Set-Cookie:
> JSESSIONID=GgvG9LMyqLdpQKy11QLHdvN2QGgQrCGC9LsXSh42gkQrdbDnTGVQ!-249032648!NONE!1180705316338;
> path=/proxy42
[..]
> ---response end---
> 200 OK
> Attempt to fake the path: /jsp,
> /proxy42/portal/login.po

So the problem seems to be that wget rejects cookies for paths which
don't "fit" the request url. The script you call is in
/proxy42/portal/, which is a subdir of /proxy42 and of /, so wget accepts
cookies for those paths, but it is not related to /jsp.

So it seems to be wget sticking to the strict RFC and the script doing
wrong.
To get this working you would need to patch wget to accept non-RFC-compliant
cookies, maybe along with an "--accept-malformed-cookies" directive.

Hope this helps you

Matthias


Re: Bug using recursive get and stdout

2007-04-17 Thread Steven M. Schweda
   A quick search at "http://www.mail-archive.com/wget@sunsite.dk/" for
"-O" found:

  http://www.mail-archive.com/wget@sunsite.dk/msg08746.html
  http://www.mail-archive.com/wget@sunsite.dk/msg08748.html

   The way "-O" is implemented, there are all kinds of things which are
incompatible with it, "-r" among them.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Bug in 1.10.2 vs 1.9.1

2007-01-03 Thread Mauro Tortonesi

Juhana Sadeharju wrote:

> Hello. Wget 1.10.2 has the following bug compared to version 1.9.1.
> First, the bin/wgetdir is defined as
>   wget -p -E -k --proxy=off -e robots=off --passive-ftp
>   -o zlogwget`date +%Y%m%d%H%M%S` -r -l 0 -np -U Mozilla --tries=50
>   --waitretry=10 $@
>
> The download command is
>   wgetdir http://udn.epicgames.com
>
> Version 1.9.1 result: download ok
> Version 1.10.2 result: only udn.epicgames.com/Main/WebHome downloaded
> and other converted urls are of the form
>   http://udn.epicgames.com/../Two/WebHome


hi juhana,

could you please try the current version of wget from our subversion 
repository:


http://www.gnu.org/software/wget/wgetdev.html#development

?

this bug should be fixed in the new code.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: BUG - .listing has sprung into existence

2006-10-30 Thread Steven M. Schweda
From: Sebastian

   "Doctor, it hurts when I do this."

   "Don't do that."



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: Bug

2006-09-15 Thread Mauro Tortonesi

Reece wrote:

> Found a bug (sort of).
>
> When trying to get all the images in the directory below:
> http://www.netstate.com/states/maps/images/
>
> It gives 403 Forbidden errors for most of the images even after
> setting the agent string to firefox's, and setting -e robots=off
>
> After a packet capture, it appears that the site will give the
> forbidden error if the Referrer is not exactly correct.  However,
> since wget actually uses the domain www.netstate.com:80 instead of
> the version without the port, it screws it all up.  I've been unable
> to find any way to tell wget not to insert the port in the requesting
> url and referrer url.
>
> Here is the full command I was using:
>
> wget -r -l 1 -H -U "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT
> 5.0)" -e robots=off -d -nh http://www.netstate.com/states/maps/images/


hi reece,

that's an interesting bug. i've just added it to my "THINGS TO FIX" list.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: bug/feature request

2006-07-26 Thread Marc Schoechlin
Hi !

Maybe you can add this patch to your mainline-tree:

http://www.mail-archive.com/wget%40sunsite.dk/msg09142.html

Best regards

Marc Schoechlin

On Wed, Jul 26, 2006 at 07:26:45AM +0200, Marc Schoechlin wrote:
> Date: Wed, 26 Jul 2006 07:26:45 +0200
> From: Marc Schoechlin <[EMAIL PROTECTED]>
> Subject: bug/feature request
> To: [EMAIL PROTECTED]
> 
> Hi,
> 
> I'm not sure whether this is a feature request or a bug.
> Wget does not collect all page requisites of a given URL.
> Many sites are referencing components of these sites in cascading style 
> sheets,
> but wget does not collect these components as page requisites.
> 
> An example:
> ---
> $ wget -q -p -k -nc -x --convert-links \
>   http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496901
> $ find . -name "*.css"
> ./aspn.activestate.com/ASPN/static/aspn.css
> $ grep  "url(" ./aspn.activestate.com/ASPN/static/aspn.css
> list-style-image: url(/ASPN/img/dot_A68C53_8x8_.gif);
>background-image: url(/ASPN/img/ads/ASPN_banner_bg.gif);
>background-image: url('/ASPN/img/ads/ASPN_komodo_head.gif');
> background-image: url('/ASPN/img/ads/ASPN_banner_bottom.gif');
> $ find . -name "ASPN_banner_bg.gif" || echo "not found"
> ---
> 
> A solution for this problem would be to parse all collected *.css files
> for lines which match "url(.*)" and to collect these files.
> 
> Best regards
> 
> Marc Schoechlin
> -- 
> I prefer non-proprietary document-exchange.
> http://sector7g.wurzel6.de/pdfcreator/
> http://www.prooo-box.org/
> Contact me via jabber: [EMAIL PROTECTED]

-- 
I prefer non-proprietary document-exchange.
http://sector7g.wurzel6.de/pdfcreator/
http://www.prooo-box.org/
Contact me via jabber: [EMAIL PROTECTED]
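
Marc's post-processing idea is easy to prototype.  The following
stand-alone C sketch (hypothetical, not part of wget) prints every
url(...) reference found in a stylesheet, one per line, so the output
can be fed back into another wget run; it assumes references do not
span lines and ignores CSS quoting and escaping subtleties.

#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
  char line[4096];
  FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
  if (!fp) { perror (argv[1]); return 1; }
  while (fgets (line, sizeof line, fp))
    {
      char *p = line;
      /* Print the text between "url(" and the next ")".  */
      while ((p = strstr (p, "url(")) != NULL)
        {
          char *end = strchr (p + 4, ')');
          if (!end) break;
          printf ("%.*s\n", (int) (end - (p + 4)), p + 4);
          p = end + 1;
        }
    }
  return 0;
}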


Re: Bug in wget 1.10.2 makefile

2006-07-17 Thread Mauro Tortonesi

Daniel Richard G. wrote:

> Hello,
>
> The MAKEDEFS value in the top-level Makefile.in also needs to include
> DESTDIR='$(DESTDIR)'.

fixed, thanks.

--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: BUG

2006-07-10 Thread Mauro Tortonesi

Tony Lewis wrote:

> Run the command with -d and post the output here.

in this case, -S can provide more useful information than -d. Be careful to
obfuscate passwords, though!!!


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


RE: BUG

2006-07-03 Thread Tony Lewis
Run the command with -d and post the output here.

Tony

-----Original Message-----
From: Junior + Suporte [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 03, 2006 2:00 PM
To: [EMAIL PROTECTED]
Subject: BUG


Dear,


I am using wget to send a login request to a site; when wget is saving the cookies, the following error message appears:


Error in Set-Cookie, field `Path'
Syntax error in Set-Cookie: tu=661541|802400391@TERRA.COM.BR; Expires=Thu, 14-Oct-2055 20:52:46 GMT; Path= at position 78.
Location: http://www.tramauniversitario.com.br/servlet/login.jsp?username=802400391%40terra.com.br&pass=123qwe&rd=http%3A%2F%2Fwww.tramauniversitario.com.br%2Ftuv2%2Fenquete%2Fcb%2Fsul%2Farte.jsp [following]


I trying to access URL http://www.tramauniversitario.com.br/tuv2/participe/login.jsp?rd=http://www.tramauniversitario.com.br/tuv2/enquete/cb/sul/arte.jsp&[EMAIL PROTECTED]&pass=123qwe&Submit.x=6&Submit.y=1

In Internet Explorer, this URL works correctly and the cookie is saved on the local machine, but in WGET, this cookie returns an error. 

Thanks,


Luiz Carlos Zancanella Junior





RE: Bug in GNU Wget 1.x (Win32)

2006-06-22 Thread Herold Heiko
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Þröstur
> Sent: Wednesday, June 21, 2006 4:35 PM

There have been some reports in the past but I don't think it has been acted
upon; one of the problems is that the list of names can be extended at will
(besides the standard comx, lptx, con, prn). Maybe it is possible to query
the OS for the currently active device names and rename the output files
if necessary?

>   I reproduced the bug with Win32 versions 1.5.dontremeber,
> 1.10.1 and 1.10.2. I did also test version 1.6 on Linux but it
> was not affected.

That is because the problem is caused by the DOS/Windows filesystem drivers
(or whatever those should be called): basically com1* and so on are the
equivalent of Unix device nodes, with the unfortunate difference that they
act in every directory.

> 
> Example URLs that reproduce the bug :
> wget g/nul
> wget http://www.gnu.org/nul
> wget http://www.gnu.org/nul.html
> wget -o loop.end "http://www.gnu.org/nul.html"
> 
>   I know that the bug is associated with words which are
> devices in the windows console, but i don't understand
> why, since I tried to set the output file to something else.

I think you meant to use -O, not -o.
Doesn't solve the real problem but at least a workaround.

Heiko 

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 / +39-041-5917073 ph
-- +39-041-5907472 / +39-041-5917472 fax
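
Following up on Heiko's question about querying the OS: a rough
Win32-only C sketch (hypothetical and untested inside wget; the caller
is assumed to have stripped any directory part and extension from the
name first).  QueryDosDeviceA succeeds only for names that are defined
devices, so no hard-coded com/lpt/con/prn list is needed.

#include <windows.h>
#include <stdio.h>

static int is_reserved_name (const char *name)
{
  char target[1024];
  /* Nonzero return means NAME is an active DOS device.  */
  return QueryDosDeviceA (name, target, sizeof target) != 0;
}

int main (void)
{
  printf ("NUL:   %d\n", is_reserved_name ("NUL"));    /* expect 1 */
  printf ("COM1:  %d\n", is_reserved_name ("COM1"));   /* 1 where COM1 exists */
  printf ("index: %d\n", is_reserved_name ("index"));  /* expect 0 */
  return 0;
}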


Re: BUG: wget with option -O creates empty files even if the remote file does not exist

2006-06-01 Thread Steven M. Schweda
From: Eduardo M KALINOWSKI

> wget http://www.somehost.com/nonexistant.html -O localfile.html
> 
> then file "localfile.html" will always be created, and will have length
> of zero even if the remote file does not exist.

   Because with "-O", Wget opens the output file before it does any
network activity, and after it's done, it closes the file and leaves it
there, regardless of its content (or lack of content).

   You could avoid "-O", and rename the file after the Wget command. 
You could keep the "-O", and check the status of the Wget command
(and/or check the output file size), and delete the file if it's no
good.  (And probably many other things, as well.)

   If you look through "http://www.mail-archive.com/wget@sunsite.dk/",
you can find many people who think that "-O" should do something else,
but (for now) it does what it does.



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street(+1) 651-699-9818
   Saint Paul  MN  55105-2547


Re: bug?

2006-05-16 Thread Hrvoje Niksic
"yy :)" <[EMAIL PROTECTED]> writes:

> I ran "wget -P /tmp/.test [1]http://192.168.1.10"; in SUSE system (SLES 9)
> and found that it saved the file in /tmp/_test.
> This command works fine inRedHat, is it a bug?

I believe the bug is introduced by SuSE in an attempt to "protect" the
user.  Try reporting it to them.


Re: Bug in ETA code on x64

2006-04-03 Thread Hrvoje Niksic
Thomas Braby <[EMAIL PROTECTED]> writes:

>> eta_hrs = (int) (eta / 3600), eta %= 3600;
>
> Yes that also works. The cast is needed on Windows x64 because eta is 
> a wgint (which is 64-bit) but a regular int is 32-bit so otherwise a 
> warning is issued.

The same is the case on 32-bit Windows, and also on Linux.  I don't
see the value in that warning.  Maybe we can disable it with a
compiler flag?

> Oh well. Perhaps it would be better changed to use a semicolon for
> clarity anyway?

Note that, without the cast, both semicolon and comma work equally well.
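
A stand-alone demonstration of the trap (a hypothetical test program,
not wget code):

#include <stdio.h>

int main (void)
{
  long eta = 179999;                            /* 49:59:59 */

  /* Broken: the comma operator evaluates and discards eta / 3600, so
     the parenthesized expression yields eta %= 3600.  For an eta of
     the form N*3600 + 49, all three fields come out as 49; hence the
     49:49:49 displays.  */
  int eta_hrs = (int) (eta / 3600, eta %= 3600);
  printf ("broken:  %d\n", eta_hrs);            /* prints 3599 */

  /* Correct: cast only the quotient; the comma then works as intended. */
  eta = 179999;
  eta_hrs = (int) (eta / 3600), eta %= 3600;
  printf ("correct: %d\n", eta_hrs);            /* prints 49 */
  return 0;
}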


Re: Bug in ETA code on x64

2006-04-03 Thread Thomas Braby


- Original Message -
From: Hrvoje Niksic <[EMAIL PROTECTED]>
Date: Tuesday, March 28, 2006 7:23 pm

> > in progress.c line 880:
> >
> >eta_hrs = (int)(eta / 3600, eta %= 3600);
> >eta_min = (int)(eta / 60, eta %= 60);
> >eta_sec = (int)(eta);
> 
> This is weird.  Did you compile the code yourself, or did you get it

Yes that is strange. I got the code from one of the GNU mirrors, but 
I'm afraid I can't remember which one.

> from a Windows download site?  I'm asking because the code in
> progress.c doesn't look like that; it in fact looks like this:
> 
>  eta_hrs = eta / 3600, eta %= 3600;
>  eta_min = eta / 60,   eta %= 60;
>  eta_sec = eta;
> 
> The cast to int looks like someone was trying to remove a warning and
> botched operator precedence in the process.  If you must insert the
> cast, try:
> 
> eta_hrs = (int) (eta / 3600), eta %= 3600;

Yes that also works. The cast is needed on Windows x64 because eta is 
a wgint (which is 64-bit) but a regular int is 32-bit so otherwise a 
warning is issued. Oh well. Perhaps it would be better changed to use 
a semicolon for clarity anyway?

cheers,


Re: Bug report

2006-04-01 Thread Frank McCown

Gary Reysa wrote:

Hi,

I don't really know if this is a Wget bug, or some problem with my 
website, but, either way, maybe you can help.


I have a web site ( www.BuildItSolar.com ) with perhaps a few hundred 
pages (260MB of storage total).  Someone did a Wget on my site, and 
managed to log 111,000 hits and 58,000 page views (using more than a GB 
of bandwidth).


I am wondering how this can happen, since the number of page views is 
about 200 times the number of pages on my site??


Is there something I can do to prevent this?  Is there something about 
the organization of my website that is causing Wget to get stuck in a loop?


I've never used Wget, but I am guessing that this guy really did not 
want 50,000+ pages -- do you provide some way for the user to shut 
itself down when it reaches some reasonable limit?


My website is non-commercial, and provides a lot of information that 
people find useful in building renewable energy projects.  It generates 
zero income, and I can't really afford to have a lot of people come in 
and burn up GBs of bandwidth to no useful end.  Help!


Gary Reysa


Bozeman, MT
[EMAIL PROTECTED]



Hello Gary,

From a quick look at your site, it appears to be mainly static html 
that would not generate a lot of extra crawls.  If you have some dynamic 
portion of your site, like a calendar, that could make wget go into an 
infinite loop.  It would be much easier to tell if you could look at the 
server logs that show what pages were requested.  They would easily tell 
you what wget was getting hung on.


One problem I did notice is that your site is generating "soft 404s". 
In other words, it is sending back an HTTP 200 response when it should be 
sending back a 404 response.  So if wget tries to access


http://www.builditsolar.com/blah

your web server is telling wget that the page actually exists.  This 
*could* cause more crawls than necessary, but not likely.  This problem 
should be fixed though.


It's possible the wget user did not know what they were doing and ran 
the crawler several times.  You could try to block traffic from that 
particular IP address or create a robots.txt file that tells crawlers to 
stay away from your site or just certain pages.  Wget respects 
robots.txt.  For more info:


http://www.robotstxt.org/wc/robots.html
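
For instance, a minimal robots.txt (a hypothetical example, served as
http://www.builditsolar.com/robots.txt) that asks wget and other
well-behaved crawlers to stay out of the whole site:

User-agent: *
Disallow: /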

Regards,
Frank



Re: Bug in ETA code on x64

2006-03-31 Thread Greg Hurrell

On 29/03/2006, at 14:39, Hrvoje Niksic wrote:

>>> I can't see any good reason to use "," here. Why not write the line
>>> as:
>>>   eta_hrs = eta / 3600; eta %= 3600;
>>
>> Because that's not equivalent.
>
> Well, it should be, because the comma operator has lower precedence
> than the assignment operator (see http://tinyurl.com/evo5a,
> http://tinyurl.com/ff4pp and numerous other locations).

Indeed you are right. So:

eta_hrs = eta / 3600, eta %= 3600;

Is equivalent to the following (with explicit parentheses to make the  
effect of the precedence obvious):


(eta_hrs = eta / 3600), (eta %= 3600);

Or of course:

eta_hrs = eta / 3600; eta %= 3600;

Greg



smime.p7s
Description: S/MIME cryptographic signature


Re: Bug in ETA code on x64

2006-03-29 Thread Hrvoje Niksic
Greg Hurrell <[EMAIL PROTECTED]> writes:

> El 28/03/2006, a las 20:43, Tony Lewis escribió:
>
>> Hrvoje Niksic wrote:
>>
>>> The cast to int looks like someone was trying to remove a warning and
>>> botched operator precedence in the process.
>>
>> I can't see any good reason to use "," here. Why not write the line
>> as:
>>   eta_hrs = eta / 3600; eta %= 3600;
>
> Because that's not equivalent.

Well, it should be, because the comma operator has lower precedence
than the assignment operator (see http://tinyurl.com/evo5a,
http://tinyurl.com/ff4pp and numerous other locations).

I'd still like to know where Thomas got his version of progress.c
because it seems that the change has introduced the bug.


Re: Bug in ETA code on x64

2006-03-29 Thread Greg Hurrell

On 28/03/2006, at 20:43, Tony Lewis wrote:

> Hrvoje Niksic wrote:
>
>> The cast to int looks like someone was trying to remove a warning and
>> botched operator precedence in the process.
>
> I can't see any good reason to use "," here. Why not write the line
> as:
>   eta_hrs = eta / 3600; eta %= 3600;

Because that's not equivalent. "The sequence or comma operator , has
two operands: first the left operand is evaluated, then the right.
The result has the type and value of the right operand. Note that a
comma in a list of initializations or arguments is not an operator,
but simply a punctuation mark!".


Cheers,
Greg




smime.p7s
Description: S/MIME cryptographic signature


RE: Bug in ETA code on x64

2006-03-28 Thread Tony Lewis
Hrvoje Niksic wrote:

> The cast to int looks like someone was trying to remove a warning and
> botched operator precedence in the process.

I can't see any good reason to use "," here. Why not write the line as:
  eta_hrs = eta / 3600; eta %= 3600;

This makes it much less likely that someone will make a coding error while
editing that section of code.

Tony



Re: Bug in ETA code on x64

2006-03-28 Thread Hrvoje Niksic
Thomas Braby <[EMAIL PROTECTED]> writes:

> With wget 1.10.2 compiled using Visual Studio 2005 for Windows XP x64 
> I was getting no ETA until late in the transfer, when I'd get things 
> like:
>
> 49:49:49 then 48:48:48 then 47:47:47 etc.
>
> So I checked the eta value in seconds and it was correct, so the code 
> in progress.c line 880:
>
>eta_hrs = (int)(eta / 3600, eta %= 3600);
>eta_min = (int)(eta / 60, eta %= 60);
>eta_sec = (int)(eta);

This is weird.  Did you compile the code yourself, or did you get it
from a Windows download site?  I'm asking because the code in
progress.c doesn't look like that; it in fact looks like this:

  eta_hrs = eta / 3600, eta %= 3600;
  eta_min = eta / 60,   eta %= 60;
  eta_sec = eta;

The cast to int looks like someone was trying to remove a warning and
botched operator precedence in the process.  If you must insert the
cast, try:

eta_hrs = (int) (eta / 3600), eta %= 3600;
...


Re: Bug in TOLOWER macro when STANDALONE (?)

2006-03-06 Thread Hrvoje Niksic
"Beni Serfaty" <[EMAIL PROTECTED]> writes:

> I think I found a bug when STANDALONE is defined in hash.c
> I hope I'm not missing something here...

Good catch, thanks.  I've applied a slightly different fix, appended
below.

By the way, are you using hash.c in a project?  I'd like to hear if
you're satisfied with it and would be very interested in any
suggestions and, of course, bugs.  hash.c was written to be
reuse-friendly.

Also note that you can get the latest version of the file (this fix
included) from http://svn.dotsrc.org/repo/wget/trunk/src/hash.c .


2006-03-06  Hrvoje Niksic  <[EMAIL PROTECTED]>

* hash.c (TOLOWER): Fix definition when STANDALONE.
Reported by Beni Serfaty.

Index: src/hash.c
===
--- src/hash.c  (revision 2119)
+++ src/hash.c  (working copy)
@@ -53,7 +53,8 @@
 # ifndef countof
 #  define countof(x) (sizeof (x) / sizeof ((x)[0]))
 # endif
-# define TOLOWER(x) ('A' <= (x) && (x) <= 'Z' ? (x) - 32 : (x))
+# include <ctype.h>
+# define TOLOWER(x) tolower ((unsigned char) x)
 # if __STDC_VERSION__ >= 199901L
#  include <stdint.h>   /* for uintptr_t */
 # else


Re: Bug? -k not compatible with -O

2006-03-02 Thread Greg McCann
Steven M. Schweda  antinode.org> writes:

> > [...] wget version 1.9.1
> 
>You might try it with the current version (1.10.2).
> 
>   http://www.gnu.org/software/wget/wget.html
> 

Oh, man - I can't believe I missed that.  All better now!  Thank you.


Greg



Re: Bug? -k not compatible with -O

2006-03-02 Thread Steven M. Schweda
> [...] wget version 1.9.1

   You might try it with the current version (1.10.2).

  http://www.gnu.org/software/wget/wget.html



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: Bug: -x with -O

2005-12-15 Thread Frank McCown

>> wget -x -O images/logo.gif
>> http://www.google.co.uk/intl/en_uk/images/logo.gif
>>
>> It worked for me.
>
>    Try it after "rm -rf images".

That was why it worked... I had an images directory already created.
Should have deleted it before I tried.


Frank


Re: Bug: -x with -O

2005-12-14 Thread Steven M. Schweda
>From Frank McCown:

> wget -x -O images/logo.gif
> http://www.google.co.uk/intl/en_uk/images/logo.gif
> 
> It worked for me.

   Try it after "rm -rf images".

alp $ wget -x http://alp/test.html -O testxxx/test.html
testxxx/test.html: no such file or directory

alp $ wget -x -O testxxx/test.html http://alp/test.html
testxxx/test.html: no such file or directory

alp $ create /directory [.testxxx]

alp $ wget http://alp/test.html -O testxxx/test.html
--23:45:55--  http://alp/test.html
   => `testxxx/test.html'
Resolving alp... 10.0.0.9
Connecting to alp|10.0.0.9|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 109 [text/html]

100%[>] 109   --.--K/s

23:45:55 (2.08 MB/s) - `testxxx/test.html' saved [109/109]

alp $ dire /date /size /prot [.testxxx]

Directory ALP$DKA0:[SMS.TESTXXX]

TEST.HTML;11  25-JUN-2004 00:19:25.00  (RWED,RWED,RE,)

Total of 1 file, 1 block.

alp $ wget -V
GNU Wget 1.10.2a1

(VMS Alpha V7.3-2, naturally.)



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: Bug: -x with -O

2005-12-14 Thread Steven M. Schweda
   I wouldn't call it a bug.  While it may not be well documented (which
would not be unusual), "-x" affects URL-derived directories, not
user-specified directories.

   Presumably Wget could be modified to handle this, but my initial
reaction is that it's not unreasonable to demand that the fellow who
specified the directory in the "-O" option should be responsible for
ensuring that that directory exists.

   Of course, the value of my opinion may be low.



   Steven M. Schweda   (+1) 651-699-9818
   382 South Warwick Street[EMAIL PROTECTED]
   Saint Paul  MN  55105-2547


Re: Bug: -x with -O

2005-12-14 Thread Frank McCown

Chris,

I think the problem is you don't have the URL last.  Try this:

wget -x -O images/logo.gif 
http://www.google.co.uk/intl/en_uk/images/logo.gif


It worked for me.

Frank


Chris Hills wrote:

Hi

Using wget-1.10.2.

Example command:-

$ wget -x http://www.google.co.uk/intl/en_uk/images/logo.gif -O 
images/logo.gif

images/logo.gif: No such file or directory

wget should create the directory images/.

wget --help shows:-

  -x,  --force-directories        force creation of directories.

This implies that the above behaviour should work.

Even if -x is not appropriate here, wget should still first attempt to 
create the directory.


Regards



Re: bug retrieving embedded images with --page-requisites

2005-11-09 Thread Hrvoje Niksic
"Jean-Marc MOLINA" <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic wrote:
>> More precisely, it doesn't use the file name advertised by the
>> Content-Disposition header.  That is because Wget decides on the file
>> name it will use based on the URL used, *before* the headers are
>> downloaded.  This unfortunate design decision is the cause of all
>> these problems, and will take some work to be undone.
>
> Implementing the "Content-Disposition" header is on the TODO list :
>
> * Honor `Content-Disposition: XXX; filename="FILE"' when creating the
>   file name.  If possible, try not to break `-nc' and friends when
>   doing that.

It is, indeed -- I wrote that entry.  :-)  The problem is that
implementing this is not as easy or straightforward as it sounds.
This is shared by most TODO list items.


Re: bug retrieving embedded images with --page-requisites

2005-11-09 Thread Jean-Marc MOLINA
Tony Lewis wrote:
> The --convert-links option changes the website path to a local file
> system path. That is, it changes the directory, not the file name.

Thanks I didn't understand it that way.

> IMO, your suggestion has merit, but it would require wget to maintain
> a list of MIME types and corresponding renaming rules.

Well, it seems implementing the "Content-Type" header has been planned for a
long time: there are two items about it in the "TODO" document of the wget
distribution.

Maintaining a list of MIME types is not an issue as there are already lists
around :
* "File suffixes and MIME types" at Duke University :
http://www.duke.edu/websrv/file-extensions.html
* "MIME Types" category at Google :
http://www.google.com/Top/Computers/Data_Formats/MIME_Types
* ...

Just a word about how HTTrack handles MIME types and extensions. It has a
powerful "--assume" option that allows users to assign a MIME type to
extensions. For example : "All .php files are PNG images". Everything is
explained on the "Option panel : MIME Types" page at
http://www.httrack.com/html/step9_opt11.html. I think wget could use such an
option.

JM.
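
To make such a renaming rule concrete: a small stand-alone C sketch
(hypothetical, not wget code) that decides whether a file served by a
script such as "gen_png_image.php" is really a PNG by checking the
8-byte PNG signature, the kind of evidence an ext/MIME mapping could
be checked against.

#include <stdio.h>
#include <string.h>

static int looks_like_png (const char *filename)
{
  static const unsigned char sig[8] =
    { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };
  unsigned char buf[8];
  FILE *fp = fopen (filename, "rb");
  int ok = fp != NULL
           && fread (buf, 1, 8, fp) == 8
           && memcmp (buf, sig, 8) == 0;
  if (fp) fclose (fp);
  return ok;
}

int main (int argc, char **argv)
{
  int i;
  for (i = 1; i < argc; i++)
    printf ("%s: %s\n", argv[i],
            looks_like_png (argv[i]) ? "image/png" : "other");
  return 0;
}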





Re: bug retrieving embedded images with --page-requisites

2005-11-09 Thread Jean-Marc MOLINA
Hrvoje Niksic wrote:
> More precisely, it doesn't use the file name advertised by the
> Content-Disposition header.  That is because Wget decides on the file
> name it will use based on the URL used, *before* the headers are
> downloaded.  This unfortunate design decision is the cause of all
> these problems, and will take some work to be undone.

Implementing the "Content-Disposition" header is on the TODO list :

* Honor `Content-Disposition: XXX; filename="FILE"' when creating the
  file name.  If possible, try not to break `-nc' and friends when
  doing that.

JM.





RE: bug retrieving embedded images with --page-requisites

2005-11-09 Thread Tony Lewis
Jean-Marc MOLINA wrote:

> For example if a PNG image is generated using a "gen_png_image.php" PHP
> script, I think wget should be able to download it if the option
> "--page-requisites" is used, because it's part of the page and it's not
> an external resource, get its MIME type, "image/png", and using the
> option "--convert-links" should also rename the script-image to
> "gen_png_image.png".

The --convert-links option changes the website path to a local file system
path. That is, it changes the directory, not the file name. IMO, your
suggestion has merit, but it would require wget to maintain a list of MIME
types and corresponding renaming rules.

Tony




Re: bug retrieving embedded images with --page-requisites

2005-11-09 Thread Hrvoje Niksic
"Jean-Marc MOLINA" <[EMAIL PROTECTED]> writes:

> As I don't know anything about wget sources, I can't tell how it
> innerworks but I guess it doesn't check the MIME types of resources
> linked from the "src" attribute of a "img" elements. And that would
> be a bug... And I think some kind of RFC or spec should confirm it.

More precisely, it doesn't use the file name advertised by the
Content-Disposition header.  That is because Wget decides on the file
name it will use based on the URL used, *before* the headers are
downloaded.  This unfortunate design decision is the cause of all
these problems, and will take some work to be undone.


Re: bug retrieving embedded images with --page-requisites

2005-11-09 Thread Jean-Marc MOLINA
Gavin Sherlock wrote:
> i.e. the image is generated on the fly from a script, which then
> essentially prints the image back to the browser with the correct
> mime type.  While this is a non-standard way to include an image on a
> page, the --page-requisites are not fulfilled when retrieving this
> web page.

I don't think you can consider this a "non-standard way". I'm sure there's a
whole paragraph in an RFC (the HTML 4.01 spec) about properly dealing with URIs,
linked resources and MIME types. For example if a PNG image is generated
using a "gen_png_image.php" PHP script, I think wget should be able to
download it if the option "--page-requisites" is used, because it's part of
the page and it's not an external resource, get its MIME type, "image/png",
and using the option "--convert-links" should also rename the script-image
to "gen_png_image.png".

I tried the "--page-requisites" option and got my test page, at
http://jmmolina.free.fr/t_39638/, perfectly archived. Original names and
page is 100% offline browsable. The script name is still
"gen_png_image.php". Then I used the "--convert-links" option to see if the
script was renamed to a PNG image, it wasn't.

To compare this behaviour with HTTrack, I tried to archive the same page
with it. By default it converted the PHP script to a HTML page. It's logical
because HTTrack has some default ext/MIME mappings. So I removed the ".php
to text/html" and got a nice PNG image instead. I don't really know how to
force it not to rename the script but it doesn't really matter.

As I don't know anything about the wget sources, I can't tell how it works
internally, but I guess it doesn't check the MIME types of resources linked
from the "src" attribute of "img" elements. And that would be a bug... And I
think some kind of RFC or spec should confirm it.

JM.





Re: bug in wget windows

2005-10-14 Thread Mauro Tortonesi

Tobias Koeck wrote:

> done.
> ==> PORT ... done.==> RETR SUSE-10.0-EvalDVD-i386-GM.iso ... done.
>
> [   <=>  ] -673,009,664  113,23K/s
>
> Assertion failed: bytes >= 0, file retr.c, line 292
>
> This application has requested the Runtime to terminate it in an unusual
> way.
>
> Please contact the application's support team for more information.


you are probably using an older version of wget, without long file 
support. please upgrade to wget 1.10.2.


--
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi  http://www.tortonesi.com

University of Ferrara - Dept. of Eng.http://www.ing.unife.it
GNU Wget - HTTP/FTP file retrieval tool  http://www.gnu.org/software/wget
Deep Space 6 - IPv6 for Linuxhttp://www.deepspace6.net
Ferrara Linux User Group http://www.ferrara.linux.it


Re: Bug rpt

2005-09-20 Thread Alain Bench
Hello Hrvoje!

 On Tuesday, September 20, 2005 at 12:50:41 AM +0200, Hrvoje Niksic wrote:

> "HonzaCh" <[EMAIL PROTECTED]> writes:
>> the thousand separator (space according to my local settings)
>> displays as "á" (character code 0xA0, see attch.)
> Wget obtains the thousand separator from the operating system using
> the `localeconv' function.

I did not test wget, but I believe Win32 console apps have to call
setlocale(LC_ALL, ".OCP") first, so that localeconv() returns a CP-852 no
break space 0xFF. That selects the current locale (from the control panel),
but with the console OEM Code Page.

The "standard" setlocale(LC_ALL, "") selects the ANSI Code Page 1250
of graphic apps, where no break space is 0xA0.


Bye!Alain.
-- 
When you post a new message, beginning a new topic, use the "mail" or
"post" or "new message" functions.
When you reply or followup, use the "reply" or "followup" functions.
Do not do the one for the other, this breaks or hijacks threads.



Re: Bug rpt

2005-09-20 Thread Hrvoje Niksic
"HonzaCh" <[EMAIL PROTECTED]> writes:

>>> My localeconv()->thousands_sep (as well as many other struct
>>> members) reveals to empty string ("") (MSVC6.0).
>>
>> How do you know?  I mean, what program did you use to check this?
>
> My quick'n'dirty one. See the source below.

Your source neglects to call setlocale(LC_ALL, ""), which you must do
before the locale goes into effect.  Otherwise you're getting values from
the "C" locale, which doesn't define thousand separators.


Re: Bug rpt

2005-09-19 Thread Hrvoje Niksic
"HonzaCh" <[EMAIL PROTECTED]> writes:

> Latest version (1.10.1) turns up a UI bug: the thousand separator
> (space according to my local settings) displays as "á" (character
> code 0xA0, see attch.)
>
> Although it does not affect the primary function of WGET, it looks
> quite ugly.
>
> Env.: Win2k Pro/Czech (CP852 for console apps, CP1250 for windowed
> ones).

Thanks for the report.  Is this a natively compiled Wget or one
compiled on Cygwin?

Wget obtains the thousand separator from the operating system using
the `localeconv' function.  According to MSDN
(http://tinyurl.com/cumk2 and http://tinyurl.com/chubg), Wget's usage
appears to be correct.  I'd be surprised if that function didn't
function properly on Windows.

Can other Windows testers repeat this problem?


Re: Bug? gettin file > 2 GB fails

2005-07-07 Thread Hrvoje Niksic
Jogchum Reitsma <[EMAIL PROTECTED]> writes:

> I'm not sure it's a bug, but behaviour descibes below seems strange
> to me, so I thought it was wise to report it:

Upgrade to Wget 1.10 and the problem should go away.  Earlier versions
don't handle files larger than 2GB properly.


Re: Bug

2005-07-07 Thread Hrvoje Niksic
Rodrigo Botafogo <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED]:~/Download/Linux> wget -c
> ftp://chuck.ucs.indiana.edu/linux/suse/suse/i386/9.3/iso/SUSE-9.3-Eval-DVD.iso
> --09:55:03-- 
> ftp://chuck.ucs.indiana.edu/linux/suse/suse/i386/9.3/iso/SUSE-9.3-Eval-DVD.iso
>   => `SUSE-9.3-Eval-DVD.iso'
> Resolving chuck.ucs.indiana.edu... 156.56.247.193
> Connecting to chuck.ucs.indiana.edu[156.56.247.193]:21... connected.

Please upgrade to Wget 1.10, which has this bug fixed.


Re: Bug report: option -nr

2005-06-30 Thread Hrvoje Niksic
Marc Niederwieser <[EMAIL PROTECTED]> writes:

> option --mirror is described as
>   shortcut option equivalent to -r -N -l inf -nr.
> but option "-nr" is not implemented.
> I think you mean "--no-remove-listing".

Thanks for the report, I've now fixed the --help text.

2005-07-01  Hrvoje Niksic  <[EMAIL PROTECTED]>

* main.c (print_help): Don't refer to the non-existent -nr in
description of --mirror.

Index: src/main.c
===
--- src/main.c  (revision 1918)
+++ src/main.c  (working copy)
@@ -575,7 +575,7 @@
 N_("\
   -K,  --backup-converted   before converting file X, back up as X.orig.\n"),
 N_("\
-  -m,  --mirror                 shortcut option equivalent to -r -N -l inf -nr.\n"),
+  -m,  --mirror                 shortcut for -N -r -l inf --no-remove-listing.\n"),
 N_("\
  -p,  --page-requisites        get all images, etc. needed to display HTML page.\n"),
 N_("\


Re: Bug handling session cookies

2005-06-24 Thread Hrvoje Niksic
"Mark Street" <[EMAIL PROTECTED]> writes:

> Many thanks for the explanation and the patch.  Yes, this patch
> successfully resolves the problem for my particular test case.

Thanks for testing it.  It has been applied to the code and will be in
Wget 1.10.1 and later.


Re: Bug handling session cookies

2005-06-24 Thread Mark Street

Hrvoje,

Many thanks for the explanation and the patch.
Yes, this patch successfully resolves the problem for my particular test
case.

Best regards,

Mark Street.




Re: Bug handling session cookies

2005-06-24 Thread Hrvoje Niksic
"Mark Street" <[EMAIL PROTECTED]> writes:

> I'm not sure why this [catering for paths without a leading /] is
> done in the code.

rfc1808 declared that the leading / is not really part of path, but
merely a "separator", presumably to be consistent with its treatment
of ;params, ?queries, and #fragments.  The author of the code found it
appealing to disregard common sense and implement rfc1808 semantics.

In most cases the user shouldn't notice the difference, but it has
led to all kinds of implementation problems with code that assumes
that URL paths naturally begin with /.  Because of that it will be
changed later.

> Note that the forward slash is stripped from "prefix", hence never
> matches "full_path".  I'm not sure why this is done in the code.

Because PREFIX is the path declared by the cookie, which always begins
with /, and FULL_PATH is the URL path coming from the URL parsing
code, which doesn't begin with a /.  To match them, one must indeed
strip the leading / off PREFIX.

But paths without a slash still caused subtle problems.  For example,
cookies without a path attribute still had to be stored with the
correct cookie-path (with a leading slash).  To account for this, the
invocation of cookie_handle_set_cookie was modified to prepend the /
before the path.  This led to path_match unexpectedly receiving two
/-prefixed paths and being unable to match them.

The attached patch fixes the problem by:

* Making sure that path consistently gets prepended in all entry
  points to cookie code;

* Removing the special logic from path_match.

With that change your test case seems to work, and so do all the other
tests I could think of.

Please let me know if it works for you, and thanks for the detailed
bug report.
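
For intuition, a toy model of the matching rule (hypothetical and much
simplified; the real logic lives in cookies.c): a cookie path of
/proxy42 matches a request for /proxy42/portal/login.po, while /jsp
does not.

#include <stdio.h>
#include <string.h>

static int toy_path_matches (const char *full_path, const char *prefix)
{
  size_t len = strlen (prefix);
  if (strncmp (full_path, prefix, len) != 0)
    return 0;                    /* FULL_PATH doesn't begin with PREFIX */
  /* Accept only matches that end on a path boundary.  */
  return full_path[len] == '\0' || full_path[len] == '/'
         || (len > 0 && prefix[len - 1] == '/');
}

int main (void)
{
  printf ("%d\n", toy_path_matches ("/proxy42/portal/login.po", "/proxy42")); /* 1 */
  printf ("%d\n", toy_path_matches ("/proxy42/portal/login.po", "/jsp"));     /* 0 */
  return 0;
}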


2005-06-24  Hrvoje Niksic  <[EMAIL PROTECTED]>

* http.c (gethttp): Don't prepend / here.

* cookies.c (cookie_handle_set_cookie): Prepend / to PATH.
(cookie_header): Ditto.

Index: src/http.c
===
--- src/http.c  (revision 1794)
+++ src/http.c  (working copy)
@@ -1706,7 +1706,6 @@
   /* Handle (possibly multiple instances of) the Set-Cookie header. */
   if (opt.cookies)
 {
-  char *pth = NULL;
   int scpos;
   const char *scbeg, *scend;
   /* The jar should have been created by now. */
@@ -1717,15 +1716,8 @@
   ++scpos)
{
  char *set_cookie; BOUNDED_TO_ALLOCA (scbeg, scend, set_cookie);
- if (pth == NULL)
-   {
- /* u->path doesn't begin with /, which cookies.c expects. */
- pth = (char *) alloca (1 + strlen (u->path) + 1);
- pth[0] = '/';
- strcpy (pth + 1, u->path);
-   }
- cookie_handle_set_cookie (wget_cookie_jar, u->host, u->port, pth,
-   set_cookie);
+ cookie_handle_set_cookie (wget_cookie_jar, u->host, u->port,
+   u->path, set_cookie);
}
 }
 
Index: src/cookies.c
===
--- src/cookies.c   (revision 1794)
+++ src/cookies.c   (working copy)
@@ -822,6 +822,17 @@
 {
   return path_matches (path, cookie_path) != 0;
 }
+
+/* Prepend '/' to string S.  S is copied to fresh stack-allocated
+   space and its value is modified to point to the new location.  */
+
+#define PREPEND_SLASH(s) do {  \
+  char *PS_newstr = (char *) alloca (1 + strlen (s) + 1);  \
+  *PS_newstr = '/';\
+  strcpy (PS_newstr + 1, s);   \
+  s = PS_newstr;   \
+} while (0)
+
 
 /* Process the HTTP `Set-Cookie' header.  This results in storing the
cookie or discarding a matching one, or ignoring it completely, all
@@ -835,6 +846,11 @@
   struct cookie *cookie;
   cookies_now = time (NULL);
 
+  /* Wget's paths don't begin with '/' (blame rfc1808), but cookie
+ usage assumes /-prefixed paths.  Until the rest of Wget is fixed,
+ simply prepend slash to PATH.  */
+  PREPEND_SLASH (path);
+
   cookie = parse_set_cookies (set_cookie, update_cookie_field, false);
   if (!cookie)
 goto out;
@@ -977,17 +993,8 @@
 static int
 path_matches (const char *full_path, const char *prefix)
 {
-  int len;
+  int len = strlen (prefix);
 
-  if (*prefix != '/')
-/* Wget's HTTP paths do not begin with '/' (the URL code treats it
-   as a mere separator, inspired by rfc1808), but the '/' is
-   assumed when matching against the cookie stuff.  */
-return 0;
-
-  ++prefix;
-  len = strlen (prefix);
-
   if (0 != strncmp (full_path, prefix, len))
 /* FULL_PATH doesn't begin with PREFIX. */
 return 0;
@@ -1149,6 +1156,7 @@
   int count, i, ocnt;
   char *result;
   int result_size, pos;
+  PREPEND_SLASH (path);/* see cookie_handle_set_cookie */
 
   /* First, find the cooki

Re: Bug: wget cannot handle quote

2005-06-21 Thread Hrvoje Niksic
Will Kuhn <[EMAIL PROTECTED]> writes:

> Apparentl wget does not handle single quote or double quote very well.
> wget with the following arguments give error.
>
>  wget
>  --user-agent='Mozilla/5.0' --cookies=off --header
>  'Cookie: testbounce="testing";
>  ih="b'!!!0T#8G(5A!!#c`#8HWsH!!#wt#8I0HY!!#yf#8I0G3";
>  cf="b$y~!!!D)#"; hi="b#!!!D)8I=C]"'
>  'ad.yieldmanager.com/imp?z=12&n=2&E=01-329&I=508&S=508-1'
>  -O /home/admin/http/wwwscanfile.YYO3Cy

You haven't stated which error you get, but on my system the error
comes from the shell and not from Wget.  The problem is that you used
single quotes to quote a string that contains, among other things,
single quotes.  This effectively turned off the quoting for some
portions of the text, causing the shell to interpret the bangs ("!") 
as (invalid) history events.

To correct the problem, replace ' within single quotes with something
like '\'':

wget --user-agent='Mozilla/5.0' --cookies=off --header 'Cookie: 
testbounce="testing"; 
ih="b'\''!!!0T#8G(5A!!#c`#8HWsH!!#wt#8I0HY!!#yf#8I0G3"; 
cf="b$y~!!!D)#"; hi="b#!!!D)8I=C]"' 
'ad.yieldmanager.com/imp?z=12&n=2&E=01-329&I=508&S=508-1' -O 
/home/admin/http/wwwscanfile.YYO3Cy


Re: bug with password containing @

2005-05-26 Thread Hrvoje Niksic
Andrew Gargan <[EMAIL PROTECTED]> writes:

> wget ftp://someuser:[EMAIL PROTECTED]@www.somedomain.com/some_file.tgz
>
> is splitting using on the first @ not the second.

Encode the '@' as %40 and this will work.  For example:

wget ftp://someuser:[EMAIL PROTECTED]/some_file.tgz

> Is this a problem with the URL standard or a wget issue?

Neither, but maybe URL could be smarter about handling the above case.
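
For illustration, a tiny C helper (hypothetical; wget has its own URL
code) that percent-encodes the characters which are special in the
userinfo part of a URL, so a password like "p@ss" becomes "p%40ss":

#include <stdio.h>
#include <string.h>

static void print_encoded (const char *s)
{
  for (; *s; s++)
    if (strchr ("@:%/", *s))
      printf ("%%%02X", (unsigned char) *s);   /* '@' -> %40 */
    else
      putchar (*s);
}

int main (void)
{
  print_encoded ("p@ss");   /* prints p%40ss */
  putchar ('\n');
  return 0;
}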


RE: bug with password containing @

2005-05-26 Thread Andrew Gargan




Hi 

wget ftp://someuser:[EMAIL PROTECTED]@www.somedomain.com/some_file.tgz

is splitting using on the first @ not the second.

Is this a problem with the URL standard or a wget issue?

Regards

Andrew Gargan




Re: bug in static build of wget with socks

2005-05-16 Thread Hrvoje Niksic
Seemant Kulleen <[EMAIL PROTECTED]> writes:

>> Since I don't use Gentoo, I'll need more details to fix this.
>> 
>> For one, I haven't tried Wget with socks for a while now.  Older
>> versions of Wget supported of --with-socks option, but the procedure
>> for linking a program with socks changed since then, and the option
>> was removed due to bitrot.  I don't know how the *dynamic* linking
>> against socks works in Gentoo, either.
>
> Ah ok, ./configure --help still shows the option, so this is fairly
> undocumented then.

I spoke too soon: it turns out that --with-socks is only removed in
Wget 1.10 (now in beta).

But --with-socks in 1.9.1 doesn't really force linking with the socks
library, it merely checks for a "Rconnect" function in "-lsocks".  If
that is not found, the build is continued as usual.  You should check
the configure output (along with `ldd' on the resulting executable) to
see if that really worked.

>> I don't even know if this is a bug in Wget or in the way that the
>> build is attempted by the Gentoo package mechanism.  Providing the
>> actual build output might shed some light on this.
>
> if use static; then
> emake LDFLAGS="--static" || die

I now tried `LDFLAGS=--static ./configure', and it seems to work in
1.10.  Linking does produce two warnings, but the resulting executable
is static.


Re: bug in static build of wget with socks

2005-05-16 Thread Hrvoje Niksic
Seemant Kulleen <[EMAIL PROTECTED]> writes:

> I wanted to alert you all to a bug in wget, reported by one of our
> (gentoo) users at:
>
> https://bugs.gentoo.org/show_bug.cgi?id=69827
>
> I am the maintainer for the Gentoo ebuild for wget.
>
> If someone would be willing to look at and help us with that bug,
> it'd be much appreciated.

Since I don't use Gentoo, I'll need more details to fix this.

For one, I haven't tried Wget with socks for a while now.  Older
versions of Wget supported of --with-socks option, but the procedure
for linking a program with socks changed since then, and the option
was removed due to bitrot.  I don't know how the *dynamic* linking
against socks works in Gentoo, either.

Secondly, I have very little experience with creating static binaries,
since I personally don't need them.  I don't even know what flags
USE=static causes to be passed to the compiler and the linker.
Likewise, I don't have a clue why there is a difference between Wget
1.8 and Wget 1.9 in this, nor why the presence of socks makes the
slightest difference.

I don't even know if this is a bug in Wget or in the way that the
build is attempted by the Gentoo package mechanism.  Providing the
actual build output might shed some light on this.


Re: Bug when downloading large files (over 2 gigs) from proftpd server.

2005-04-27 Thread Hrvoje Niksic
This problem has been fixed for the upcoming 1.10 release.  If you
want to try it, it's available at
ftp://ftp.deepspace6.net/pub/ds6/sources/wget/wget-1.10-alpha2.tar.bz2


Re: Bug

2005-03-20 Thread Jorge Bastos - Decimal
Hi Jens,
I see, no problem... I just thought that no one saw that thing and decided to
send an email.
Thanks for the reply :)

Jorge

- Original Message - 
From: ""Jens Rösner"" <[EMAIL PROTECTED]>
To: "Jorge Bastos - Decimal" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sunday, March 20, 2005 8:27 PM
Subject: Re: Bug


Hi Jorge!
Current wget versions do not support large files >2GB.
However, the CVS version does and the fix will be introduced
to the normal wget source.
Jens
(just another user)
When downloading a file of 2GB or more, the counter goes crazy; probably
it should use a long instead of an int.



Re: Bug

2005-03-20 Thread Jens Rösner
Hi Jorge!

Current wget versions do not support large files >2GB. 
However, the CVS version does and the fix will be introduced 
to the normal wget source. 

Jens
(just another user)

> When downloading a file of 2GB or more, the counter goes crazy; probably
> it should use a long instead of an int.



Re: bug-wget still useful

2005-03-15 Thread Dan Jacobson
P> I don't know why you say that.  I see bug reports and discussion of fixes
P> flowing through here on a fairly regular basis.

All I know is my reports for the last few months didn't get the usual (any!)
cheery replies. However, I saw them on Gmane, yes.


Re: bug-wget still useful

2005-03-15 Thread Hrvoje Niksic
Dan Jacobson <[EMAIL PROTECTED]> writes:

> Is it still useful to mail to [EMAIL PROTECTED] I don't think
> anybody's home.  Shall the address be closed?

If you're referring to Mauro being busy, I don't see it as a reason to
close the bug reporting address.


RE: bug-wget still useful

2005-03-15 Thread Post, Mark K
I don't know why you say that.  I see bug reports and discussion of fixes
flowing through here on a fairly regular basis.


Mark Post


-Original Message-
From: Dan Jacobson [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 15, 2005 3:04 PM
To: [EMAIL PROTECTED]
Subject: bug-wget still useful


Is it still useful to mail to [EMAIL PROTECTED] I don't think anybody's
home.  Shall the address be closed?


Re: Bug: really large files cause problems with status text

2005-02-02 Thread Ulf Härnhammar
Quoting Alan Robinson <[EMAIL PROTECTED]>:

> When downloading a 4.2 gig file (such as from
> ftp://movies06.archive.org/2/movies/abe_lincoln_of_the_4th_ave/abe_lincoln_of_the_4th_ave.mpeg ),
> the status text (i.e.
> 100%[+===>] 38,641,328   213.92K/s    ETA 00:00)
> prints invalid things (in this case, that 100% of the file has been
> downloaded, even though only 40MB really has).

It is a Frequently Asked Question, with the answer that people are working on
it.

// Ulf



Re: Bug (wget 1.8.2): Wget downloads files rejected with -R.

2005-01-22 Thread jens . roesner
Hi Jason!

If I understood you correctly, this quote from the manual should help you:
***
Note that these two options [accept and reject based on filenames] do not
affect the downloading of HTML files; Wget must load all the HTMLs to know
where to go at all--recursive retrieval would make no sense otherwise.
***

If you are seeing wget behaviour different from this, please a) update your
wget and b) provide more details where/how it happens.

CU & good luck!
Jens (just another user)



> When the -R option is specified to reject files by name in recursive mode,
> wget downloads them anyway then deletes them after downloading. This is a
> problem when you are trying to be picky about the files you are
downloading
> to save bandwidth. Since wget appears to know the name of the file it is
> downloading before it is downloaded (even if the specified URL is
redirected
> to a different filename), then it should not bother downloading the file
> at all if it is going to delete it immediately after downloading it.
> 
> - Jason Cipriani
> 



RE: bug

2004-12-26 Thread Post, Mark K
Put the URL in double quotes.  That worked for me.

Mark Post

-----Original Message-----
From: szulevzs [mailto:[EMAIL PROTECTED]]
Sent: Sunday, December 26, 2004 5:23 AM
To: [EMAIL PROTECTED]
Subject: bug

WGET can not download the following link:

Wget --tries=5 http://extremetracking.com/free-2/scripts/reports/display/edit?server=c&login=flashani

I tested it with another downloader and it was working.


Re: bug / overflow issue

2004-09-01 Thread Jonathan Stewart
On Thu, 2 Sep 2004 04:28:39 +0200, Patrik Sjöberg <[EMAIL PROTECTED]> wrote:
> hi, I've found the following bug / issue with wget.
> due to limitations wget fails on files larger than "unsigned long" and
> displays an incorrect size and also acts incorrectly when trying to download
> one of these files.

Yes, this is a known issue, unfortunately.  I hear rumours that folks
are trying to fix it, to give wget large file support (> 2 GiB).

Any devels care to comment on the status?

-- 
 Jonathan


Re: Bug#261755: Control sequences injection patch

2004-08-23 Thread Jan Minar
On Sun, Aug 22, 2004 at 08:02:54PM +0200, Jan Minar wrote:
> +/* vasprintf() requires _GNU_SOURCE.  Which is OK with Debian. */
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE

This must be done before stdio.h is included.

> +#endif
> +#include <stdio.h>
> +
>  #ifndef errno
>  extern int errno;
>  #endif
> @@ -345,7 +351,49 @@
>int expected_size;
>int allocated;
>  };
> +
> +/* XXX Where does the declaration belong?? */
> +void escape_buffer (char **src);
>  
> +/*
> + * escape_untrusted  -- escape using '\NNN'.  To be used wherever we want to
> + * print untrusted data.
> + *
> + * Syntax: escape_buffer (&buf-to-escape);
> + */
> +void escape_buffer (char **src)
> +{
> + char *dest;
> + int i, j;
> +
> + /* We encode each byte using at most 4 bytes, + trailing '\0'. */
> + dest = xmalloc (4 * strlen (*src) + 1);
> +
> + for (i = j = 0; (*src)[i] != '\0'; ++i) {
> + /*
> +  * We allow any non-control character, because LINE TABULATION
> +  * & friends can't do more harm than SPACE.  And someone
> +  * somewhere might be using these, so unless we actually can't
> +  * protect against spoofing attacks, we don't pretend we can.
> +  *
> +  * Note that '\n' is included both in the isspace() *and*
> +  * iscntrl() range.
> +  */
> + if (isprint((*src)[i]) || isspace((*src)[i])) {

This lets '\r' thru, not good.  BTW, (*src)[i] is quite a cypher.

> + dest[j++] = (*src)[i];
> + } else {
> + dest[j++] = '\\';
> + dest[j++] = '0' + (((*src)[i] & 0xff) >> 6);
> + dest[j++] = '0' + (((*src)[i] & 0x3f) >> 3);
> + dest[j++] = '0' + ((*src)[i] & 7);
> + }
> + }
> + dest[j] = '\0';
> +
> + xfree (*src);
> + *src = dest;
> +}


Attached is version 2, which solves these problems.

Please keep me CC'd.

Jan.

-- 
   "To me, clowns aren't funny. In fact, they're kind of scary. I've wondered
 where this started and I think it goes back to the time I went to the circus,
  and a clown killed my dad."
--- wget-1.9.1.ORIG/src/log.c   2004-08-22 13:42:33.0 +0200
+++ wget-1.9.1-jan/src/log.c2004-08-24 02:38:38.0 +0200
@@ -42,6 +42,12 @@
 # endif
 #endif /* not WGET_USE_STDARG */
 
+/* vasprintf() requires _GNU_SOURCE.  Which is OK with Debian. */
+/* This *must* be defined before stdio.h is included. */
+#ifndef _GNU_SOURCE
+# define _GNU_SOURCE
+#endif
+
 #include <stdio.h>
 #ifdef HAVE_STRING_H
# include <string.h>
@@ -63,6 +69,8 @@
 #include "wget.h"
 #include "utils.h"
 
+#include <ctype.h>
+
 #ifndef errno
 extern int errno;
 #endif
@@ -345,7 +353,69 @@
   int expected_size;
   int allocated;
 };
+
+/* XXX Where does the declaration belong?? */
+void escape_buffer (char **src);
 
+/*
+ * escape_buffer  -- escape using '\NNN'.  To be used wherever we want to print
+ * untrusted data.
+ *
+ * Syntax: escape_buffer (&buf-to-escape);
+ */
+void escape_buffer (char **src)
+{
+   char *dest, c;
+   int i, j;
+
+   /* We encode each byte using at most 4 bytes, + trailing '\0'. */
+   dest = xmalloc (4 * strlen (*src) + 1);
+
+   for (i = j = 0; (c = (*src)[i]) != '\0'; ++i) {
+   /*
+* We allow any non-control character, because '\t' & friends
+* can't do more harm than SPACE.  And someone somewhere might
+* be using these, so unless we actually can protect against
+* spoofing attacks, we don't pretend we can.
+*
+* Note that '\n' is included both in the isspace() *and*
+* iscntrl() range.
+*
+* We try not to allow '\r' & friends by using isblank()
+* instead of isspace().  Let's hope noone will complain about
+* '\v' & similar being filtered (the characters we may still
+* let thru can vary among locales, so there is not much we can
+* do about this *from within logvprintf()*.
+*/
+   if (c == '\r' && *(&c + 1) == '\n') {
+   /*
+* I've spotted wget printing CRLF line terminators
+* while communicating with ftp://ftp.debian.org.  This
+* is a bug: wget should print whatever the platform
+* line terminator is (CR on Mac, CRLF on CP/M, LF on
+* Un*x, etc.)
+*
+* We work around this bug here by taking CRLF for a
+* line terminator.  A lone CR is still treated as a
+* control character.
+*/
+   i++;
+   dest[j++] = '\n';
+   } else if (isprint(c) || isblank(c) || c == '\n') {
+ 
