Problems with wget traversing multiple sites, overwriting files, etc.

Josh Brooks Sat, 23 Nov 2002 23:58:03 -0800

Hello,

I am not subscribed to this list, so please do CC me on any replies.


I am running this wget command line:

wget -nH --no-parent --directory-prefix=/data --random-wait -r -l inf
--convert-links --html-extension --user-agent="Mozilla/4.0 (compatible;
MSIE 6.0; AOL 7.0; Windows NT 5.1)" www.somewebsite.com
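
To make the options easier to discuss, here is the same command line
written out one option per line, with my understanding of each option as
a comment.  The mirror_cmd wrapper name is made up for this message and
is not part of wget; the echo keeps it a dry run that only prints the
command.

```shell
#!/bin/sh
# Option notes, as I understand them from the wget manual:
#   -nH                 do not create a per-host directory; everything is
#                       saved directly under the prefix, so pages from
#                       different hosts would share one tree
#   --no-parent         never ascend above the starting directory
#   --directory-prefix  save everything under /data
#   --random-wait       randomize the pause between requests
#   -r -l inf           recurse with no depth limit
#   --convert-links     rewrite links so the copy can be browsed locally
#   --html-extension    append .html to HTML pages that lack it
#   --user-agent        identify as the given browser string

# mirror_cmd is a hypothetical helper name used only for illustration.
mirror_cmd() {
  # 'echo' makes this a dry run; remove it to actually fetch.
  echo wget \
    -nH \
    --no-parent \
    --directory-prefix=/data \
    --random-wait \
    -r -l inf \
    --convert-links \
    --html-extension \
    --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; AOL 7.0; Windows NT 5.1)" \
    "$1"
}

mirror_cmd www.somewebsite.com
```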

I should also note that the destination directory /data is browsable by me
(i.e., it is published with an HTTP server), so I am able to view the
results of this wget command line even while it is still running.


Normally this command line works exactly as I expect it to.  However, in
some cases, some very odd behavior occurs.

First off, as the command is running, I pull up the locally saved version
in my web browser and see the homepage of the site I am wgetting.
Then five minutes later I reload that local home page and suddenly I see
the PayPal homepage, minus all the graphics.  The homepage keeps changing
every few minutes until I kill the command.

So, this represents two very odd behaviors - first is that an index.html
is downloaded successfully (the correct one) and then a few minutes later
that index.html is overwritten with a new one.  I cannot see how there
could be any expectation of this behavior given the command line...

The second behavior that is odd is that it is clearly traversing (and
grabbing) files from sites other than www.somewebsite.com - in fact it is
traversing all over the place and downloading from all sorts of web sites.

I have reproduced this (accidentally) with several sites, but the only
one I can remember is www.explodingdog.com.  Please do not all run out
and test this and run up that site's bandwidth allotment, but if you
can decide amongst yourselves who might be best placed to test it, you
will definitely see the behavior if you run the command line I provided
against that site.

It is very odd and very unexpected.

I would like to know a solution - I very much want to grab sites that have
this property with that command line, but as it stands now I cannot.

Thanks!
