Hello, I am not subscribed to this list, so please do CC me on any replies.
I am running this wget command line:

    wget -nH --no-parent --directory-prefix=/data --random-wait -r -l inf --convert-links --html-extension --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; AOL 7.0; Windows NT 5.1)" www.somewebsite.com

I should also note that the destination directory /data is browsable by me (i.e., it is published with an HTTP server), so I can view the results of this wget command even while it is still running.

Normally this command line works exactly as I expect. In some cases, however, some very odd behavior occurs. While the command is running, I pull up the locally saved copy in my web browser and see the homepage of the site I am wgetting. Five minutes later I reload that local homepage and suddenly see the PayPal homepage, minus all the graphics. The homepage keeps changing every few minutes until I kill the command.

So there are two very odd behaviors. The first is that an index.html is downloaded successfully (the correct one), and then a few minutes later that index.html is overwritten by a different one. I cannot see how this behavior could be expected, given the command line. The second is that wget is clearly traversing (and grabbing) files from sites other than www.somewebsite.com. In fact, it wanders all over the place and downloads from all sorts of web sites.

I have reproduced this (accidentally) with several sites, but the only one I can remember is www.explodingdog.com. Please do not all run out and test this and use up that site's bandwidth allotment. If you can decide amongst yourselves who is best placed to test it, though, you will definitely see the behavior if you run the command line above against that site. It is very odd and very unexpected.

I would like to know a solution: I very much want to grab sites that have this property with this command line, but as it stands I cannot.

Thanks!
