Re: Major, and seemingly random problems with wget 1.8.2
Hello Hrvoje,

On 12-Oct-03, you wrote:

> I don't use an Amiga, nor do I have an idea what you mean by a
> "working Wget setup".  Have you tried compiling from source?

Well, simply a config or prefs file for one purpose, using wget, preferably
for an Amiga system.  A batch/script file would do too.

No, I didn't try compiling from source, since I already have a working wget
binary.  But I can't take too much time playing around with it to find the
right settings for quickly downloading data.

Regards

Patrick Robinson
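For what it's worth, a "setup" of the kind being asked about can be as small
as a one-line script; the following is only an illustrative sketch (the URL
and option choices are examples, not a tested or Amiga-specific setup):

  # mirror one site, fix up links for local viewing, ignore robot directives
  wget -k -m -e robots=off http://www.example.com/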
Re: Major, and seemingly random problems with wget 1.8.2
I don't use an Amiga, nor do I have an idea what you mean by a "working Wget setup". Have you tried compiling from source?
Re: Major, and seemingly random problems with wget 1.8.2
Hello Hrvoje,

On 07-Oct-03, you wrote:

Is it possible for someone to e-mail me a working wget setup for Amiga to
my private mail?

Thanks

Regards

Patrick Robinson
Re: Major, and seemingly random problems with wget 1.8.2
Josh Brooks <[EMAIL PROTECTED]> writes:

>> > At first it will act normally, just going over the site in question, but
>> > sometimes, you will come back to the terminal and see it grabbing all
>> > sorts of pages from totally different sites (!)
>>
>> The only way I've seen it happen is when it follows a redirection to a
>> different site.  The redirection is followed because it's considered
>> to be part of the same download.  However, further links on the
>> redirected site are not (supposed to be) followed.
>
> OK, is there a way to tell wget not to follow redirects, so it will
> never do that at all?

Not yet, sorry.  But people have asked for it a lot, so it'll probably
make it in after 1.9.
Re: Major, and seemingly random problems with wget 1.8.2
Thank you for the great response.  It is much appreciated - see below...

On Tue, 7 Oct 2003, Hrvoje Niksic wrote:

> www.zorg.org/vsound/ contains this markup:
>
>     <meta name="robots" content="nofollow">
>
> That explicitly tells robots, such as Wget, not to follow the links in
> the page.  Wget respects this and does not follow the links.  You can
> tell Wget to ignore the robot directives.  For me, this works as
> expected:
>
>     wget -km -e robots=off http://www.zorg.org/vsound/

Perfect - thank you.

> > At first it will act normally, just going over the site in question, but
> > sometimes, you will come back to the terminal and see it grabbing all
> > sorts of pages from totally different sites (!)
>
> The only way I've seen it happen is when it follows a redirection to a
> different site.  The redirection is followed because it's considered
> to be part of the same download.  However, further links on the
> redirected site are not (supposed to be) followed.

OK, is there a way to tell wget not to follow redirects, so it will never
do that at all?  Basically I am looking for a way to tell wget "don't ever
get anything with a different FQDN than what I started you with".

Thanks.
Re: Major, and seemingly random problems with wget 1.8.2
Josh Brooks <[EMAIL PROTECTED]> writes:

> I have noticed very unpredictable behavior from wget 1.8.2 -
> specifically I have noticed two things:
>
> a) sometimes it does not follow all of the links it should
>
> b) sometimes wget will follow links to other sites and URLs - when the
> command line used should not allow it to do that.

Thanks for the report.  A more detailed response follows below:

> First, sometimes when you attempt to download a site with -k -m
> (--convert-links and --mirror) wget will not follow all of the links and
> will skip some of the files!
>
> I have no idea why it does this with some sites and doesn't do it with
> other sites.  Here is an example that I have reproduced on several systems
> - all with 1.8.2:

Links are missed on some sites because of incorrectly written HTML comments.
This has been fixed for Wget 1.9, where more relaxed comment parsing is the
default.  But that is not what is happening with www.zorg.org/vsound/.

www.zorg.org/vsound/ contains this markup:

    <meta name="robots" content="nofollow">

That explicitly tells robots, such as Wget, not to follow the links in
the page.  Wget respects this and does not follow the links.  You can
tell Wget to ignore the robot directives.  For me, this works as
expected:

    wget -km -e robots=off http://www.zorg.org/vsound/

You can put `robots=off' in your .wgetrc and this problem will not
bother you again.

> The second problem, and I cannot currently give you an example to try
> yourself but _it does happen_, is if you use this command line:
>
> wget --tries=inf -nH --no-parent
> --directory-prefix=/usr/data/www.explodingdog.com --random-wait -r -l inf
> --convert-links --html-extension --user-agent="Mozilla/4.0 (compatible;
> MSIE 6.0; AOL 7.0; Windows NT 5.1)" www.example.com
>
> At first it will act normally, just going over the site in question, but
> sometimes, you will come back to the terminal and see it grabbing all
> sorts of pages from totally different sites (!)

The only way I've seen it happen is when it follows a redirection to a
different site.  The redirection is followed because it's considered
to be part of the same download.  However, further links on the
redirected site are not (supposed to be) followed.

If you have a repeatable example, please mail it here so we can examine
it in more detail.
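For reference, the `robots=off' setting mentioned above goes into a plain
~/.wgetrc file; a minimal sketch would look like the following (only the
robots line comes from this thread, the comment is illustrative):

  # ~/.wgetrc
  # ignore robots.txt and robots <meta> directives when mirroring
  robots = off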
Major, and seemingly random problems with wget 1.8.2
Hello,

I have noticed very unpredictable behavior from wget 1.8.2 - specifically I
have noticed two things:

a) sometimes it does not follow all of the links it should

b) sometimes wget will follow links to other sites and URLs - when the
command line used should not allow it to do that.

Here are the details.

First, sometimes when you attempt to download a site with -k -m
(--convert-links and --mirror) wget will not follow all of the links and
will skip some of the files!

I have no idea why it does this with some sites and doesn't do it with
other sites.  Here is an example that I have reproduced on several systems
- all with 1.8.2:

# wget -k -m http://www.zorg.org/vsound/
--17:09:32--  http://www.zorg.org/vsound/
           => `www.zorg.org/vsound/index.html'
Resolving www.zorg.org... done.
Connecting to www.zorg.org[213.232.100.31]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                              ] 12,235        53.82K/s

Last-modified header missing -- time-stamps turned off.
17:09:32 (53.82 KB/s) - `www.zorg.org/vsound/index.html' saved [12235]

FINISHED --17:09:32--
Downloaded: 12,235 bytes in 1 files
Converting www.zorg.org/vsound/index.html... 2-6
Converted 1 files in 0.03 seconds.

What is the problem here?  When I run the exact same command line with
wget 1.6, I get this:

# wget -k -m http://www.zorg.org/vsound/
--11:10:06--  http://www.zorg.org/vsound/
           => `www.zorg.org/vsound/index.html'
Connecting to www.zorg.org:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K -> .. .                                                   [100%]

Last-modified header missing -- time-stamps turned off.
11:10:07 (71.12 KB/s) - `www.zorg.org/vsound/index.html' saved [12235]

Loading robots.txt; please ignore errors.
--11:10:07--  http://www.zorg.org/robots.txt
           => `www.zorg.org/robots.txt'
Connecting to www.zorg.org:80... connected!
HTTP request sent, awaiting response... 404 Not Found
11:10:07 ERROR 404: Not Found.

--11:10:07--  http://www.zorg.org/vsound/vsound.jpg
           => `www.zorg.org/vsound/vsound.jpg'
Connecting to www.zorg.org:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 27,629 [image/jpeg]

    0K -> .. .. ..                                               [100%]

11:10:08 (51.49 KB/s) - `www.zorg.org/vsound/vsound.jpg' saved [27629/27629]

--11:10:09--  http://www.zorg.org/vsound/vsound-0.2.tar.gz
           => `www.zorg.org/vsound/vsound-0.2.tar.gz'
Connecting to www.zorg.org:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 108,987 [application/x-tar]

    0K -> .. .. .. .. ..                                         [ 46%]
   50K -> .. .. .. .. ..                                         [ 93%]
  100K -> ..                                                     [100%]

11:10:12 (46.60 KB/s) - `www.zorg.org/vsound/vsound-0.2.tar.gz' saved [108987/108987]

--11:10:12--  http://www.zorg.org/vsound/vsound-0.5.tar.gz
           => `www.zorg.org/vsound/vsound-0.5.tar.gz'
Connecting to www.zorg.org:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 116,904 [application/x-tar]

    0K -> .. .. .. .. ..                                         [ 43%]
   50K -> .. .. .. .. ..                                         [ 87%]
  100K -> ..                                                     [100%]

11:10:14 (60.44 KB/s) - `www.zorg.org/vsound/vsound-0.5.tar.gz' saved [116904/116904]

--11:10:14--  http://www.zorg.org/vsound/vsound
           => `www.zorg.org/vsound/vsound'
Connecting to www.zorg.org:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 3,365 [text/plain]

    0K -> ...                                                    [100%]

11:10:14 (3.21 MB/s) - `www.zorg.org/vsound/vsound' saved [3365/3365]

Converting www.zorg.org/vsound/index.html... done.
FINISHED --11:10:14--
Downloaded: 269,120 bytes in 5 files
Converting www.zorg.org/vsound/index.html... done.

See?
It gets the links inside of index.html, mirrors those links, and converts
them - just like it should.  Why does 1.8.2 have a problem with this site?
Other sites are handled just fine by 1.8.2 with the same command line ...
it makes no sense that wget 1.8.2 has problems with particular web sites.
This is incorrect behavior - and if you try the same URL with 1.8.2 you can
reproduce the same results.

The second problem, and I cannot currently give you an example to try
yourself but _it does happen_, is if you use this command line:

wget --tries=inf -nH --no-parent
--directory-prefix=/usr/data/www.explodingdog.com --random-wait -r -l inf
--convert-links --html-extension --user-agent="Mozilla/4.0 (compatible;
MSIE 6.0; AOL 7.0; Windows NT 5.1)" www.example.com

At first it will act normally, just going over the site in question, but
sometimes, you will come back to the terminal and see it grabbing all
sorts of pages from totally different sites (!)

I have seen