Re: how do I download a/this URL that redirects at the server side?
phil curb wrote:

I am downloading a page with -r -l 1, so wget also downloads the URLs on that page, and some of them look like this:

http://www.theregister.co.uk/content/4/23517.html

If I try to download it with wget, I get a 404. That is probably technically correct; the URL probably does not exist. But when I go to that URL in a browser, it redirects me. I was told it is a server-side thing, probably ASP: given that wrong URL, server-side ASP code generates the page. It redirects me to

http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/

which is probably

http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/index.html

wget can get that, but the HTML page with all the URLs does not use that URL, and wget seems unable to download the URL it does use.

It seems that wget in Cygwin does download it, as does the wget that Linux users are using. It is the Windows port of wget, the one you find by googling "wget interlog", that does not work with it. Somebody suggested (see man wget) sending a fake User-Agent header, since browsers are getting the page, but I doubt that is it.

The working one returns output like this:

$ wget http://www.theregister.co.uk/content/4/23517.html
--02:37:20--  http://www.theregister.co.uk/content/4/23517.html
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/ [following]
...
02:37:21 (952.75 KB/s) - `index.html' saved [32688]

The Windows port, the wget interlog one, returned:

..
Connecting to www.theregister.co.uk:80... connected!
HTTP request sent, awaiting response... 404 Not Found
02:32:09 ERROR 404: Not Found.

I guess the Windows port doesn't handle the 301 redirect or something.
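In HTTP terms, the working wget's transcript shows what "redirects me" means: the server answers the old URL with 301 Moved Permanently plus a Location header, and the client then issues a second request for that URL. The Python sketch below is a hypothetical helper (`next_url` is not part of wget) that resolves a Location value against the URL that produced it, the way a redirect-following client does. It only covers the status codes listed and does not fetch anything.

```python
from urllib.parse import urljoin

# Status codes whose Location header a client should follow.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}

def next_url(request_url, status, headers):
    """Return the URL to fetch next, or None if the response is final.

    Hypothetical helper: `headers` is a dict with lower-cased names.
    A relative Location is resolved against the URL that was requested.
    """
    if status not in REDIRECT_STATUSES:
        return None
    location = headers.get("location")
    if location is None:
        return None  # malformed redirect: nothing to follow
    return urljoin(request_url, location)

old = "http://www.theregister.co.uk/content/4/23517.html"
new = next_url(old, 301, {
    "location": "http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/"
})
print(new)
```

A 404, as reported by the interlog build, is not in that set, so a client that receives one has nothing to follow.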
Re: how do I download a/this URL that redirects at the server side?
Quoting [EMAIL PROTECTED] [EMAIL PROTECTED]:

> The Windows port, the wget interlog one, returned:
> ..
> Connecting to www.theregister.co.uk:80... connected!
> HTTP request sent, awaiting response... 404 Not Found
> 02:32:09 ERROR 404: Not Found.
> I guess the Windows port doesn't handle the 301 redirect or something.

As always, the full output with the -d option would help here. What do you mean by "wget interlog one"? My Windows version works fine with that URL:

C:\>wget -d http://www.theregister.co.uk/content/4/23517.html
DEBUG output created by Wget 1.10.2 on Windows.

--18:49:24--  http://www.theregister.co.uk/content/4/23517.html
           => `23517.html'
Resolving www.theregister.co.uk... seconds 0.00, 212.100.234.54
Caching www.theregister.co.uk => 212.100.234.54
Connecting to www.theregister.co.uk|212.100.234.54|:80... seconds 0.00, connected.
Created socket 760.
Releasing 0x008b4ba0 (new refcount 1).
---request begin---
GET /content/4/23517.html HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.theregister.co.uk
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 301 Moved Permanently
Date: Tue, 25 Dec 2007 17:49:32 GMT
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/
Cache-Control: max-age=1800
Expires: Tue, 25 Dec 2007 18:19:32 GMT
Content-Length: 378
Connection: close
Content-Type: text/html; charset=iso-8859-1
---response end---
301 Moved Permanently
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/ [following]
Closed fd 760
--18:49:25--  http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/
           => `index.html'
Found www.theregister.co.uk in host_name_addresses_map (008B4BA0)
Connecting to www.theregister.co.uk|212.100.234.54|:80... seconds 0.00, connected.
Created socket 760.
Releasing 0x008b4ba0 (new refcount 1).
---request begin---
GET /2001/12/31/winxp_hole_misrepresented_by_fbi/ HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.theregister.co.uk
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Tue, 25 Dec 2007 17:49:32 GMT
Server: Apache/2.0.54 (Debian GNU/Linux)
Accept-Ranges: bytes
Cache-Control: max-age=1800
Expires: Tue, 25 Dec 2007 18:19:32 GMT
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: text/html
---response end---
200 OK
Length: unspecified [text/html]

    [ <=> ] 27.556        --.--K/s

Closed fd 760
18:49:25 (353.44 KB/s) - `index.html' saved [27556]

Best regards,
Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10          Tel.:   +49-221/478-7024
D-50931 Koeln                E-Mail: [EMAIL PROTECTED]
Germany
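The `=>` lines in this log show the local file names wget picked: `23517.html` for the first URL, and `index.html` for the redirect target, whose path ends in a slash. The Python sketch below approximates that naming rule under simplifying assumptions. It takes the last path segment and falls back to `index.html` when that segment is empty. It ignores query strings, numeric suffixes such as `.1` for files that already exist, and wget's directory options. `local_name` is a hypothetical name, not a wget function.

```python
import posixpath
from urllib.parse import urlsplit

def local_name(url, default_page="index.html"):
    """Approximate the file name wget saves a URL under (simplified)."""
    path = urlsplit(url).path
    # posixpath.basename returns "" when the path ends with "/",
    # which is the case where a default page name is substituted.
    name = posixpath.basename(path)
    return name or default_page

for url in (
    "http://www.theregister.co.uk/content/4/23517.html",
    "http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/",
):
    print(url, "=>", local_name(url))
```

The `index.html` name is chosen on the client side. It does not by itself show that the server has a file of that name, and the request in the log is for the slash-terminated path.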
Re: how do I download a/this URL that redirects at the server side?
Quoting Jochen Roderburg [EMAIL PROTECTED]:

> What do you mean by "wget interlog one"?

Hmm, I googled for "wget interlog" and found a very old Windows version there, 1.5.3 from 1999, which indeed gets a 404 error from that host. I think the server does not like the request header

Host: www.theregister.co.uk:80

with the port number, which this version sends. You can get a current Windows version at http://www.christopherlewis.com/WGet/default.htm

Best regards,
Jochen Roderburg
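Here is the suspected difference side by side. The 1.10.2 log sends `Host: www.theregister.co.uk`, while, according to the message above, 1.5.3 appends `:80`. HTTP/1.1 allows the port to be left out of the Host header when it is the default for the scheme, which is the form the 1.10.2 log shows. The Python sketch below builds the request head in both styles from a URL, without touching the network. `host_header`, `request_head` and the `always_port` flag are hypothetical names, and the header order simply copies the 1.10.2 debug output.

```python
from urllib.parse import urlsplit

def host_header(url, always_port=False):
    """Build the Host header value for an http:// URL.

    always_port=True mimics the old behaviour described in the thread
    (port always appended); False drops the default port 80.
    """
    parts = urlsplit(url)
    port = parts.port or 80  # urlsplit gives None when no port is written
    if always_port or port != 80:
        return f"{parts.hostname}:{port}"
    return parts.hostname

def request_head(url, user_agent="Wget/1.10.2", always_port=False):
    """Return an HTTP/1.0 request head shaped like the wget -d log."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.0",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        f"Host: {host_header(url, always_port)}",
        "Connection: Keep-Alive",
    ]
    # Header lines end in CRLF; an empty line terminates the head.
    return "\r\n".join(lines) + "\r\n\r\n"

url = "http://www.theregister.co.uk/content/4/23517.html"
print(request_head(url))
print(request_head(url, always_port=True))
```

Sending the second string to the server over a plain socket would be one way to test the guess about the port number.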