Re: how do I download a/this URL that redirects at the server side?

2007-12-25 Thread [EMAIL PROTECTED]
phil curb wrote:

 I am downloading a page with -r -l 1, so wget also downloads the URLs
 on that page, and some of them look like this:
 
 http://www.theregister.co.uk/content/4/23517.html
 
 If I try to download it with wget, I get a 404, which
 is probably technically correct: the URL probably does
 not exist.
 
 But when I go to that URL in a browser, it redirects me.
 I was told it is a server-side thing, probably ASP,
 where, given that wrong URL, server-side ASP code
 generates the page.
 
 It redirects me to
 
 http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/
 which is probably
 http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/index.html
 
 wget can get that, but the HTML page with all the
 URLs does not use that URL, and wget seems unable
 to download the one it does use.
 

It seems that wget in Cygwin does download it, as does the wget that
Linux users are using.

It is the Windows port of wget, the one you find by googling "wget interlog",
that does not work with it.

Somebody suggested (see man wget) sending a fake User-Agent header, since
browsers do get the page. But I doubt that is it.


The working one returns output like this:
$ wget http://www.theregister.co.uk/content/4/23517.html
--02:37:20--  http://www.theregister.co.uk/content/4/23517.html

HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/ [following]
...
02:37:21 (952.75 KB/s) - `index.html' saved [32688]



The Windows port (the "wget interlog" one) returned:
...
Connecting to www.theregister.co.uk:80... connected!
HTTP request sent, awaiting response... 404 Not Found
02:32:09 ERROR 404: Not Found.



I guess the Windows port doesn't deal with the 301 response, or something.
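The working log above shows what a server-side redirect looks like on the wire: a 301 status plus a Location header, which the client is expected to follow with a second request (a 301 is a redirect, not an error). Below is a minimal Python sketch of that exchange, assuming a throwaway local server stands in for www.theregister.co.uk. The two paths are copied from the thread; the handler, the placeholder page body and the helper names `fetch` and `demo` are hypothetical. It relies on the standard library's urllib, whose default opener follows 301/302 redirects in a way comparable to wget's "[following]" step.

```python
import http.server
import threading
import urllib.request

# Hypothetical setup modelled on the thread: the old URL answers 301,
# the new one answers 200 with a small placeholder page.
OLD_PATH = "/content/4/23517.html"
NEW_PATH = "/2001/12/31/winxp_hole_misrepresented_by_fbi/"
BODY = b"<html>placeholder article</html>"


class RedirectingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == OLD_PATH:
            # A server-side redirect is just a 301 status plus a Location header.
            host, port = self.server.server_address[:2]
            self.send_response(301)
            self.send_header("Location", f"http://{host}:{port}{NEW_PATH}")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == NEW_PATH:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(BODY)))
            self.end_headers()
            self.wfile.write(BODY)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


def fetch(url):
    """Fetch url, following 301/302 redirects; return (final_url, body)."""
    # Empty ProxyHandler: connect directly, ignoring any http_proxy setting.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=10) as resp:
        return resp.geturl(), resp.read()


def demo():
    """Serve the two paths on a free local port and fetch the old URL."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch(f"http://127.0.0.1:{port}{OLD_PATH}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    final_url, body = demo()
    print(final_url)  # ends with NEW_PATH, not OLD_PATH
    print(len(body))
```

A client that stops at the first response would report the 301 (or whatever status the server chose to send) instead of saving the article, so the log line to look for is the "Location: ... [following]" one.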


Re: how do I download a/this URL that redirects at the server side?

2007-12-25 Thread Jochen Roderburg
Quoting [EMAIL PROTECTED] [EMAIL PROTECTED]:

  The Windows port (the "wget interlog" one) returned:
 ...
 Connecting to www.theregister.co.uk:80... connected!
 HTTP request sent, awaiting response... 404 Not Found
 02:32:09 ERROR 404: Not Found.

 I guess the Windows port doesn't deal with the 301 response, or something.


As always, the full output with the -d option would help here.
And what exactly do you mean by "wget interlog one"?

My Windows version works fine with that URL:

C:\ wget -d http://www.theregister.co.uk/content/4/23517.html
DEBUG output created by Wget 1.10.2 on Windows.

--18:49:24--  http://www.theregister.co.uk/content/4/23517.html
   => `23517.html'
Resolving www.theregister.co.uk... seconds 0.00, 212.100.234.54
Caching www.theregister.co.uk => 212.100.234.54
Connecting to www.theregister.co.uk|212.100.234.54|:80... seconds 0.00, connected.
Created socket 760.
Releasing 0x008b4ba0 (new refcount 1).

---request begin---
GET /content/4/23517.html HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.theregister.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 301 Moved Permanently
Date: Tue, 25 Dec 2007 17:49:32 GMT
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/
Cache-Control: max-age=1800
Expires: Tue, 25 Dec 2007 18:19:32 GMT
Content-Length: 378
Connection: close
Content-Type: text/html; charset=iso-8859-1

---response end---
301 Moved Permanently
Location: http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/ [following]
Closed fd 760
--18:49:25--  http://www.theregister.co.uk/2001/12/31/winxp_hole_misrepresented_by_fbi/
   => `index.html'
Found www.theregister.co.uk in host_name_addresses_map (008B4BA0)
Connecting to www.theregister.co.uk|212.100.234.54|:80... seconds 0.00, connected.
Created socket 760.
Releasing 0x008b4ba0 (new refcount 1).

---request begin---
GET /2001/12/31/winxp_hole_misrepresented_by_fbi/ HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: www.theregister.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Tue, 25 Dec 2007 17:49:32 GMT
Server: Apache/2.0.54 (Debian GNU/Linux)
Accept-Ranges: bytes
Cache-Control: max-age=1800
Expires: Tue, 25 Dec 2007 18:19:32 GMT
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: text/html

---response end---
200 OK
Length: unspecified [text/html]

    [ <=>                                 ] 27.556        --.--K/s

Closed fd 760
18:49:25 (353.44 KB/s) - `index.html' saved [27556]


Best regards,

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10    Tel.:   +49-221/478-7024
D-50931 Koeln  E-Mail: [EMAIL PROTECTED]
Germany


Re: how do I download a/this URL that redirects at the server side?

2007-12-25 Thread Jochen Roderburg
Quoting Jochen Roderburg [EMAIL PROTECTED]:

 What exactly do you mean by "wget interlog one"?

Hmm, I googled for "wget interlog" and found a very old Windows version there,
1.5.3 from 1999, which indeed gets a 404 error from your host.

I think the server does not like the request header
Host: www.theregister.co.uk:80
with the port number, which this old version sends. The 1.10.2 request
in my previous message sends just Host: www.theregister.co.uk.
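HTTP allows the Host header to carry a port, and a bare host name implies the scheme's default port (80 for http), so ":80" is legal but redundant; the diagnosis here is that this particular server mishandles it. Below is a minimal Python sketch of building the header the way the 1.10.2 request in the debug output looks, dropping the port only when it is the default. The helper names `host_header` and `build_get_request` and the `DEFAULT_PORTS` table are hypothetical, and IPv6 handling is deliberately minimal.

```python
from urllib.parse import urlsplit

# Default ports per scheme; a Host header without a port implies these.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(url):
    """Return the Host header value for url, omitting the scheme's default port."""
    parts = urlsplit(url)
    host = parts.hostname  # lower-cased, brackets stripped from IPv6 literals
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"


def build_get_request(url, user_agent="Wget/1.10.2"):
    """Build an HTTP/1.0 GET request laid out like wget 1.10.2's debug output."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.0",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        f"Host: {host_header(url)}",
        "Connection: Keep-Alive",
        "",  # these two empty items make the join end with CRLF CRLF,
        "",  # i.e. the blank line that terminates the header block
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # The 1.5.3 build sent "www.theregister.co.uk:80"; this prints the bare host.
    print(host_header("http://www.theregister.co.uk:80/content/4/23517.html"))
    print(build_get_request("http://www.theregister.co.uk/content/4/23517.html"))
```

Non-default ports such as 8080 are still included, since the server needs them to pick the right virtual host.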

You can get a current Windows version at
http://www.christopherlewis.com/WGet/default.htm

Best regards,

Jochen Roderburg