Greetings,

[wget version: 1.10.2]

Suppose that I run

  wget -r -l 1 http://some-host.com/index.html

and index.html contains a link like this:

  <A HREF="../directory/file.html">file</A>

then wget, when it tries to download this file, sends the following
HTTP request:

GET /../directory/file.html HTTP/1.0

instead of

GET /directory/file.html HTTP/1.0

I found that some HTTP servers do not treat the first request as
equivalent to the second: the first request returns an error page
instead of the expected document.

Moreover, wget saves that file in

some-host.com/%2E%2E/file.html

instead of just some-host.com/file.html

Likewise, I would expect (but did not check) that if wget downloaded a
page with the URL http://some-host.com/dir1/file1.html, and file1.html
contained a relative link to "../dir2/file2.html", wget would issue a
request like this:

GET /dir1/../dir2/file2.html HTTP/1.0

instead of

GET /dir2/file2.html HTTP/1.0

Could you please either fix wget to automatically "normalize" the URLs
in HTTP requests to avoid sending the "../", or at least provide an
option which would force wget to normalize URLs?
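
To make clear what I mean by "normalize", here is a rough sketch in
Python (not wget's code, and not a proposed patch) of the dot-segment
removal procedure described in RFC 3986, section 5.2.4. The function
name remove_dot_segments is my own choice for illustration, and the
sketch handles only the path part of the URL:

```python
def remove_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments in a URL path.

    Illustrative sketch following the steps of RFC 3986, section 5.2.4.
    The function name is hypothetical and not part of wget.
    """
    output = []  # completed segments, each stored with its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]                 # drop a leading "../"
        elif path.startswith("./"):
            path = path[2:]                 # drop a leading "./"
        elif path.startswith("/./"):
            path = path[2:]                 # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]                 # "/../x" becomes "/x" ...
            if output:
                output.pop()                # ... and the previous segment goes away
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (up to, but not including, the next "/")
            # from the input to the output.
            next_slash = path.find("/", 1)
            if next_slash == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:next_slash])
                path = path[next_slash:]
    return "".join(output)


print(remove_dot_segments("/../directory/file.html"))
print(remove_dot_segments("/dir1/../dir2/file2.html"))
```

With the two paths from this report, the sketch yields
/directory/file.html and /dir2/file2.html, which are the requests I
would expect wget to send.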

Best,
v.
