Mauro Tortonesi wrote:
> Frank McCown wrote:
>> It would be nice if wget could handle these mal-adjusted URLs properly
>> since they do appear from time to time. (In the case of
>> www.merseyfire.gov.uk, they appear very frequently unfortunately.)
>
> yes, it would be nice. but how are you exactly proposing to achieve this
> result? by adding yet another cryptic --ignore-double-dot option?

I don't think it would be necessary to add an option, as this should be
the expected behavior all the time.
Right now wget properly handles '..' if it occurs later in the URL.
Example:

  wget -r http://www.merseyfire.gov.uk/BLAH/../pages/fire_auth/councillors.htm

correctly changes the URL to request

  http://www.merseyfire.gov.uk/pages/fire_auth/councillors.htm

All that would be needed is to ignore any '..' segments that appear
directly after the domain name, since there is no directory above the
root for them to refer to. For example,

  wget -r http://www.foo.org/../../../../page.html

should simply request

  http://www.foo.org/page.html
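
To make the proposed behavior concrete, here is a rough sketch in Python (not wget's actual C code). The function names collapse_dot_segments and normalize_url are made up for illustration. The idea is the same one RFC 3986 section 5.2.4 describes for removing dot segments: a '..' removes the preceding path segment, and a '..' with nothing left to remove is simply dropped. The sketch assumes an absolute path and leaves anything else untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_dot_segments(path):
    """Resolve '.' and '..' in an absolute path (hypothetical helper).

    A '..' that would climb above the root is ignored.
    """
    if not path.startswith("/"):
        return path  # this sketch only handles absolute paths

    segments = path.split("/")[1:]  # drop the empty piece before the first '/'
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == "..":
            if out:
                out.pop()       # step back one directory if there is one
            if is_last:
                out.append("")  # keep a trailing slash, e.g. /a/b/.. -> /a/
        elif seg == ".":
            if is_last:
                out.append("")
        else:
            out.append(seg)
    return "/" + "/".join(out)


def normalize_url(url):
    """Return url with dot segments in its path collapsed (hypothetical helper)."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=collapse_dot_segments(parts.path)))


if __name__ == "__main__":
    print(normalize_url("http://www.foo.org/../../../../page.html"))
    print(normalize_url(
        "http://www.merseyfire.gov.uk/BLAH/../pages/fire_auth/councillors.htm"))
```

With this approach both cases go through the same code path, so no extra command-line option is involved; the leading '..' segments just have nothing to pop.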
BTW, you guys have done a great job with wget and I hope you don't take
any of my suggestions as criticism about your software.
Thanks,
Frank