I'd like to suggest an enhancement that would help people who are
downloading web sites housed on a Windows server. (I couldn't find any
discussion of this in the email list archive or any mention in the
on-line documentation.)
Since Windows has a case-insensitive file system, Apache and IIS running
on a Windows box treat the following URLs as references to the same
resource:
http://foo.org/bar.html
http://foo.org/BAR.html
Apache on a *nix box treats these URLs as references to two different
resources.
Wget 1.10 running on *nix currently treats the two URLs as referring to
different resources regardless of the operating system hosting the web
server. As a result, wget creates two files when only one file actually
exists on the Windows web server. I ran into this problem when using
wget with http://www.harding.edu/hr/.
I'd like to suggest a new option, --ignore-case, that would tell wget
to convert all URLs to lowercase when retrieving them. This would let
wget download the files residing on a Windows file system more
accurately and would require fewer files to be downloaded. Of course,
this would not be as useful for mirroring a site on a *nix box, since a
lowercased request for bar.html would fail if the file is actually
named BAR.html.
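To make the idea concrete, here is a rough Python sketch of the
behaviour I have in mind. It is not wget code: the function names
(ignore_case_key, dedupe_urls) are made up for illustration, and it
assumes that lowercasing the whole URL is an acceptable comparison key.

```python
def ignore_case_key(url):
    """Return the key used to compare URLs when case is ignored.

    Assumption: the whole URL is lowercased, as proposed above.
    """
    return url.lower()


def dedupe_urls(urls):
    """Keep the first spelling of each URL and drop case-only variants."""
    seen = set()
    unique = []
    for url in urls:
        key = ignore_case_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Given the two example URLs above, dedupe_urls would keep only
http://foo.org/bar.html, so only one file would be fetched.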
A script could also be run after the download to delete the redundant
files (as was suggested in
http://www.mail-archive.com/wget@sunsite.dk/msg08373.html to remove the
index.html?BLAH files), but it would be nice to save the user this effort.
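For what it's worth, such a cleanup script might look roughly like the
Python sketch below. The function names (group_case_duplicates,
list_files) are hypothetical, and the sketch only reports files whose
paths differ in case. Checking that the contents really match and
deciding which copy to delete is left to the user.

```python
import os
from collections import defaultdict


def group_case_duplicates(paths):
    """Return groups of paths that are identical once lowercased."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    # Only groups with more than one spelling are candidates for cleanup.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


def list_files(root):
    """Yield the path of every file under root, relative to root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

Running group_case_duplicates(list_files("www.harding.edu")) on a
finished download would list the candidate duplicates. The directory
name here is just the one wget would normally create for that host.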
Regards,
Frank