I'd like to suggest an enhancement that would help people who are downloading web sites hosted on a Windows server. (I couldn't find any discussion of this in the mailing list archive or any mention in the online documentation.)

Since Windows has a case-insensitive file system, Apache and IIS running on a Windows box treat the following URLs as references to the same resource:

http://foo.org/bar.html
http://foo.org/BAR.html

Apache on a *nix box treats these URLs as references to two different resources.

Wget 1.10 running on *nix currently treats the two URLs as referring to different resources, regardless of the operating system hosting the web server. As a result, wget creates two files when only one file actually exists on the Windows web server. I ran into this problem when using wget with http://www.harding.edu/hr/.

I'd like to suggest a new option, --ignore-case, that would tell wget to convert all URLs to lowercase when retrieving them. This would produce a more accurate copy of the files residing on a Windows file system and would mean fewer files are downloaded. Of course, this would not be as useful for mirroring a site on a *nix box, since URLs referring to BAR.html would now break.
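
To make the idea concrete, here is a rough sketch in Python of the kind of normalization I have in mind. It is only an illustration, not wget code: the function names lowercase_url and unique_urls are made up for this example, and I am assuming that only the host and path should be lowercased while the query string is left alone.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_url(url):
    """Return url with its host and path lowercased (hypothetical helper).

    Assumption: the query string and fragment are left untouched, since
    their case may matter to the application behind the server.
    """
    parts = urlsplit(url)
    # Keep any "user:password@" prefix as-is; only the host is lowercased.
    userinfo, at, host = parts.netloc.rpartition("@")
    netloc = userinfo + at + host.lower()
    return urlunsplit(
        (parts.scheme, netloc, parts.path.lower(), parts.query, parts.fragment)
    )


def unique_urls(urls):
    """Drop URLs that differ only by case in host or path, keeping order."""
    seen = set()
    result = []
    for url in urls:
        key = lowercase_url(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

With the two example URLs above, unique_urls would leave a single entry to fetch, so only one file would be written.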

A script could also be used to go through the downloaded files afterwards and delete the redundant ones (as was suggested in http://www.mail-archive.com/wget@sunsite.dk/msg08373.html to remove the index.html?BLAH files), but it would be nice to save the user this effort.
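
For comparison, a cleanup script along those lines might look roughly like the following Python sketch. The function names are my own invention, and it only reports groups of files whose paths differ by case; it deliberately deletes nothing, since the user would probably want to compare the contents first.

```python
import os
from collections import defaultdict


def list_files(root):
    """Return the paths of all files under root, relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return found


def group_case_duplicates(paths):
    """Group paths that are identical once case is ignored.

    Only groups with more than one member are returned; each group is
    sorted so the output is stable.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Running group_case_duplicates(list_files("www.harding.edu")) on a finished download would then list the candidates for removal, assuming the mirror was saved under a directory of that name.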

Regards,
Frank
