Hello! Here are some ideas for wget:
1. Handle compressed files. Some web sites store their HTML pages compressed, so automatic recursive downloaders like wget cannot follow them; browsers simply decompress the data and display it. wget should handle this too (via libz.so, and perhaps other compression libraries).

2. Remove files that were not fully retrieved. Say I'm mirroring a site and run out of disk space: the last file downloaded will be half finished. When I free some disk space and continue the mirror (without downloading everything again, of course), wget will think I already downloaded that last file, which is wrong, since the file is truncated (especially serious if the file is binary). Make this a command-line switch, so people doing mirroring can use it and know for certain that every file on disk was fully retrieved.

3. Cross-site user and password issues. When a site uses absolute links but is protected by a user and password, browsers handle it well while wget doesn't. For instance, suppose www.site.com/my_private_stuff/index.html is protected by a user and password. "wget http://user:[EMAIL PROTECTED]/my_private_stuff/index.html" works fine, but if index.html contains a reference like http://www.site.com/my_private_stuff/more_private_stuff.html, wget will not reuse the user/password for it and won't be able to download it. One solution: let the user specify (with a command-line switch) that the user/password should always be sent to the given site, even when a link is absolute and carries no user/password of its own.

4. Create stub files for dead resources on a replicated host. It is no secret that a lot of sites have dead links within the site. When mirroring large sites, I don't want to ask the remote server again about links it has already told me it doesn't have. wget could leave a stub file on disk - a small file whose only purpose is to stop wget from attempting to fetch that resource in the future.
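For idea 1, the decompression step itself is just gzip/zlib inflation. A minimal sketch of the manual workaround, under the assumptions that the server honours Accept-Encoding and that the URL is a placeholder:

```shell
# Manual workaround today (placeholder URL; assumes the server sends gzip):
#   wget --header='Accept-Encoding: gzip' -O page.html.gz http://www.site.com/page.html
# The inflation step wget would need to do internally via libz, shown here
# with command-line gzip on locally created data:
printf '<html>compressed page</html>' | gzip -c > page.html.gz
gunzip -f page.html.gz        # replaces page.html.gz with page.html
cat page.html
```

The point is that the inflation is a few library calls; the harder part is deciding, from the Content-Encoding or Content-Type headers, when to apply it.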
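The incomplete-file problem in idea 2 boils down to comparing what is on disk against the size the server advertised. A sketch of that check, simulated locally (the file names are made up, and in real use the expected size would come from the Content-Length header):

```shell
# Simulate a download cut short by a full disk: the mirror copy is shorter
# than the server's copy.
printf 'the complete file contents' > server_copy.bin
head -c 10 server_copy.bin > mirror_copy.bin

expected=$(wc -c < server_copy.bin)   # in real use: the Content-Length header
actual=$(wc -c < mirror_copy.bin)

if [ "$actual" -ne "$expected" ]; then
  # This is where "wget -c <url>" could resume the transfer.
  echo "incomplete file detected, re-fetching"
fi
```

A switch built on this check would let a continued mirror distinguish "already have it" from "have a truncated copy of it".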
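For idea 3, what the browser actually does is attach a Basic Authorization header, derived from the user/password pair, to every request for that site; the proposed switch would make wget do the same. The header value is just the base64 encoding of "user:password" (the credentials below are of course made up):

```shell
# Build the header a credential-forwarding wget would send with every
# request to the protected site, absolute links included:
credentials=$(printf 'user:secret' | base64)
echo "Authorization: Basic $credentials"
# prints: Authorization: Basic dXNlcjpzZWNyZXQ=
```

So the feature needs no new protocol machinery, only a rule that credentials given for a host apply to every URL on that host during the recursion.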
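Idea 4 needs very little mechanism: a marker file per dead URL, consulted before each fetch. A sketch of one possible scheme (the ".404" suffix and the paths are my own invention, not an existing wget convention):

```shell
dead="mirror/www.site.com/gone/page.html"
mkdir -p "$(dirname "$dead")"

# First mirror run: the server answered 404, so record a zero-byte stub.
touch "${dead}.404"

# Later mirror run: consult the stub before touching the network.
if [ -e "${dead}.404" ]; then
  echo "skipping known-dead resource: $dead"
fi
```

Because the stubs are zero-length, they cost almost no disk space, and deleting them forces wget to re-check the links on the next run.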
Whether these stub files are created could be controlled with a command-line switch (very useful for mirroring large sites). This way, continued replication of hosts could be a lot faster.

Please CC me on replies since I'm not subscribed to the list.

Cheers, and thanks for a great tool.
Mark

--
Name: Mark Veltzer
Title: Research and Development, Meta Ltd.
Address: Habikaa 17/3, Kiriat-Sharet, city.holon, Gush-Dan, country.israel 58495
Phone: +972-03-5581310
Fax: +972-03-5581310
Email: mailto:[EMAIL PROTECTED]
Homepage: http://www.veltzer.org
OpenSource: CPAN, user: VELTZER, mailto:[EMAIL PROTECTED], url: http://search.cpan.org/author/VELTZER/
Public key: http://www.veltzer.org/ascx/public_key.asc, wwwkeys.pgp.net, 0xC71E5D38
