Re: wget re-download fully downloaded files
Maksim Ivanov wrote:

> I'm trying to download the same file from the same server. The command line I use:
>
>   wget --debug -o log -c -t 0 --load-cookies=cookie_file http://rapidshare.com/files/153131390/Blind-Test.rar
>
> Attached below are two files: a log with 1.9.1 and a log with 1.10.2. Both logs were made when Blind-Test.rar was already on my HDD. Sorry for some mess in the logs, but Russian is used on my console.

Thanks very much for providing these, Maksim; they were very helpful. (Sorry for getting back to you so late: it's been busy lately.)

I've confirmed this behavioral difference (though I compared the current development sources against 1.8.2, rather than 1.10.2 against 1.9.1). Your logs involve a 302 redirection before arriving at the real file, but that's just a red herring. The difference is that when 1.9.1 encountered a server that would respond to a byte-range request with 200 (meaning it doesn't know how to send partial contents), but with a Content-Length value matching the size of the local file, wget would close the connection and not proceed to re-download. 1.10.2, on the other hand, would just re-download it.

Actually, I'll have to confirm this, but I think that current Wget will re-download it, but not overwrite the current content until it arrives at content corresponding to bytes beyond the current content.

I need to investigate further to see whether this change was somehow intentional (though I can't imagine what the reasoning would be); if I don't find a good reason not to, I'll revert this behavior. Probably for the 1.12 release, but I might possibly punt it to 1.13 on the grounds that it's not a recent regression (however, it should really be a quick fix, so most likely it'll be in for 1.12).

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
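The 1.9.1-era decision Micah describes could be sketched roughly as follows. This is an illustrative Python sketch, not wget's actual C code; the function name and signature are invented for the example:

```python
import os

def should_skip_redownload(status, content_length, local_path):
    """Decide whether a -c (continue) transfer can be skipped entirely.

    A 200 response to a byte-range request means the server ignored the
    Range header and would resend the whole body.  If its Content-Length
    matches what is already on disk, the file is presumably complete,
    so the connection can simply be closed (the old 1.9.1 behavior).
    """
    if not os.path.exists(local_path):
        return False
    local_size = os.path.getsize(local_path)
    return status == 200 and content_length == local_size
```

Under this sketch, 1.10.2's behavior corresponds to ignoring this check and re-downloading regardless of the matching length.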
Re: --mirror and --cut-dirs=2 bug?
Brock Murch wrote:

> I try to keep a mirror of NASA atteph ancillary data for MODIS processing. I know that means little, but I have a cron script that runs twice a day. Sometimes it works, and others, not so much. The sh script is listed at the end of this email, as are the contents of the remote ftp server's root and portions of the log. I don't need all the data on the remote server, only some; thus I use --cut-dirs. To make matters stranger, the software (also from NASA) that uses these files looks for them in a single place on the client machine where the software runs, but needs data from two different directories on the remote ftp server. If the data is not on the client machine, the software kindly ftp's the files to the local directory. However, I don't allow write access to that directory: many people use the software, and when it is downloaded it has the wrong perms for others to use it, so I mirror the data I need from the ftp site locally. In the script below, there are 2 wget commands, but they are to slightly different directories (MODISA, MODIST).

I wouldn't recommend that. Using the same output directory for two different source directories seems likely to lead to problems. You'd most likely be better off pulling to two locations, and then combining them afterwards. I don't know for sure that it _will_ cause problems (except if they happen to have same-named files), as long as .listing files are being properly removed (there were some recently-fixed bugs related to that, I think? ...just appending new listings on top of existing files).

> It appears to me that the problem occurs if there is an ftp server error, and wget starts a retry.
> wget goes to the server root, gets the .listing from there for some reason (as opposed to the directory it should go to on the server), then goes to the dir it needs to mirror and can't find the files (that are listed in the root dir), and creates dirs, and then I get "No such file" errors and recursive directories created. Any advice would be appreciated.

This snippet seems to be the source of the problem:

  Error in server response, closing control connection.
  Retrying.

  --14:53:53--  ftp://oceans.gsfc.nasa.gov/MODIST/ATTEPH/2002/110/
    (try: 2) => `/home1/software/modis/atteph/2002/110/.listing'
  Connecting to oceans.gsfc.nasa.gov|169.154.128.45|:21... connected.
  Logging in as anonymous ... Logged in!
  ==> SYST ... done.    ==> PWD ... done.
  ==> TYPE I ... done.  ==> CWD not required.
  ==> PASV ... done.    ==> LIST ... done.

That "CWD not required" bit is erroneous. I'm 90% sure we fixed this issue recently (though I'm not 100% sure that it went to release: I believe so). I believe we made some related fixes more recently.

You provided a great amount of useful information, but one thing that seems to be missing (or I missed it) is the Wget version number. Judging from the log, I'd say it's 1.10.2 or older; the most recent version of Wget is 1.11.4. Could you please verify whether Wget continues to exhibit this problem in the latest release version? I'll also try to look into this as I have time (but it might be a while before I can give it serious attention; it'd be very helpful if you could do a little more legwork).

- --
Thanks very much,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: --mirror and --cut-dirs=2 bug?
Micah Cowan wrote:

> I believe we made some related fixes more recently. You provided a great amount of useful information, but one thing that seems to be missing (or I missed it) is the Wget version number. Judging from the log, I'd say it's 1.10.2 or older; the most recent version of Wget is 1.11.4; could you please try to verify whether Wget continues to exhibit this problem in the latest release version?

This problem looks like the one that Mike Grant fixed in October of 2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it should definitely be fixed in 1.11.4. Please let me know if it isn't.

- --
Regards,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: --mirror and --cut-dirs=2 bug?
Micah,

Thanks for your quick attention to this. Yes, I probably forgot to include the version #:

  [EMAIL PROTECTED] atteph]# wget --version
  GNU Wget 1.10.2 (Red Hat modified)
  Copyright (C) 2005 Free Software Foundation, Inc.
  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.
  Originally written by Hrvoje Niksic [EMAIL PROTECTED].

I will see if I can get the newest version for:

  [EMAIL PROTECTED] atteph]# cat /etc/redhat-release
  CentOS release 4.2 (Final)

I'll let you know how that goes.

Brock

On Monday 27 October 2008 2:19 pm, Micah Cowan wrote:

> This problem looks like the one that Mike Grant fixed in October of 2006: http://hg.addictivecode.org/wget/1.11/rev/161aa64e7e8f, so it should definitely be fixed in 1.11.4. Please let me know if it isn't.
More on query matching [Re: Need Design Documents]
kalpana ravi wrote:

> Hi Everybody,

Hi kalpana,

You sent this message to me and [EMAIL PROTECTED]; you wanted [EMAIL PROTECTED].

> My name is kalpana Ravi. I am planning to contribute one of the features listed in https://savannah.gnu.org/bugs/?22089. For that I need to know the design diagrams to understand better. Does anybody know where the UML diagrams are?

We don't have UML diagrams for wget: you'll just have to read the sources (which, unfortunately, are messy). I have some rough-draft diagrams of how I _want_ wget to look eventually, but I'm not done with those, and anyway they wouldn't help you with wget as it is now. Even if you had UML diagrams for the current state, you'd still need to understand the sources; I really don't think they'd help you much.

More important than understanding the design is understanding what needs to be done; we're still getting a grip on that. My current thought is that there should be a --query-reject (and probably --query-accept, though the former seems far more useful) that is matched against key/value pairs; thus, --query-reject 'foo=bar&action=edit' would reject anything that has foo=bar and action=edit as key/value pairs in the query string, even if they're not actually next to each other. An example rejected URL might be http://example.com/index.php?a=b&action=edit&token=blah&foo=bar&hergle. Not all query strings are in the key=value format, so --query-reject 'abc1254' would also be allowed, and would match against the entire query string.

For an idea of how URL filename matching is currently done, you might check out acceptable() in src/util.c and the functions it calls, to get an idea of how query matching might be implemented.

However, I'll probably tackle this bug myself pretty soon if no one else has managed it yet, as I'm very interested in getting Wget 1.12 finished before long into the new year (ideally _before_ the new year, but that probably ain't gonna happen).

- --
Micah J.
Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
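The key/value matching proposed above could be sketched as follows. This is a hypothetical Python illustration (wget itself is written in C, and the option does not yet exist); the function name is invented for the example:

```python
from urllib.parse import parse_qsl

def query_rejected(query_string, reject_spec):
    """Return True if the URL's query string matches a --query-reject spec.

    If the spec contains key=value pairs, every pair must appear
    somewhere in the query string, in any order and not necessarily
    adjacent.  A spec with no '=' is matched against the whole string.
    """
    if '=' not in reject_spec:
        return reject_spec == query_string
    wanted = dict(parse_qsl(reject_spec))
    present = dict(parse_qsl(query_string))
    return all(present.get(k) == v for k, v in wanted.items())
```

For the example URL above, a spec of 'foo=bar&action=edit' matches the query string 'a=b&action=edit&token=blah&foo=bar&hergle' even though the two pairs are not adjacent. A real implementation would presumably also support wildcards, as wget's filename accept/reject lists do.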
Re: wget re-download fully downloaded files
Maksim Ivanov wrote:

> I'm trying to download the same file from the same server. The command line I use:
>
>   wget --debug -o log -c -t 0 --load-cookies=cookie_file http://rapidshare.com/files/153131390/Blind-Test.rar
>
> Attached below are two files: a log with 1.9.1 and a log with 1.10.2. Both logs were made when Blind-Test.rar was already on my HDD. Sorry for some mess in the logs, but Russian is used on my console.

This is currently being tracked at https://savannah.gnu.org/bugs/?24662

A similar and related bug report is at https://savannah.gnu.org/bugs/?24642, in which the logs show that rapidshare.com also issues erroneous Content-Range information when it responds with a 206 Partial Content, which exercised a different regression* introduced in 1.11.x.

* It's not really a regression, since it's desirable behavior: we now determine the size of the content from the Content-Range header, since Content-Length is often missing or erroneous for partial content. However, in this instance of server error, it resulted in less-desirable behavior than the previous version of Wget. Anyway...

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
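Determining the total size from Content-Range, and guarding against the kind of erroneous values described above, could look roughly like this. An illustrative Python sketch, not wget's actual code; the helper name is invented:

```python
import re

def total_from_content_range(header_value):
    """Extract the total size from a 'bytes START-END/TOTAL' header value.

    Returns the total as an int, or None when the header is malformed,
    the total is unknown ('*'), or the advertised range does not fit
    within the total -- the erroneous case from the bug report.
    """
    m = re.match(r'bytes\s+(\d+)-(\d+)/(\d+|\*)$', header_value.strip())
    if m is None:
        return None
    start, end, total = m.groups()
    if total == '*':
        return None
    total = int(total)
    # Sanity check: a server error can advertise a range beyond the total.
    if int(end) >= total or int(start) > int(end):
        return None
    return total
```

Falling back to Content-Length (or aborting) when this returns None is one plausible way to get the pre-1.11 behavior back for misbehaving servers.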
RE: wget re-download fully downloaded files
Micah Cowan wrote:

> Actually, I'll have to confirm this, but I think that current Wget will re-download it, but not overwrite the current content, until it arrives at some content corresponding to bytes beyond the current content. I need to investigate further to see if this change was somehow intentional (though I can't imagine what the reasoning would be); if I don't find a good reason not to, I'll revert this behavior.

One reason to keep the current behavior is that it retains all of the existing content in the event of another partial download that is shorter than the previous one. However, I think that only makes sense if wget is comparing the new content with what is already on disk.

Tony
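Tony's suggestion of comparing the re-sent bytes with the local file before keeping anything might look roughly like this. A hypothetical Python sketch, not wget's actual behavior; the function is invented for illustration:

```python
def resume_with_verify(existing, resent):
    """Merge a full re-send into previously downloaded content.

    The overlapping prefix of the re-sent body must match what is
    already on disk; only bytes beyond the existing content are
    appended.  A mismatch means the server's content has changed,
    so blindly keeping the old bytes would corrupt the file.
    """
    overlap = min(len(existing), len(resent))
    if resent[:overlap] != existing[:overlap]:
        raise ValueError("re-sent content disagrees with local file")
    # A shorter re-send loses nothing: existing content is retained.
    return existing + resent[len(existing):]
```

Note that this trades bandwidth for safety: the overlapping bytes still cross the wire, which is exactly the "re-download but don't overwrite" behavior Micah describes, plus the verification step Tony proposes.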