Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
> Micah Cowan wrote:
>> Jochen Roderburg wrote:
>>> Unfortunately, however, a new regression crept in: In the case timestamping=on, content-disposition=off, no local file present, it now does no HEAD (correctly), but two (!!) GETs, and transfers the file twice.
>>
>> Ha! Okay, gotta get that one fixed...
>
> That should now be fixed.
>
> It's hard to be confident I'm not introducing more issues, with the state of http.c being what it is. So please beat on it! :)

This time it survived the beating ;-) It seems that we are finally converging. The double GET is gone, and my other test cases still work as expected, including the -c variants.

> One issue I'm still aware of is that, if -c and -e contentdisposition=yes are specified for a file already fully downloaded, HEAD will be sent for the contentdisposition, and yet a GET will still be sent to fetch the remainder of the -c (resulting in a 416 Requested Range Not Satisfiable). Ideally, Wget should be smart enough to see from the HEAD that the Content-Length already matches the file's size, even though -c no longer requires a HEAD (again). We _got_ one, we should put it to good use.
>
> However, I'm not worried about addressing this before 1.11 releases; it's a minor complaint, and with content-disposition's current implementation, users are already going to be expecting an extra HEAD round-trip in the general case; what's a few extra?

Agreed. I can confirm this behaviour, too. And I would also consider this a minor issue; at least the result is correct.

I have also not made many tests where content-disposition is really used for the filename. The few real-life cases that I have at hand do not send any special headers like timestamps and file lengths with it. At least the local filename is set correctly and is correctly renamed if it exists.

Best regards and thanks again for the repair of all the issues that I found,
Jochen Roderburg
Re: Myriad merges
Jochen Roderburg wrote:
> Quoting Micah Cowan [EMAIL PROTECTED]:
>> It's hard to be confident I'm not introducing more issues, with the state of http.c being what it is. So please beat on it! :)
>
> This time it survived the beating ;-)

Yay!! :D

>> One issue I'm still aware of is that, if -c and -e contentdisposition=yes are specified for a file already fully downloaded, HEAD will be sent for the contentdisposition, and yet a GET will still be sent to fetch the remainder of the -c (resulting in a 416 Requested Range Not Satisfiable). Ideally, Wget should be smart enough to see from the HEAD that the Content-Length already matches the file's size, even though -c no longer requires a HEAD (again). We _got_ one, we should put it to good use.
>>
>> However, I'm not worried about addressing this before 1.11 releases; it's a minor complaint, and with content-disposition's current implementation, users are already going to be expecting an extra HEAD round-trip in the general case; what's a few extra?
>
> Agreed. I can confirm this behaviour, too. And I would also consider this a minor issue; at least the result is correct.
>
> I have also not made many tests where content-disposition is really used for the filename. The few real-life cases that I have at hand do not send any special headers like timestamps and file lengths with it. At least the local filename is set correctly and is correctly renamed if it exists.

And I expect there are probably several bugs lurking here (which is why I've designated it as experimental). After the 1.11 release I want to revisit that section, and look more closely at what happens if we get a Content-Disposition at the last minute, especially if it specifies a local file name that we are rejecting.

I'd prefer that it not use HEAD at all for that, as I expect Content-Disposition is rare enough that it doesn't justify issuing a HEAD just to see if it's present; and in any case it is probably often not sent with HEAD responses, but only with GET.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
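To make the "last-minute Content-Disposition" case concrete, here is a minimal C sketch of pulling a file name out of a Content-Disposition value and refusing it. It is not Wget's parser: the function name `cd_filename` and the rejection rule (any "/", "." or "..") are assumptions for illustration only. It handles just the plain `filename=` parameter in quoted or token form, and ignores RFC 6266 `filename*=`, backslash escapes, and ";" inside other quoted parameters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive "does s start with prefix?" (stops at s's NUL). */
static int prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Hypothetical helper (not Wget's API): extract filename= from a header
   value such as   attachment; filename="report.pdf"
   Returns 1 and writes a NUL-terminated name into out on success; returns 0
   if the parameter is missing, malformed, too long, or rejected. */
int cd_filename(const char *value, char *out, size_t outsz)
{
    const char *p = value;

    while ((p = strchr(p, ';')) != NULL) {
        p++;                                   /* step past ';' */
        while (*p == ' ' || *p == '\t')
            p++;
        if (!prefix_ci(p, "filename="))
            continue;                          /* some other parameter */
        p += 9;                                /* strlen("filename=") */

        size_t n = 0;
        if (*p == '"') {                       /* quoted-string form */
            p++;
            while (p[n] != '\0' && p[n] != '"')
                n++;
            if (p[n] != '"')
                return 0;                      /* unterminated quote */
        } else {                               /* token form */
            while (p[n] != '\0' && p[n] != ';' && p[n] != ' ' && p[n] != '\t')
                n++;
        }
        if (n == 0 || n >= outsz)
            return 0;
        memcpy(out, p, n);
        out[n] = '\0';

        /* Assumed policy: refuse names that could leave the current directory. */
        if (strchr(out, '/') != NULL || strcmp(out, ".") == 0 || strcmp(out, "..") == 0)
            return 0;
        return 1;
    }
    return 0;
}
```

If the name is rejected (return 0), a client following Micah's suggestion would keep the URL-derived name rather than issue a separate HEAD just to learn the header in advance.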
Re: Myriad merges
Micah Cowan wrote:
> Jochen Roderburg wrote:
>> Unfortunately, however, a new regression crept in: In the case timestamping=on, content-disposition=off, no local file present, it now does no HEAD (correctly), but two (!!) GETs, and transfers the file twice.
>
> Ha! Okay, gotta get that one fixed...

That should now be fixed.

It's hard to be confident I'm not introducing more issues, with the state of http.c being what it is. So please beat on it! :)

One issue I'm still aware of is that, if -c and -e contentdisposition=yes are specified for a file already fully downloaded, HEAD will be sent for the contentdisposition, and yet a GET will still be sent to fetch the remainder of the -c (resulting in a 416 Requested Range Not Satisfiable). Ideally, Wget should be smart enough to see from the HEAD that the Content-Length already matches the file's size, even though -c no longer requires a HEAD (again). We _got_ one, we should put it to good use.

However, I'm not worried about addressing this before 1.11 releases; it's a minor complaint, and with content-disposition's current implementation, users are already going to be expecting an extra HEAD round-trip in the general case; what's a few extra?

--
Micah J. Cowan
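The check Micah describes (use the HEAD's Content-Length to avoid a ranged GET that can only draw a 416) can be sketched in a few lines of C. This is a hypothetical illustration, not Wget's code: the names `decide_continue`, `format_range` and the `CONT_*` values are invented here, and what to do when the local file is larger than the remote one is deliberately left open, since the thread does not discuss it.

```c
#include <stdio.h>

/* Hypothetical outcomes for "wget -c" once a HEAD has been answered. */
enum cont_action {
    CONT_FULL_GET,    /* nothing local: plain GET */
    CONT_RANGE_GET,   /* send "Range: bytes=<local_size>-" */
    CONT_SKIP,        /* already complete: a ranged GET would draw a 416 */
    CONT_MISMATCH     /* local file larger than remote: policy left open */
};

/* remote_len < 0 means the HEAD carried no usable Content-Length. */
enum cont_action decide_continue(long long local_size, long long remote_len)
{
    if (local_size <= 0)
        return CONT_FULL_GET;
    if (remote_len < 0)
        return CONT_RANGE_GET;     /* cannot tell; fall back to trying -c */
    if (local_size == remote_len)
        return CONT_SKIP;
    if (local_size < remote_len)
        return CONT_RANGE_GET;
    return CONT_MISMATCH;
}

/* Format the Range header value for a continued download;
   returns snprintf's result (characters that would be written). */
int format_range(char *buf, size_t bufsz, long long local_size)
{
    return snprintf(buf, bufsz, "bytes=%lld-", local_size);
}
```

For example, with the 17416184-byte file from the Eudora transcript already fully on disk, `decide_continue(17416184, 17416184)` yields `CONT_SKIP`, so no GET (and no 416) would be needed.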
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
> Jochen Roderburg wrote:
>> Quoting Micah Cowan [EMAIL PROTECTED]:
>>> Jochen Roderburg wrote:
>>>> Yes, this one is still open, and the other one, that wget -c always starts at 0 again.
>>>
>>> Do you mean the (local 0) thing? That should have been fixed in 674cc935f7c8 [subversion r2382]. Can you re-check?
>>
>> No, that is ok now. I saw my little patch for this included as of this weekend ;-)
>> The one I mean is: wget -c continuation is not done in the HEADless cases.
>> http://www.mail-archive.com/wget%40sunsite.dk/msg10265.html ff.
>
> This should be fixed now, along with the timestamping issues.

And now the test results of this weekend ;-)

First the good news: the recent problems *are* fixed now, namely, in the case with default options (timestamping=off, content-disposition=off) we now have:
The timestamps on the downloaded files are set correctly.
Continued HTTP transfer (wget -c) is done correctly.

Unfortunately, however, a new regression crept in: In the case timestamping=on, content-disposition=off, no local file present, it now does no HEAD (correctly), but two (!!) GETs, and transfers the file twice.

All other combinations of these options and conditions are OK.

Best regards,
J.Roderburg
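Jochen's test matrix (timestamping × content-disposition × local file present) implies an expected request pattern per combination. The C sketch below encodes that expectation as a predicate; it is inferred only from the reports in this thread (content-disposition forces a HEAD; timestamping with no local file should not send one; -c no longer needs one), is not taken from http.c, and `expects_head` is a hypothetical name. In every combination the body itself should be fetched by exactly one GET, which is what the reported regression violated.

```c
#include <stdbool.h>

/* Hypothetical model, inferred from this thread's test reports (not from
   http.c): should a HEAD precede the single GET for this combination? */
bool expects_head(bool timestamping, bool content_disposition,
                  bool local_file_exists)
{
    if (content_disposition)
        return true;              /* contentdisposition=yes forces a HEAD */
    /* Assumed: timestamping only needs the remote date up front when there
       is a local file to compare it with; -c alone no longer needs a HEAD. */
    return timestamping && local_file_exists;
}
```

A test harness could loop over all eight combinations, count the HEAD and GET lines in `--debug` output, and compare them with this predicate plus "exactly one GET".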
Re: Myriad merges
Jochen Roderburg wrote:
> Unfortunately, however, a new regression crept in: In the case timestamping=on, content-disposition=off, no local file present, it now does no HEAD (correctly), but two (!!) GETs, and transfers the file twice.

Ha! Okay, gotta get that one fixed...

--
Micah J. Cowan
Re: Myriad merges
Jochen Roderburg wrote:
> Quoting Micah Cowan [EMAIL PROTECTED]:
>> Jochen Roderburg wrote:
>>> Yes, this one is still open, and the other one, that wget -c always starts at 0 again.
>>
>> Do you mean the (local 0) thing? That should have been fixed in 674cc935f7c8 [subversion r2382]. Can you re-check?
>
> No, that is ok now. I saw my little patch for this included as of this weekend ;-)
> The one I mean is: wget -c continuation is not done in the HEADless cases.
> http://www.mail-archive.com/wget%40sunsite.dk/msg10265.html ff.

This should be fixed now, along with the timestamping issues.

--
Micah J. Cowan
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
> Jochen Roderburg wrote:
>> And now, for a change, a case that works (better) now ;-)
>> This is an example where a HEAD request gets a 500 Error response.
>> Wget default options again, but contentdisposition=yes to force a HEAD.
>>
>> wget.111-svn-0709 --debug -e contentdisposition = yes http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
>> [... full --debug transcript snipped ...]
>>
>> This seems also to use the only available source for the timestamp, the response to the GET request.
>
> Sorry to reproduce that in full, but I thought it might be helpful to see the full transcript again, since you sent this a while ago. I was going back through this thread to refresh my memory on some things.
>
> I noticed, and wanted to point out, that actually, the GET request was _not_ the only available source for the timestamp; HEAD was answered with a 500, but only the first one. The HEAD issued after the redirect gives a timestamp.

Yes indeed, you are right, I overlooked the second HEAD after the redirect ;-)

My main message here was of course that the changes regarding the 500 error response to the HEAD
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
> The problem you pointed out that causes the failure to properly timestamp when HEADs aren't issued seems, to my reading, to be simply regressable for the fix. Mauro's fixes don't look as if they depend upon that line being there, but I'm waiting for him to have a chance to look over it before I commit to that as the fix (both he and I have been busy lately).

Yes, this one is still open, and the other one, that wget -c always starts at 0 again.

On the other hand, with the combination of options that I usually use in my daily wget practice (timestamping and content-disposition on) everything works fine now ;-)

> I've also got dealing with content-disposition issues for when HEAD fails on my todo list.

I have not done real-life tests with content-disposition cases, but I also have some feeling that not all combinations with other options (like timestamping and continuation) work with these yet. These may be minor issues again, as content-disposition is usually used when the contents are generated dynamically somehow and there are no static timestamps and file lengths at all.

Best regards,
J. Roderburg
Re: Myriad merges
Jochen Roderburg wrote:
> Quoting Micah Cowan [EMAIL PROTECTED]:
>> The problem you pointed out that causes the failure to properly timestamp when HEADs aren't issued seems, to my reading, to be simply regressable for the fix. Mauro's fixes don't look as if they depend upon that line being there, but I'm waiting for him to have a chance to look over it before I commit to that as the fix (both he and I have been busy lately).
>
> Yes, this one is still open, and the other one, that wget -c always starts at 0 again.

Do you mean the (local 0) thing? That should have been fixed in 674cc935f7c8 [subversion r2382]. Can you re-check?

> On the other hand, with the combination of options that I usually use in my daily wget practice (timestamping and content-disposition on) everything works fine now ;-)
>
>> I've also got dealing with content-disposition issues for when HEAD fails on my todo list.
>
> I have not done real-life tests with content-disposition cases, but I also have some feeling that not all combinations with other options (like timestamping and continuation) work with these yet. These may be minor issues again, as content-disposition is usually used when the contents are generated dynamically somehow and there are no static timestamps and file lengths at all.

Yes. Currently Content-Disposition does not work when the HEAD fails or doesn't include Content-Disposition, which is problematic since this is a very frequent case. However, I think the necessary changes would be a bit invasive, and I'm not prepared to make them in time for the 1.11 release; so in essence, Content-Disposition, for now, will sometimes work and sometimes not.

It'll be nice to fix this in 1.12, along with implementing changes to reduce the number of HEADs we issue (I'd prefer to skip HEAD completely for just content-disposition, assume we'll accept it, and terminate the connection if we won't; at any rate, it will need some discussion, most of which would probably be more appropriate at the Wgiki).

--
Micah J. Cowan
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
> Jochen Roderburg wrote:
>> Yes, this one is still open, and the other one, that wget -c always starts at 0 again.
>
> Do you mean the (local 0) thing? That should have been fixed in 674cc935f7c8 [subversion r2382]. Can you re-check?

No, that is ok now. I saw my little patch for this included as of this weekend ;-)

The one I mean is: wget -c continuation is not done in the HEADless cases.
http://www.mail-archive.com/wget%40sunsite.dk/msg10265.html ff.

Regards,
J.Roderburg
Re: Myriad merges
Jochen Roderburg wrote:
> Quoting Micah Cowan [EMAIL PROTECTED]:
>> Jochen Roderburg wrote:
>>> Yes, this one is still open, and the other one, that wget -c always starts at 0 again.
>>
>> Do you mean the (local 0) thing? That should have been fixed in 674cc935f7c8 [subversion r2382]. Can you re-check?
>
> No, that is ok now. I saw my little patch for this included as of this weekend ;-)
> The one I mean is: wget -c continuation is not done in the HEADless cases.
> http://www.mail-archive.com/wget%40sunsite.dk/msg10265.html ff.

Ah, thanks for the reminder. Apparently I'd forgotten to track that.

--
Micah J. Cowan
Re: Myriad merges
Jochen Roderburg wrote:
> And now, for a change, a case that works (better) now ;-)
> This is an example where a HEAD request gets a 500 Error response.
> Wget default options again, but contentdisposition=yes to force a HEAD.
>
> wget.111-svn-0709 --debug -e contentdisposition = yes http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
> Setting contentdisposition (contentdisposition) to yes
> DEBUG output created by Wget 1.10+devel on linux-gnu.
>
> --15:26:54--  http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
> Resolving www.eudora.com... 199.106.114.30
> Caching www.eudora.com => 199.106.114.30
> Connecting to www.eudora.com|199.106.114.30|:80... connected.
> Created socket 3.
> Releasing 0x080888d8 (new refcount 1).
> ---request begin---
> HEAD /cgi-bin/export.cgi?productid=EUDORA_win_7109 HTTP/1.0
> User-Agent: Wget/1.10+devel
> Accept: */*
> Host: www.eudora.com
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 500 Server Error
> Server: Netscape-Enterprise/6.0
> Date: Mon, 03 Sep 2007 13:26:54 GMT
> Content-length: 305
> Content-type: text/html
> Connection: keep-alive
> ---response end---
> 500 Server Error
> Registered socket 3 for persistent reuse.
> --15:26:56--  (try: 2)  http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
> Disabling further reuse of socket 3.
> Closed fd 3
> Found www.eudora.com in host_name_addresses_map (0x80888d8)
> Connecting to www.eudora.com|199.106.114.30|:80... connected.
> Created socket 3.
> Releasing 0x080888d8 (new refcount 1).
> ---request begin---
> GET /cgi-bin/export.cgi?productid=EUDORA_win_7109 HTTP/1.0
> User-Agent: Wget/1.10+devel
> Accept: */*
> Host: www.eudora.com
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 302 Moved Temporarily
> Server: Netscape-Enterprise/6.0
> Date: Mon, 03 Sep 2007 13:26:55 GMT
> Location: http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
> Content-length: 0
> Connection: keep-alive
> ---response end---
> 302 Moved Temporarily
> Registered socket 3 for persistent reuse.
> Location: http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe [following]
> Skipping 0 bytes of body: [] done.
> --15:26:56--  http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
> Reusing existing connection to www.eudora.com:80.
> Reusing fd 3.
> ---request begin---
> HEAD /download/eudora/windows/7.1/Eudora_7.1.0.9.exe HTTP/1.0
> User-Agent: Wget/1.10+devel
> Accept: */*
> Host: www.eudora.com
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 200 OK
> Server: Netscape-Enterprise/6.0
> Date: Mon, 03 Sep 2007 13:26:56 GMT
> Content-type: application/octet-stream
> Last-modified: Thu, 05 Oct 2006 18:45:18 GMT
> Content-length: 17416184
> Accept-ranges: bytes
> Connection: keep-alive
> ---response end---
> 200 OK
> Length: 17416184 (17M) [application/octet-stream]
> --15:26:56--  http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
> Reusing existing connection to www.eudora.com:80.
> Reusing fd 3.
> ---request begin---
> GET /download/eudora/windows/7.1/Eudora_7.1.0.9.exe HTTP/1.0
> User-Agent: Wget/1.10+devel
> Accept: */*
> Host: www.eudora.com
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.1 200 OK
> Server: Netscape-Enterprise/6.0
> Date: Mon, 03 Sep 2007 13:26:56 GMT
> Content-type: application/octet-stream
> Last-modified: Thu, 05 Oct 2006 18:45:18 GMT
> Content-length: 17416184
> Accept-ranges: bytes
> Connection: keep-alive
> ---response end---
> 200 OK
> Length: 17416184 (17M) [application/octet-stream]
> Saving to: `Eudora_7.1.0.9.exe'
>
> 100%[=] 17,416,184   397K/s   in 44s
>
> 15:27:40 (386 KB/s) - `Eudora_7.1.0.9.exe' saved [17416184/17416184]
>
> ls -l Eudora_7.1.0.9.exe
> -rw-r----- 1 a0045 RRZK 17416184 05.10.2006 20:45 Eudora_7.1.0.9.exe
>
> This seems also to use the only available source for the timestamp, the response to the GET request.

Sorry to reproduce that in full, but I thought it might be helpful to see the full transcript again, since you sent this a while ago. I was going back through this thread to refresh my memory on some things.

I noticed, and wanted to point out, that actually, the GET request was _not_ the only available source for the timestamp; HEAD was answered with a 500, but only the first one. The HEAD issued after the redirect gives a timestamp.

The problem you pointed out that causes the failure to properly timestamp when HEADs aren't issued seems, to my reading, to be simply regressable for the fix. Mauro's fixes don't look as if they depend upon that line being there, but I'm waiting for him to have a chance to look over it before I commit to that as the fix (both he and I have
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
>> And the only other code I found which parses the remote date is in the part which handles the logic around the timestamping option. In older versions this was a conditional block starting with "if (!got_head) ...", now it starts with "if (send_head_first && !got_head) ...". Could this mean that this code is now only executed when a HEAD response is examined??
>
> Hm... that change came from the Content-Disposition fixes. I'll investigate.

OK, but I hope I am still allowed to help a little with the investigation ;-)

I have now made a few more tests and done some debugging, and I am convinced that this "send_head_first" condition is definitely the immediate cause of the new problem that the remote timestamp is not picked up on GET-only requests. This change is relatively new; it was not in the next-to-last svn version that I compiled a month ago. There must certainly have been a reason for it, but one sure side effect is that this if-block is no longer executed in the HEAD-less case.

Btw, continued downloads (wget -c) are also broken now in this case (probably for the same reason).

I now also believe that the primary issue we are trying to repair (the first remote timestamp found is used for the local file, not the last one) has always been there. I only really noticed it a year ago, when the contentdisposition stuff was included and more HEAD requests were made. I remember that it had always been more difficult to get a newer file downloaded through the proxy cache when a local file was present, but as these cases were rare, I had never tried to investigate this before ;-)

Jochen Roderburg
ZAIK/RRZK
University of Cologne
Robert-Koch-Str. 10
Tel.: +49-221/478-7024
D-50931 Koeln
E-Mail: [EMAIL PROTECTED]
Germany
Re: Myriad merges
Jochen Roderburg wrote:
> Quoting Micah Cowan [EMAIL PROTECTED]:
>> Hm... that change came from the Content-Disposition fixes. I'll investigate.
>
> OK, but I hope I am still allowed to help a little with the investigation ;-)

Oh, I'm always very, _very_ happy to get help. :D

> I have now made a few more tests and done some debugging, and I am convinced that this "send_head_first" condition is definitely the immediate cause of the new problem that the remote timestamp is not picked up on GET-only requests.
[snip]
> Btw, continued downloads (wget -c) are also broken now in this case (probably for the same reason).

Really? I've been using this Wget version for a bit, and haven't noticed this problem. Could you give an invocation that produces this problem?

> I now also believe that the primary issue we are trying to repair (the first remote timestamp found is used for the local file, not the last one) has always been there. I only really noticed it a year ago, when the contentdisposition stuff was included and more HEAD requests were made. I remember that it had always been more difficult to get a newer file downloaded through the proxy cache when a local file was present, but as these cases were rare, I had never tried to investigate this before ;-)

I'm not surprised to hear this; it didn't look like it had ever been working before... and it's not a common situation, so I'm not surprised it wasn't caught earlier, either.

--
Micah J. Cowan
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
>> Btw, continued downloads (wget -c) are also broken now in this case (probably for the same reason).
>
> Really? I've been using this Wget version for a bit, and haven't noticed this problem. Could you give an invocation that produces this problem?

I'll start a new thread for this problem, as it now looks like a different case again ;-)

J.Roderburg
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
>> Quoting Jochen Roderburg [EMAIL PROTECTED]:
>>> So it now looks to me that the new error (local timestamp not set to remote) only occurs in the cases when no HEAD is used.
>>>
>>> This (new) piece of code in http.c (line 2666 ff.) looks very suspicious to me, especially the time_came_from_head bit:
>>>
>>>   /* Reparse time header, in case it's changed. */
>>>   if (time_came_from_head
>>>       && hstat.remote_time && hstat.remote_time[0])
>>>     {
>>>       newtmr = http_atotm (hstat.remote_time);
>>>       if (newtmr != -1)
>>>         tmr = newtmr;
>>>     }
>
> The intent behind this code is to ensure that we parse the Last-Modified date again, even if we already parsed Last-Modified, if the last one we parsed came from the HEAD.

Hmm, yes, but that is not what it does ;-) I mean, it does not parse the date again "even if" it was already parsed; it parses it *only if* it was already parsed (from a HEAD). So, in particular, it does *not* parse it if there had been no HEAD at all before.

And the only other code I found which parses the remote date is in the part which handles the logic around the timestamping option. In older versions this was a conditional block starting with "if (!got_head) ...", now it starts with "if (send_head_first && !got_head) ...". Could this mean that this code is now only executed when a HEAD response is examined??

Anyway, I think everything is ok again if you just eliminate this time_came_from_head logic completely. The above piece of code then simply sets the local timestamp to the last remote timestamp that was seen, and does not care whether it actually came from a HEAD or a GET request.

Jochen Roderburg
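Jochen's proposal amounts to a "last seen wins" rule for the remote timestamp. A minimal C sketch of that rule follows; it is an illustration of the idea, not Wget's code, and `last_remote_time` is a hypothetical name. As in the quoted snippet's `newtmr != -1` check, `(time_t)-1` marks a response without a usable Last-Modified.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical sketch of the proposed rule: given the Last-Modified values
   parsed from each response in order (HEAD, then GET, possibly across
   redirects), use the last one actually seen, regardless of whether it came
   from a HEAD or a GET. (time_t)-1 means "no usable Last-Modified". */
time_t last_remote_time(const time_t *seen, size_t n)
{
    time_t chosen = (time_t)-1;
    for (size_t i = 0; i < n; i++)
        if (seen[i] != (time_t)-1)
            chosen = seen[i];      /* later responses override earlier ones */
    return chosen;
}
```

Under this rule a stale date from an earlier response (Jochen's proxy-cache observation) would be overridden by a later one, and a response with no date (such as the 500 to the first HEAD in the Eudora transcript) would simply be skipped.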
Re: Myriad merges
Jochen Roderburg wrote:
> Quoting Micah Cowan [EMAIL PROTECTED]:
>>> Quoting Jochen Roderburg [EMAIL PROTECTED]:
>>>> So it now looks to me that the new error (local timestamp not set to remote) only occurs in the cases when no HEAD is used.
>>>>
>>>> This (new) piece of code in http.c (line 2666 ff.) looks very suspicious to me, especially the time_came_from_head bit:
>>>>
>>>>   /* Reparse time header, in case it's changed. */
>>>>   if (time_came_from_head
>>>>       && hstat.remote_time && hstat.remote_time[0])
>>>>     {
>>>>       newtmr = http_atotm (hstat.remote_time);
>>>>       if (newtmr != -1)
>>>>         tmr = newtmr;
>>>>     }
>>
>> The intent behind this code is to ensure that we parse the Last-Modified date again, even if we already parsed Last-Modified, if the last one we parsed came from the HEAD.
>
> Hmm, yes, but that is not what it does ;-) I mean, it does not parse the date again "even if" it was already parsed; it parses it *only if* it was already parsed (from a HEAD). So, in particular, it does *not* parse it if there had been no HEAD at all before.

That's actually what I said it does (somewhat clumsily: "if the last one we parsed came from the HEAD"). Yes, as I said, if there had been no HEAD before, it should already have been parsed in earlier code, and no action should be necessary. That's what time_came_from_head is for: to prevent us from parsing it twice from the GET.

> And the only other code I found which parses the remote date is in the part which handles the logic around the timestamping option. In older versions this was a conditional block starting with "if (!got_head) ...", now it starts with "if (send_head_first && !got_head) ...". Could this mean that this code is now only executed when a HEAD response is examined??

Hm... that change came from the Content-Disposition fixes. I'll investigate.

--
Micah J. Cowan
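The interaction of the two guards discussed above can be modelled in a few lines of C to show why a GET-only download ends up with no remote timestamp. This is a heavily simplified, hypothetical model: the flag names `send_head_first`, `got_head` and `time_came_from_head` come from the quoted snippets, but the surrounding control flow, the function `times_parsed`, and its `use_new_guard` switch are assumptions made for illustration, not a description of http.c.

```c
#include <stdbool.h>

/* Hypothetical, simplified model (not Wget's actual http.c). Returns how
   many times the remote Last-Modified would be parsed for one download. */
int times_parsed(bool send_head_first, bool use_new_guard)
{
    bool got_head = false;             /* no response examined yet */
    bool time_came_from_head = false;
    int parsed = 0;

    /* Timestamping block, run on the first response examined
       (the HEAD if one is sent, otherwise the GET). */
    bool runs = use_new_guard ? (send_head_first && !got_head) : !got_head;
    if (runs) {
        parsed++;
        time_came_from_head = send_head_first;
    }

    /* "Reparse time header, in case it's changed." -- runs on the GET
       response only if the earlier parse came from a HEAD. */
    if (time_came_from_head)
        parsed++;

    return parsed;
}
```

In this model the GET-only case with the new guard parses the date zero times, which matches the symptom Jochen reported; with the old `!got_head` guard it parses once, and any HEAD-then-GET sequence parses twice.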
Re: Myriad merges
Quoting Jochen Roderburg [EMAIL PROTECTED]:
> So it now looks to me that the new error (local timestamp not set to remote) only occurs in the cases when no HEAD is used.
>
> This (new) piece of code in http.c (line 2666 ff.) looks very suspicious to me, especially the time_came_from_head bit:
>
>   /* Reparse time header, in case it's changed. */
>   if (time_came_from_head
>       && hstat.remote_time && hstat.remote_time[0])
>     {
>       newtmr = http_atotm (hstat.remote_time);
>       if (newtmr != -1)
>         tmr = newtmr;
>     }

Other than that, I have now used the current svn version for a few more days with all my work, and I would say all the issues that had bothered me in the recent development cycles are corrected now. I'll see, however, whether I can make a few more systematic tests with some combinations of the relevant options which I usually do not use in my practice.

What I have seen that is new are some cosmetic issues in the program output when HTTP restarts happen. Such restarts are normally rare these days, but I have some sites far away where bad connections and timeouts have suddenly reappeared. One looks pretty simple; I think I can prepare a patch myself on the weekend, when I have access to my Linux development system at home again. I'll report details in a separate mail later, when I have examples for the cases.

Best regards,
Jochen Roderburg
Re: Myriad merges
Jochen Roderburg wrote:
> Quoting Jochen Roderburg [EMAIL PROTECTED]:
>> So it looks now to me, that the new error (local timestamp not set to remote) only occurs in the cases when no HEAD is used. This (new) piece of code in http.c (line 2666 ff.) looks very suspicious to me, especially the time_came_from_head bit:
>>
>>   /* Reparse time header, in case it's changed. */
>>   if (time_came_from_head && hstat.remote_time && hstat.remote_time[0])
>>     {
>>       newtmr = http_atotm (hstat.remote_time);
>>       if (newtmr != -1)
>>         tmr = newtmr;
>>     }

The intent behind this code is to ensure that we parse the Last-Modified date again, even if we already parsed Last-Modified, if the last one we parsed came from the HEAD.

This whole block of code that you've pasted is new, not just the surrounding if clause; if we never sent a HEAD but only a GET, the Last-Modified _should_ have been parsed in code that appears before here.

...but, obviously, things aren't working quite as they should, so I need to look into it more closely.

-- 
Micah J. Cowan
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
> Hm, that should not be. It should definitely set the timestamp if it gets downloaded... I'll investigate. OOC, was there a specific resource you tested against (just in case I have difficulty reproducing)?

Not a very specific one, I just used our university homepage for this test ;-)

Here is a full log:

ls -l index.html
ls: cannot access index.html: No such file or directory

HEAD http://www.uni-koeln.de/index.html
200 OK
Connection: close
Date: Mon, 03 Sep 2007 11:44:59 GMT
Accept-Ranges: bytes
Server: Apache/2.0.59
Content-Language: de
Content-Type: text/html
Last-Modified: Mon, 03 Sep 2007 11:04:09 GMT
Client-Date: Mon, 03 Sep 2007 11:44:59 GMT
Client-Response-Num: 1

wget.111-svn-0709 --debug http://www.uni-koeln.de/index.html
DEBUG output created by Wget 1.10+devel on linux-gnu.

--13:45:12-- http://www.uni-koeln.de/index.html
Resolving www.uni-koeln.de... 134.95.19.39
Caching www.uni-koeln.de = 134.95.19.39
Connecting to www.uni-koeln.de|134.95.19.39|:80... connected.
Created socket 3.
Releasing 0x08088820 (new refcount 1).
---request begin---
GET /index.html HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.uni-koeln.de
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Date: Mon, 03 Sep 2007 11:45:12 GMT
Server: Apache/2.0.59
Last-Modified: Mon, 03 Sep 2007 11:04:09 GMT
Accept-Ranges: bytes
Content-Type: text/html
Content-Language: de
Connection: close
---response end---
200 OK
Length: unspecified [text/html]
Saving to: `index.html'

[ = ] 9,131 --.-K/s in 0s

Closed fd 3
13:45:12 (207 MB/s) - `index.html' saved [9131]

ls -l index.html
-rw-r- 1 a0045 RRZK 9131 03.09.2007 13:45 index.html

date
Mon Sep 3 13:45:24 CEST 2007

Jochen Roderburg
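In the log above the server does send `Last-Modified: Mon, 03 Sep 2007 11:04:09 GMT` with the GET, yet the file ends up stamped 13:45 CEST, i.e. the download time. Applying the remote time is a single POSIX `utime()` call once the header has been converted to a `time_t`. The sketch below assumes that conversion has already happened; `set_local_mtime` is a hypothetical helper, not a Wget function.

```c
#include <stdio.h>     /* perror */
#include <sys/stat.h>  /* stat(), handy for checking the result */
#include <time.h>      /* time_t, time */
#include <utime.h>     /* utime, struct utimbuf */

/* Hypothetical helper: stamp a downloaded file with the server's
   Last-Modified time, already converted to time_t (in the thread's
   code that conversion is done by http_atotm()).
   Returns 0 on success, -1 otherwise. */
static int
set_local_mtime (const char *path, time_t remote_time)
{
  struct utimbuf times;

  if (remote_time == (time_t) -1)
    return -1;                  /* no usable Last-Modified header */

  times.actime = time (NULL);   /* access time: now */
  times.modtime = remote_time;  /* modification time: the server's */
  if (utime (path, &times) != 0)
    {
      perror (path);
      return -1;
    }
  return 0;
}
```

With a correct fix, every row of the test matrix that downloads the file should end with a call of this kind, whether or not a HEAD was sent first.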
Re: Myriad merges
And now, for a change, a case that works (better) now ;-) This is an example where a HEAD request gets a 500 error response. Wget default options again, but with contentdisposition=yes to force a HEAD.

wget.111-svn-0709 --debug -e contentdisposition = yes http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
Setting contentdisposition (contentdisposition) to yes
DEBUG output created by Wget 1.10+devel on linux-gnu.

--15:26:54-- http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
Resolving www.eudora.com... 199.106.114.30
Caching www.eudora.com = 199.106.114.30
Connecting to www.eudora.com|199.106.114.30|:80... connected.
Created socket 3.
Releasing 0x080888d8 (new refcount 1).
---request begin---
HEAD /cgi-bin/export.cgi?productid=EUDORA_win_7109 HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 500 Server Error
Server: Netscape-Enterprise/6.0
Date: Mon, 03 Sep 2007 13:26:54 GMT
Content-length: 305
Content-type: text/html
Connection: keep-alive
---response end---
500 Server Error
Registered socket 3 for persistent reuse.

--15:26:56-- (try: 2) http://www.eudora.com/cgi-bin/export.cgi?productid=EUDORA_win_7109
Disabling further reuse of socket 3.
Closed fd 3
Found www.eudora.com in host_name_addresses_map (0x80888d8)
Connecting to www.eudora.com|199.106.114.30|:80... connected.
Created socket 3.
Releasing 0x080888d8 (new refcount 1).
---request begin---
GET /cgi-bin/export.cgi?productid=EUDORA_win_7109 HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Moved Temporarily
Server: Netscape-Enterprise/6.0
Date: Mon, 03 Sep 2007 13:26:55 GMT
Location: http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
Content-length: 0
Connection: keep-alive
---response end---
302 Moved Temporarily
Registered socket 3 for persistent reuse.
Location: http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe [following]
Skipping 0 bytes of body: [] done.

--15:26:56-- http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
Reusing existing connection to www.eudora.com:80.
Reusing fd 3.
---request begin---
HEAD /download/eudora/windows/7.1/Eudora_7.1.0.9.exe HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Server: Netscape-Enterprise/6.0
Date: Mon, 03 Sep 2007 13:26:56 GMT
Content-type: application/octet-stream
Last-modified: Thu, 05 Oct 2006 18:45:18 GMT
Content-length: 17416184
Accept-ranges: bytes
Connection: keep-alive
---response end---
200 OK
Length: 17416184 (17M) [application/octet-stream]

--15:26:56-- http://www.eudora.com/download/eudora/windows/7.1/Eudora_7.1.0.9.exe
Reusing existing connection to www.eudora.com:80.
Reusing fd 3.
---request begin---
GET /download/eudora/windows/7.1/Eudora_7.1.0.9.exe HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: www.eudora.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Server: Netscape-Enterprise/6.0
Date: Mon, 03 Sep 2007 13:26:56 GMT
Content-type: application/octet-stream
Last-modified: Thu, 05 Oct 2006 18:45:18 GMT
Content-length: 17416184
Accept-ranges: bytes
Connection: keep-alive
---response end---
200 OK
Length: 17416184 (17M) [application/octet-stream]
Saving to: `Eudora_7.1.0.9.exe'

100%[=] 17,416,184 397K/s in 44s

15:27:40 (386 KB/s) - `Eudora_7.1.0.9.exe' saved [17416184/17416184]

ls -l Eudora_7.1.0.9.exe
-rw-r- 1 a0045 RRZK 17416184 05.10.2006 20:45 Eudora_7.1.0.9.exe

This also seems to use the only available source for the timestamp, the response to the GET request.

Best regards,
J.Roderburg
Re: Myriad merges
And now, finally, the ultimate real-life test with proxy cache, timestamping and contentdisposition, where HEAD and GET have different timestamps. And this is perfectly correct now! So it looks to me now that the new error (local timestamp not set to remote) only occurs in the cases when no HEAD is used.

Best regards,
J.Roderburg

HEAD -p http://wwwcache.uni-koeln.de:8080 http://download.lavasoft.com/public/core.zip
200 OK
Date: Thu, 30 Aug 2007 09:31:35 GMT
Accept-Ranges: bytes
Age: 361684
ETag: 3014d-233e3c-cbcb000
Server: Apache/2.0.55 (Ubuntu) mod_ssl/2.0.55 OpenSSL/0.9.8a
Content-Length: 2309692
Content-Type: application/zip
Last-Modified: Mon, 27 Aug 2007 13:08:16 GMT
Client-Date: Mon, 03 Sep 2007 13:59:39 GMT
Client-Response-Num: 1
Proxy-Connection: close
X-Cache: HIT from wwwcache.uni-koeln.de

HEAD http://download.lavasoft.com/public/core.zip
200 OK
Connection: close
Date: Mon, 03 Sep 2007 14:00:48 GMT
Accept-Ranges: bytes
ETag: 3016f-275cfc-f35fc640
Server: Apache/2.0.55 (Ubuntu) mod_ssl/2.0.55 OpenSSL/0.9.8a
Content-Length: 2579708
Content-Type: application/zip
Last-Modified: Mon, 03 Sep 2007 08:28:01 GMT
Client-Date: Mon, 03 Sep 2007 14:00:48 GMT
Client-Response-Num: 1

wget.111-svn-0709 --debug http://download.lavasoft.com/public/core.zip
DEBUG output created by Wget 1.10+devel on linux-gnu.

--16:04:16-- http://download.lavasoft.com/public/core.zip
Resolving wwwcache.uni-koeln.de... 134.95.19.61
Caching wwwcache.uni-koeln.de = 134.95.19.61
Connecting to wwwcache.uni-koeln.de|134.95.19.61|:8080... connected.
Created socket 3.
Releasing 0x080889b8 (new refcount 1).
---request begin---
HEAD http://download.lavasoft.com/public/core.zip HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: download.lavasoft.com
---request end---
Proxy request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Date: Thu, 30 Aug 2007 09:31:35 GMT
Server: Apache/2.0.55 (Ubuntu) mod_ssl/2.0.55 OpenSSL/0.9.8a
Last-Modified: Mon, 27 Aug 2007 13:08:16 GMT
ETag: 3014d-233e3c-cbcb000
Accept-Ranges: bytes
Content-Length: 2309692
Content-Type: application/zip
Age: 361961
X-Cache: HIT from wwwcache.uni-koeln.de
Proxy-Connection: close
---response end---
200 OK
Length: 2309692 (2.2M) [application/zip]
Closed fd 3

--16:04:16-- http://download.lavasoft.com/public/core.zip
Found wwwcache.uni-koeln.de in host_name_addresses_map (0x80889b8)
Connecting to wwwcache.uni-koeln.de|134.95.19.61|:8080... connected.
Created socket 3.
Releasing 0x080889b8 (new refcount 1).
---request begin---
GET http://download.lavasoft.com/public/core.zip HTTP/1.0
User-Agent: Wget/1.10+devel
Accept: */*
Host: download.lavasoft.com
---request end---
Proxy request sent, awaiting response...
---response begin---
HTTP/1.0 200 OK
Date: Mon, 03 Sep 2007 14:04:16 GMT
Server: Apache/2.0.55 (Ubuntu) mod_ssl/2.0.55 OpenSSL/0.9.8a
Last-Modified: Mon, 03 Sep 2007 08:28:01 GMT
ETag: 2370002-275cfc-f35fc640
Accept-Ranges: bytes
Content-Length: 2579708
Content-Type: application/zip
X-Cache: MISS from wwwcache.uni-koeln.de
Proxy-Connection: close
---response end---
200 OK
Length: 2579708 (2.5M) [application/zip]
Saving to: `core.zip'

100%[=] 2,579,708 155K/s in 15s

Closed fd 3
16:04:31 (169 KB/s) - `core.zip' saved [2579708/2579708]

ls -l core.zip
-rw-r- 1 a0045 RRZK 2579708 03.09.2007 10:28 core.zip
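Here the HEAD (answered from the proxy cache) and the GET disagree: 27 Aug 2007 13:08:16 GMT versus 03 Sep 2007 08:28:01 GMT. The local file got 03.09.2007 10:28, which is 08:28 GMT in CEST (UTC+2), so the GET's value won, since the GET describes the bytes that were actually saved. A minimal sketch of that policy follows, assuming only the RFC 1123 date form. `parse_rfc1123` is a hypothetical and much narrower stand-in for Wget's `http_atotm()`, and `choose_remote_time` is likewise illustrative, not Wget code.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Days since 1970-01-01 for a Gregorian date (Howard Hinnant's
   days_from_civil algorithm); avoids the non-standard timegm(). */
static long
days_from_civil (long y, unsigned m, unsigned d)
{
  y -= m <= 2;
  long era = (y >= 0 ? y : y - 399) / 400;
  unsigned yoe = (unsigned) (y - era * 400);
  unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
  unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + (long) doe - 719468;
}

/* Hypothetical, much narrower stand-in for http_atotm(): accepts only
   "Wdy, DD Mon YYYY HH:MM:SS GMT". Returns (time_t) -1 on failure. */
static time_t
parse_rfc1123 (const char *s)
{
  static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
  char wday[4], mon[4], zone[4];
  int day, year, hh, mm, ss;
  const char *p;

  if (sscanf (s, "%3s, %d %3s %d %d:%d:%d %3s",
              wday, &day, mon, &year, &hh, &mm, &ss, zone) != 8
      || strcmp (zone, "GMT") != 0 || strlen (mon) != 3)
    return (time_t) -1;
  p = strstr (months, mon);
  if (p == NULL || (p - months) % 3 != 0)
    return (time_t) -1;
  if (day < 1 || day > 31 || hh < 0 || hh > 23
      || mm < 0 || mm > 59 || ss < 0 || ss > 60)
    return (time_t) -1;

  return (time_t) (days_from_civil (year, (unsigned) ((p - months) / 3 + 1),
                                    (unsigned) day) * 86400L
                   + hh * 3600L + mm * 60L + ss);
}

/* The GET response describes the bytes actually saved, so its
   Last-Modified wins; a HEAD value is only a fallback. */
static time_t
choose_remote_time (const char *head_lm, const char *get_lm)
{
  time_t t = get_lm ? parse_rfc1123 (get_lm) : (time_t) -1;
  if (t == (time_t) -1 && head_lm)
    t = parse_rfc1123 (head_lm);
  return t;
}
```

With the two headers from this log, `choose_remote_time ("Mon, 27 Aug 2007 13:08:16 GMT", "Mon, 03 Sep 2007 08:28:01 GMT")` yields the GET's time, which is what the final `ls -l` shows.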
Re: Myriad merges
Quoting Micah Cowan [EMAIL PROTECTED]:
> I've just merged a bunch of things into the current trunk, including Mauro's latest changes related to when HEAD is sent (concerning which he recently sent an email). Please feel free to beat on it, and report any bugs here!

Ah, finally something to test again ;-) The ChangeLogs look interesting; all the issues I had seem to be repaired.

I have only done a few first tests so far, because the basic test already had a problem: with default options, the local timestamps are not set at all.

I still made one series of tests regarding the HEAD/GET logic.

Options: no spider, no -O, no content-disposition:

  no timestamping, no local file:  no HEAD, but local timestamp not set to remote
  no timestamping, local file:     no HEAD, but local timestamp not set to remote
  timestamping, no local file:     no HEAD, but local timestamp not set to remote
  timestamping, local file:        HEAD, local timestamp set to remote

In these cases the HEAD is now again used only where it is necessary, but the timestamp... One could think that it is now taken only from the HEAD and not from the GET. I'll see what happens in a case where the two differ; such a case cannot easily be constructed, so I must wait until one just comes along ;-)

Best regards,
Jochen Roderburg
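The HEAD column of this matrix, together with the fact noted elsewhere in the thread that `contentdisposition=yes` forces a HEAD, can be summarized as a small predicate. This is a hypothetical reconstruction from the test results, not Wget's actual decision code; `struct dl_opts` and `should_send_head` are illustrative names. It only decides whether a HEAD precedes the GET: in every row, the GET's own Last-Modified should still be applied to the downloaded file.

```c
#include <stdbool.h>

/* Hypothetical option set: only the settings exercised in this thread. */
struct dl_opts
{
  bool timestamping;          /* -N / timestamping=on */
  bool content_disposition;   /* -e contentdisposition=yes */
};

/* Reconstructed from the test results in this thread, not from Wget's
   source: should a HEAD be sent before the GET? */
static bool
should_send_head (const struct dl_opts *o, bool local_file_exists)
{
  /* Content-Disposition may change the local name, so the headers are
     wanted before anything is written to disk. */
  if (o->content_disposition)
    return true;
  /* With timestamping, a HEAD only pays off when there is a local copy
     to compare against; otherwise the GET is needed anyway. */
  return o->timestamping && local_file_exists;
}
```

Only the last row of the matrix (timestamping with an existing local file) returns true when Content-Disposition handling is off.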
Re: Myriad merges
Jochen Roderburg wrote:
> I did only a few first tests now, because the basic test already had a problem: With default options the local timestamps are not set at all.
>
> Options: no spider, no -O, no content-disposition:
>
>   no timestamping, no local file:  no HEAD, but local timestamp not set to remote
>   no timestamping, local file:     no HEAD, but local timestamp not set to remote
>   timestamping, no local file:     no HEAD, but local timestamp not set to remote
>   timestamping, local file:        HEAD, local timestamp set to remote
>
> In these cases the HEAD is now used again only for the case where it is necessary, but the timestamp .. One could think that it is now taken only from the HEAD and not from GET.

Hm, that should not be. It should definitely set the timestamp if it gets downloaded... I'll investigate.

OOC, was there a specific resource you tested against (just in case I have difficulty reproducing)?

-- 
Micah J. Cowan