Re: Wget exit codes
On December 09, 2007 at 07:03PM Stuart Moore wrote:

> Could the exit code used be determined by a flag? E.g. by default it
> uses the Unix convention, 0 for any success; with an
> --extended_error_codes flag or similar, it then uses extra error codes
> depending on the type of success (but, for sanity, uses the same codes
> for failure with or without the flag). That should allow both of you
> to use it for scripting.

Curl has a -w option that takes several different variables. I presently use the 'http_code' and 'size_download' variables to gather the information I need to determine what exactly transpired. They are also great for debugging a script; however, that is another matter.

Presently, I direct this info to a file and then grep the file to gather the information I need. It works faster than doing multiple file comparisons, especially when the files are compressed.

If wget were to implement something along these lines, possibly even setting a specific variable indicating whether or not a file was downloaded, it would make scripting a lot easier and less prone to breakage.

Curl presently has one of the best exit code implementations available. Studying their model would seem like a worthwhile venture.

--
Gerard
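The workflow Gerard describes can be sketched roughly as follows. This is a hedged illustration, not anything from the thread: the URL, log-file name, and sample log contents are invented; only the `-w` variables `http_code` and `size_download` are real curl features.

```shell
# Sketch of the curl -w workflow described above. Each fetch would append
# one "http_code size_download" line to a log (URL illustrative):
#
#   curl -s -o /dev/null -w '%{http_code} %{size_download}\n' \
#        http://example.com/file >> fetch.log
#
# Simulate a log such a loop might have produced:
printf '200 14532\n404 0\n200 0\n' > fetch.log

# The grep-style pass over the log: a 200 status with a non-zero
# size_download indicates a file was actually downloaded.
awk '$1 == 200 && $2 > 0' fetch.log
```

A single pass over a log like this is what makes the approach faster than re-comparing downloaded files after the fact.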
Re: Wget exit codes
Gerard wrote:

> On December 09, 2007 at 07:03PM Stuart Moore wrote:
>
> > Could the exit code used be determined by a flag? E.g. by default it
> > uses unix convention, 0 for any success; with an
> > --extended_error_codes flag or similar then it uses extra error codes
> > depending on the type of success (but for sanity uses the same codes
> > for failure with or without the flag). That should allow both of you
> > to use it for scripting.
>
> Curl has a -w option that takes several different variables. [...] If
> wget were to implement something along this line, possibly even setting
> a specific variable indicating whether or not a file was downloaded, it
> would make scripting a lot easier and less prone to breakage.

This is the sort of thing I think I was talking about when I referred to a mapping file. I was also thinking it might not hurt to have a minor tool to aid in reading information back out from such a file, to avoid having everyone reinvent the wheel scores of times to do the same thing, if nothing else.

The concept also ties in rather well with the Wget 2.0 concept of having a metadatabase, containing information about mappings between original URLs and local filenames, MIME types, portions downloaded, etc. And I still like this idea better than having an option to switch between exit status modes.

> Curl presently has one of the best exit code implementations available.
> Studying their model would seem like a worthwhile venture.

Will do.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Content disposition question
Micah Cowan [EMAIL PROTECTED] writes:

> Actually, the reason it is not enabled by default is that (1) it is
> broken in some respects that need addressing, and (2) as it is
> currently implemented, it involves a significant amount of extra
> traffic, regardless of whether the remote end actually ends up using
> Content-Disposition somewhere.

I'm curious, why is this the case? I thought the code was refactored to determine the file name after the headers arrive. It certainly looks that way from the output it prints:

    {mulj}[~]$ wget www.cnn.com
    [...]
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: `index.html'

(Note that "Saving to" is printed only after the HTTP response.) Where does the extra traffic come from?

> Note that it is not available at all in any release version of Wget;
> only in the current development versions. We will be releasing Wget
> 1.11 very shortly, which will include the --content-disposition
> functionality; however, this functionality is EXPERIMENTAL only. It
> doesn't quite behave properly, and needs some severe adjustments before
> it is appropriate to leave as default.

If it is not ready for general use, we should consider removing it from NEWS. Otherwise, it should be properly documented in the manual. I am aware that the NEWS entry claims that the feature is experimental, but why even mention it if it's not ready for general consumption? Announcing experimental features in NEWS is a good way to make testers aware of them during the alpha/beta release cycle, but it should be avoided in production releases of mature software.

> As to breaking old scripts, I'm not really concerned about that (and
> people who read the NEWS file, as anyone relying on previous behaviors
> for Wget should do, would just need to set --no-content-disposition,
> when the time comes that we enable it by default).

Agreed.
NEWS file
I've noticed that the NEWS file now includes contents that would previously not have been included. NEWS was conceived as a resource for end users, not for developers or distribution maintainers. (Other GNU software seems to follow a similar policy.) I tried hard to keep it readable by only including important, or at least relevant, entries, sorted roughly by descending importance. Developer information can be obtained through other means: the web page, the version control logs, and the detailed ChangeLogs we keep.

The recent entries were added to the front, without regard for relative importance. For example, NEWS now begins with the announcement of the move to Mercurial, the new Autoconf 2.61 requirement, and the removal of the PATCH and TODO files (!). These entries are relevant to developers, but almost completely meaningless to end users.

If there is a need to include developer information in NEWS, I suggest that it be pushed to the bottom of the list, perhaps under a "Development information" section.
GnuTLS
If GnuTLS support will not be ready for the 1.11 release, may I suggest that we not advertise it in NEWS? After all, it's badly broken in that it doesn't support certificate validation, which is one of the most important features of an SSL client. It also doesn't support many of our SSL command-line options, which makes Wget almost broken, https-wise, under GnuTLS. IMO announcing such unfinished work brings more harm than good in a stable release.
Re: Content disposition question
Hrvoje Niksic wrote:

> Micah Cowan [EMAIL PROTECTED] writes:
>
> > Actually, the reason it is not enabled by default is that (1) it is
> > broken in some respects that need addressing, and (2) as it is
> > currently implemented, it involves a significant amount of extra
> > traffic, regardless of whether the remote end actually ends up using
> > Content-Disposition somewhere.
>
> I'm curious, why is this the case? I thought the code was refactored to
> determine the file name after the headers arrive. [...] Where does the
> extra traffic come from?

Your example above doesn't set --content-disposition; if you do, there is an extra HEAD request sent. As to why this is the case, I believe it was so that we could properly handle accepts/rejects, whereas we will otherwise usually assume that we can match accept/reject against the URL itself (we currently do this improperly for the -nd -r case, still matching against the generated file name's suffix).

Beyond that, I'm not sure as to why, and it's my intention that it not be done in 1.12. Removing it for 1.11 is too much trouble, as the sending-HEAD and sending-GET code is not nearly decoupled enough to do it without risk (and indeed, we were seeing trouble where every time we fixed an issue with the send-HEAD-first behavior, something else would break). I want to do some reworking of gethttp and http_loop before I will feel comfortable changing how they work.

> If it is not ready for general use, we should consider removing it
> from NEWS.

I had thought of that. The thing that has kept me from it so far is that it is a feature that is desired by many people, and for most of them, it will work (the issues are pretty minor, and mainly corner-case, except perhaps for the fact that files are apparently always downloaded to the top directory, and not the one in which the URL was found). And, if we leave it out of NEWS and the documentation, then, when we answer people who ask "How can I get Wget to respect Content-Disposition headers?", the natural follow-up will be, "Why isn't this mentioned anywhere in the documentation?" :)

> If not, it should be properly documented in the manual.

Yes... I should be more specific about its shortcomings.

> I am aware that the NEWS entry claims that the feature is experimental,
> but why even mention it if it's not ready for general consumption?
> Announcing experimental features in NEWS is a good way to make testers
> aware of them during the alpha/beta release cycle, but it should be
> avoided in production releases of mature software.

It's pretty much good enough; it's not where I want it, but it _is_ usable. The extra traffic is really the main reason I don't want it on by default.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
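The accept/reject matching discussed above can be sketched roughly like this. This is a hedged illustration, not Wget's actual implementation: the function name, its interface, and the sample patterns are all invented for the example; only the general -A/-R glob-matching idea comes from the thread.

```shell
# Rough sketch of -A/-R-style matching done against the final file name
# (the name Content-Disposition may have assigned) rather than the raw
# URL. Function name and arguments are invented for this example.
acclist_ok() {
  name=$1 accept=$2 reject=$3   # accept/reject are shell glob patterns
  # A reject match wins outright.
  if [ -n "$reject" ]; then
    case $name in ($reject) return 1 ;; esac
  fi
  # With an accept list, only matching names pass; otherwise pass all.
  if [ -n "$accept" ]; then
    case $name in ($accept) return 0 ;; (*) return 1 ;; esac
  fi
  return 0
}

acclist_ok photo.jpg '*.jpg' '' && echo "photo.jpg accepted"
acclist_ok setup.exe '' '*.exe' || echo "setup.exe rejected"
```

The point of matching against the final name is that a server's Content-Disposition header can assign a suffix the URL itself never had, so matching against the URL alone would misclassify the file.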
Re: NEWS file
Hrvoje Niksic wrote:

> I've noticed that the NEWS file now includes contents that would
> previously not have been included. NEWS was conceived as a resource
> for end users, not for developers or distribution maintainers. [...]
> For example, NEWS now begins with announcement of the move to
> Mercurial, the new Autoconf 2.61 requirement, and the removal of PATCH
> and TODO files (!). These entries are relevant to developers, but
> almost completely meaningless to end users.

Very good point. I agree wrt PATCH, TODO and Autoconf. As to Mercurial, it replaces the already-present entry regarding our move to Subversion, and I believe it is of interest to non-developers who want to know where the latest source is; certainly to the casual developer, who will want to know where to find the code to submit patches against (I want to discourage submission of patches against just the latest release, as I tend to get).

> If there is a need to include developer information in NEWS, I suggest
> that it be pushed to the bottom of the list, perhaps under a
> "Development information" section.

Developer information, for the most part, should probably go in the Wgiki, when appropriate.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
Re: Content disposition question
Micah Cowan [EMAIL PROTECTED] writes:

> > I thought the code was refactored to determine the file name after
> > the headers arrive. [...] Where does the extra traffic come from?
>
> Your example above doesn't set --content-disposition;

I'm aware of that, but the above example was supposed to point out the refactoring that has already taken place, regardless of whether --content-disposition is specified. As shown above, Wget always waits for the headers before determining the file name. If that is the case, it would appear that no additional traffic is needed to get Content-Disposition; Wget simply needs to use the information already received.

> As to why this is the case, I believe it was so that we could properly
> handle accepts/rejects,

Issuing another request seems to be the wrong way to go about it, but I haven't thought about it hard enough, so I could be missing a lot of subtleties.

> > I am aware that the NEWS entry claims that the feature is
> > experimental, but why even mention it if it's not ready for general
> > consumption? Announcing experimental features in NEWS is a good way
> > to make testers aware of them during the alpha/beta release cycle,
> > but it should be avoided in production releases of mature software.
>
> It's pretty much good enough; it's not where I want it, but it _is_
> usable. The extra traffic is really the main reason I don't want it
> on-by-default.

It should IMHO be documented, then. Even if it's documented as experimental.
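Hrvoje's argument, that the output name can be chosen from headers already in hand with no second request, can be sketched like this. This is a hedged illustration, not Wget code: the function name and the sample header values are invented; only the header field itself (Content-Disposition) and the index.html fallback behavior come from the thread.

```shell
# Sketch: choose an output file name from a response already received.
# Prefer Content-Disposition's filename; fall back to the last URL path
# component, or index.html for a bare "/". Function name is invented.
output_filename() {
  url=$1 disposition=$2
  # Pull filename="..." (quotes optional) out of Content-Disposition.
  name=$(printf '%s\n' "$disposition" |
         sed -n 's/.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p')
  if [ -z "$name" ]; then
    name=${url%%\?*}      # strip any query string
    name=${name##*/}      # keep only the last path component
  fi
  printf '%s\n' "${name:-index.html}"
}

output_filename 'http://example.com/dl?id=7' 'attachment; filename="report.pdf"'
output_filename 'http://www.cnn.com/' ''
```

Everything here operates on data the client already holds after reading the response headers, which is the crux of the no-extra-traffic argument.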