Re: Wget exit codes

2007-12-10 Thread Gerard
 On December 09, 2007 at 07:03PM Stuart Moore wrote:

 Could the exit code used be determined by a flag? E.g. by default it
 uses unix convention, 0 for any success; with an
 --extended_error_codes flag or similar then it uses extra error codes
 depending on the type of success (but for sanity uses the same codes
 for failure with or without the flag)
 
 That should allow both of you to use it for scripting.

Curl has a -w option that takes several different variables. I presently use
the 'http_code' and 'size_download' variables to gather information I need to
determine what exactly transpired. They are also great for debugging a script;
however, that is another matter. Presently, I direct this info to a file and
then grep the file to gather the information I need. It works faster than
doing multiple file comparisons, especially when the files are compressed.
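
The logging-and-grep approach described above can be sketched roughly as follows. The curl invocation is shown in a comment (its -w variables 'http_code' and 'size_download' are real), while the log line itself is faked with printf so the grep step can be shown without network access:

```shell
# A real invocation would be along these lines; curl's -w writes the
# format string to stdout after the transfer completes:
#   curl -s -o page.html \
#        -w 'http_code=%{http_code} size_download=%{size_download}\n' \
#        "$url" >> transfer.log
# Simulate one such log line so the grep step runs offline:
printf 'http_code=%s size_download=%s\n' 200 5120 > transfer.log

# Determine what transpired by grepping the log rather than
# comparing downloaded files:
if grep -q '^http_code=200 ' transfer.log; then
    result="transfer succeeded"
else
    result="transfer failed"
fi
echo "$result"
```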

If wget were to implement something along this line, possibly even setting a
specific variable indicating whether or not a file was downloaded, it would
make scripting a lot easier and less prone to breakage.
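
A script built on such extended status codes might look like this rough sketch. The flag name is taken from the proposal quoted above, and the specific code values are assumptions for illustration only, not anything wget actually implements:

```shell
# Map a (hypothetical) extended wget exit status to a description.
# The code values used here are illustrative assumptions.
classify_wget_status() {
    case "$1" in
        0) echo "success" ;;
        4) echo "network failure" ;;
        8) echo "server issued an error response" ;;
        *) echo "other failure ($1)" ;;
    esac
}

# In a real script one would run, e.g.:
#   wget --extended_error_codes "$url"
#   classify_wget_status "$?"
classify_wget_status 0
classify_wget_status 8
```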

Curl presently has one of the best exit code implementations available.
Studying their model would seem like a worthwhile venture.


-- 
Gerard


Re: Wget exit codes

2007-12-10 Thread Micah Cowan

Gerard wrote:
 On December 09, 2007 at 07:03PM Stuart Moore wrote:
 
 Could the exit code used be determined by a flag? E.g. by default it
 uses unix convention, 0 for any success; with an
 --extended_error_codes flag or similar then it uses extra error codes
 depending on the type of success (but for sanity uses the same codes
 for failure with or without the flag)

 That should allow both of you to use it for scripting.
 
 Curl has a -w option that takes several different variables. I presently use
 the 'http_code' and 'size_download' variables to gather information I need to
 determine what exactly transpired. They are also great for debugging a script;
 however, that is another matter. Presently, I direct this info to a file and
 then grep the file to gather the information I need. It works faster than
 doing multiple file comparisons, especially when the files are compressed.
 
 If wget were to implement something along this line, possibly even setting a
 specific variable indicating whether or not a file was downloaded, it would
 make scripting a lot easier and less prone to breakage.

This is the sort of thing I think I was talking about when I referred to
a mapping file. I was also thinking it might not hurt to have a minor
tool to aid in reading information back out from such a file, to avoid
having everyone reinvent the wheel scores of times to do the same
thing, if nothing else.

The concept also ties in rather well with the Wget 2.0 concept of
having a metadatabase, containing information about mappings between
original URLs and local filenames, MIME types, portions downloaded, etc.

And, I still like this idea better than having an option to switch
between exit status modes.

 Curl presently has one of the best exit code implementations available.
 Studying their model would seem like a worthwhile venture.

Will do.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Inform the sender

2007-12-10 Thread PCRBUE01/PCRSA

Incident information:

Database:    e:/lotus/domino/data/mail2.box
Originator:  wget@sunsite.dk
Recipients:  [EMAIL PROTECTED]
Subject:     Re: Protected Mail Delivery
Date/Time:   04/12/2007 09:36:22

The attached file msg.zip that you sent to the recipients listed above
was infected with the W32/[EMAIL PROTECTED] virus and has been deleted.



Re: Content disposition question

2007-12-10 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 Actually, the reason it is not enabled by default is that (1) it is
 broken in some respects that need addressing, and (2) as it is currently
 implemented, it involves a significant amount of extra traffic,
 regardless of whether the remote end actually ends up using
 Content-Disposition somewhere.

I'm curious, why is this the case?  I thought the code was refactored
to determine the file name after the headers arrive.  It certainly
looks that way by the output it prints:

{mulj}[~]$ wget www.cnn.com
[...]
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html'   # note: "Saving to" is printed only after the HTTP response

Where does the extra traffic come from?

 Note that it is not available at all in any release version of Wget;
 only in the current development versions. We will be releasing Wget 1.11
 very shortly, which will include the --content-disposition
 functionality; however, this functionality is EXPERIMENTAL only. It
 doesn't quite behave properly, and needs some severe adjustments before
 it is appropriate to leave as default.

If it is not ready for general use, we should consider removing it
from NEWS.  If not, it should be properly documented in the manual.  I
am aware that the NEWS entry claims that the feature is experimental,
but why even mention it if it's not ready for general consumption?
Announcing experimental features in NEWS is a good way to make testers
aware of them during the alpha/beta release cycle, but it should be
avoided in production releases of mature software.

 As to breaking old scripts, I'm not really concerned about that (and
 people who read the NEWS file, as anyone relying on previous
 behaviors for Wget should do, would just need to set
 --no-content-disposition, when the time comes that we enable it by
 default).

Agreed.


NEWS file

2007-12-10 Thread Hrvoje Niksic
I've noticed that the NEWS file now includes contents that would
previously not have been included.  NEWS was conceived as a resource
for end users, not for developers or distribution maintainers.  (Other
GNU software seems to follow a similar policy.)  I tried hard to keep
it readable by only including important or at least relevant entries,
sorted roughly by descending importance.  Developer information can be
obtained through other means: the web page, the version control logs,
and the detailed ChangeLogs we keep.

The recent entries were added to the front, without regard for
relative importance.  For example, NEWS now begins with announcement
of the move to Mercurial, the new Autoconf 2.61 requirement, and the
removal of PATCH and TODO files (!).  These entries are relevant
to developers, but almost completely meaningless to end users.

If there is a need to include developer information in NEWS, I suggest
that it be pushed to the bottom of the list, perhaps under a
Development information section.


GnuTLS

2007-12-10 Thread Hrvoje Niksic
If GnuTLS support will not be ready for the 1.11 release, may I
suggest that we not advertise it in NEWS?  After all, it's badly
broken in that it doesn't support certificate validation, which is one
of the most important features of an SSL client.  It also doesn't
support many of our SSL command-line options, which makes Wget almost
broken, https-wise, under GnuTLS.  IMO announcing such unfinished work
brings more harm than good in a stable release.


Re: Content disposition question

2007-12-10 Thread Micah Cowan

Hrvoje Niksic wrote:
 Micah Cowan [EMAIL PROTECTED] writes:
 
 Actually, the reason it is not enabled by default is that (1) it is
 broken in some respects that need addressing, and (2) as it is currently
 implemented, it involves a significant amount of extra traffic,
 regardless of whether the remote end actually ends up using
 Content-Disposition somewhere.
 
 I'm curious, why is this the case?  I thought the code was refactored
 to determine the file name after the headers arrive.  It certainly
 looks that way by the output it prints:
 
 {mulj}[~]$ wget www.cnn.com
 [...]
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 Saving to: `index.html'   # note: "Saving to" is printed only after the HTTP response
 
 Where does the extra traffic come from?

Your example above doesn't set --content-disposition; if you do, there
is an extra HEAD request sent.

As to why this is the case, I believe it was so that we could properly
handle accepts/rejects, whereas we will otherwise usually assume that we
can match accept/reject against the URL itself (we currently do this
improperly for the -nd -r case, still matching using the generated
file name's suffix).

Beyond that, I'm not sure as to why, and it's my intention that it not
be done in 1.12. Removing it for 1.11 is too much trouble, as the
sending-HEAD and sending-GET is not nearly decoupled enough to do it
without risk (and indeed, we were seeing trouble where every time we
fixed an issue with the send-HEAD-first behavior, something else would
break). I want to do some reworking of gethttp and http_loop before I
will feel comfortable in changing how they work.
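
For concreteness, the core of what --content-disposition changes — choosing the local file name from the response header rather than from the URL — can be sketched roughly like this; the sed pattern and function name are simplifications for illustration, not Wget's actual parser:

```shell
# Extract a file name from a Content-Disposition header value,
# falling back to a URL-derived name. Illustrative sketch only.
filename_from_header() {
    header=$1; fallback=$2
    name=$(printf '%s\n' "$header" |
        sed -n 's/.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p')
    # Strip any directory components a hostile header might smuggle in.
    name=${name##*/}
    if [ -n "$name" ]; then
        printf '%s\n' "$name"
    else
        printf '%s\n' "$fallback"
    fi
}

filename_from_header 'attachment; filename="report.pdf"' index.html
filename_from_header '' index.html
```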

 If it is not ready for general use, we should consider removing it
 from NEWS.

I had thought of that. The thing that has kept me from it so far is that
it is a feature that is desired by many people, and for most of them,
it will work (the issues are pretty minor, and mainly corner-case,
except perhaps for the fact that they are apparently always downloaded
to the top directory, and not the one in which the URL was found).

And, if we leave it out of NEWS and documentation, then, when we answer
people who ask "How can I get Wget to respect Content-Disposition
headers?", the natural follow-up will be, "Why isn't this mentioned
anywhere in the documentation?". :)

 If not, it should be properly documented in the manual.

Yes... I should be more specific about its shortcomings.

 I am aware that the NEWS entry claims that the feature is experimental,
 but why even mention it if it's not ready for general consumption?
 Announcing experimental features in NEWS is a good way to make testers
 aware of them during the alpha/beta release cycle, but it should be
 avoided in production releases of mature software.

It's pretty much good enough; it's not where I want it, but it _is_
usable. The extra traffic is really the main reason I don't want it
on-by-default.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: NEWS file

2007-12-10 Thread Micah Cowan

Hrvoje Niksic wrote:
 I've noticed that the NEWS file now includes contents that would
 previously not have been included.  NEWS was conceived as a resource
 for end users, not for developers or distribution maintainers.  (Other
 GNU software seems to follow a similar policy.)  I tried hard to keep
 it readable by only including important or at least relevant entries,
 sorted roughly by descending importance.  Developer information can be
 obtained through other means: the web page, the version control logs,
 and the detailed ChangeLogs we keep.
 
 The recent entries were added to the front, without regard for
 relative importance.  For example, NEWS now begins with announcement
 of the move to Mercurial, the new Autoconf 2.61 requirement, and the
 removal of PATCH and TODO files (!).  These entries are relevant
 to developers, but almost completely meaningless to end users.

Very good point. I agree wrt PATCH, TODO and Autoconf. As to Mercurial,
it replaces the already-present entry regarding our move to Subversion,
and I believe it is of interest to non-developers, who want to know
where the latest source is. Certainly to the casual developer, who
will want to know where to find the code to submit patches against (I
want to discourage submission of patches against just the latest
release, which is what I tend to get).

 If there is a need to include developer information in NEWS, I suggest
 that it be pushed to the bottom of the list, perhaps under a
 Development information section.

Developer information, for the most part, should probably go in the
Wgiki, when appropriate.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/


Re: Content disposition question

2007-12-10 Thread Hrvoje Niksic
Micah Cowan [EMAIL PROTECTED] writes:

 I thought the code was refactored to determine the file name after
 the headers arrive.  It certainly looks that way by the output it
 prints:
 
 {mulj}[~]$ wget www.cnn.com
 [...]
 HTTP request sent, awaiting response... 200 OK
 Length: unspecified [text/html]
 Saving to: `index.html'   # note: "Saving to" is printed only after the HTTP response
 
 Where does the extra traffic come from?

 Your example above doesn't set --content-disposition;

I'm aware of that, but the above example was supposed to point out the
refactoring that has already taken place, regardless of whether
--content-disposition is specified.  As shown above, Wget always waits
for the headers before determining the file name.  If that is the
case, it would appear that no additional traffic is needed to get
Content-Disposition; Wget simply needs to use the information already
received.

 As to why this is the case, I believe it was so that we could
 properly handle accepts/rejects,

Issuing another request seems to be the wrong way to go about it, but
I haven't thought about it hard enough, so I could be missing a lot of
subtleties.

 I am aware that the NEWS entry claims that the feature is experimental,
 but why even mention it if it's not ready for general consumption?
 Announcing experimental features in NEWS is a good way to make testers
 aware of them during the alpha/beta release cycle, but it should be
 avoided in production releases of mature software.

 It's pretty much good enough; it's not where I want it, but it
 _is_ usable. The extra traffic is really the main reason I don't
 want it on-by-default.

It should IMHO be documented, then.  Even if it's documented as
experimental.