Hi all!

> There is no easy way to punish the culprit.  The only thing you can do
> in the long run is refuse to interoperate with something that openly
> breaks applicable standards.  Otherwise you're not only rewarding the
> culprit, but destroying all the other tools because they will sooner
> or later collapse under the weight of kludges needed to support the
> broken HTML.

I can't argue with that.
However, from the _user's_ point of view, _wget_ would seem to be broken,
as the user's web browser probably shows everything correctly.
If it is decided that wget will ignore links with LF/CR
in them, then IMHO the user should be informed of what happened.
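
To illustrate what I mean by informing the user, here is a rough sketch in C.
It is not taken from the wget sources; the function names and the message
text are made up for this example, and I simply assume the link is available
as a NUL-terminated string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not wget code:
   returns 1 if the link text contains a CR or LF, 0 otherwise. */
int link_has_newline(const char *url)
{
  return strpbrk(url, "\r\n") != NULL;
}

/* Hypothetical helper, not wget code:
   if the link contains a line break, print a note to LOG and return 1
   (the caller would then skip the link); otherwise return 0. */
int report_if_skipped(const char *url, FILE *log)
{
  if (!link_has_newline(url))
    return 0;
  fprintf(log, "Link contains a line break; not followed.\n");
  return 1;
}
```

With something like this, a user who wonders why a page was not retrieved
would at least find a hint in the output.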

> > But if (for whatever reasons) an option is unavoidable, I would 
> > suggest something like
> > --relax_html_rules #integer
> > where integer is a bit-code (I hope that's the right term).
> 
> This is not what GNU options usually look like and how they work
> (underscores in option name, bitfields).  
Underscores: Sorry, I just gave an example, I'm not a GNUer ;)
Bitfields: OK. Any (short) reason for that? Are they considered
non-transparent, or just ugly?
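
Just so it is clear what I meant by "bit-code", a minimal sketch in C.
All names here are hypothetical (nothing like this exists in wget as far
as I know), and the two relaxations are only the examples from this thread.

```c
#include <stdlib.h>

/* Hypothetical relaxation flags; none of these names exist in wget. */
enum relax_flags {
  RELAX_NEWLINES_IN_URLS = 1 << 0,  /* tolerate CR/LF inside links */
  RELAX_COMMENTS         = 1 << 1,  /* tolerate sloppy comment syntax */
  RELAX_ALL              = RELAX_NEWLINES_IN_URLS | RELAX_COMMENTS
};

/* Parse the integer argument of the (hypothetical) option.
   Returns the mask, or -1 if ARG is not a number in 0..RELAX_ALL. */
int parse_relax_mask(const char *arg)
{
  char *end;
  long v = strtol(arg, &end, 10);
  if (end == arg || *end != '\0' || v < 0 || v > RELAX_ALL)
    return -1;
  return (int) v;
}

/* Returns 1 if FLAG is set in MASK, 0 otherwise. */
int relax_enabled(int mask, int flag)
{
  return (mask & flag) != 0;
}
```

So an argument of 3 would switch on both relaxations and 2 only the comment
one. I can see that the user has to look up what each number means, which
may be exactly the transparency problem.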

> But more importantly, I
> really don't think this kind of option is appropriate.  Wget should
> either detect the brokenness and handle it automatically, or refuse to
> acknowledge it altogether.  The worst thing to do is require the user
> to investigate why the HTML didn't parse, only to discover that Wget
> in fact had the ability to process it, but didn't bother to do so by
> default.

Hm, well, I can see your point there. 
But I think wget is already breaking this rule with the 
implementation of comment-parsing, or am I mistaken?
We could make "full relaxation" the default and
offer the inverse option, --stricthtmlrules, to switch off individual
relaxations. That is probably closer to the spirit of automatic downloading.

CU
Jens





