Hi all!

> There is no easy way to punish the culprit. The only thing you can do
> in the long run is refuse to interoperate with something that openly
> breaks applicable standards. Otherwise you're not only rewarding the
> culprit, but destroying all the other tools because they will sooner
> or later collapse under the weight of kludges needed to support the
> broken HTML.

I can't argue with that. However, from the _user's_ point of view, _wget_
would seem to be broken, as the user's web browser probably shows
everything correctly. If it is decided that wget does not consider links
with LF/CR in them, then IMHO the user should be informed of what happened.

> > But if (for whatever reasons) an option is unavoidable, I would
> > suggest something like
> > --relax_html_rules #integer
> > where integer is a bit-code (I hope that's the right term).
>
> This is not what GNU options usually look like and how they work
> (underscores in option name, bitfields).

underscores: Sorry, I just gave an example, I'm not a GNUer ;)
bitfields: OK. Any (short) reason for that? Is it considered
non-transparent or ugly?

> But more importantly, I
> really don't think this kind of option is appropriate. Wget should
> either detect the brokenness and handle it automatically, or refuse to
> acknowledge it altogether. The worst thing to do is require the user
> to investigate why the HTML didn't parse, only to discover that Wget
> in fact had the ability to process it, but didn't bother to do so by
> default.

Hm, well, I can see your point there. But I think wget is already breaking
this rule with the implementation of comment-parsing, or am I mistaken?
We could make "full relaxation" the default and use the inverted option
--stricthtmlrules to exclude certain relaxations. This is probably more
"automatic downloading"ish.

CU
Jens
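
P.S. To make the bit-code idea (usually called a bitmask) a little more
concrete, here is a rough Python sketch of how such an integer could be
decoded, both for a --relax_html_rules style option and for the inverted
--stricthtmlrules variant. Everything in it is an assumption made up for
illustration: the two relaxation names, their bit values, and the helper
functions are hypothetical, and none of this is existing wget code or an
existing wget option.

```python
from enum import IntFlag
from typing import Optional


class Relax(IntFlag):
    """Hypothetical relaxations; names and bit values are illustration only."""
    LINK_NEWLINES = 1  # tolerate CR/LF inside a link
    LAX_COMMENTS = 2   # tolerate sloppy comment syntax


ALL_RELAXATIONS = Relax.LINK_NEWLINES | Relax.LAX_COMMENTS


def relax_from_bitcode(code: int) -> Relax:
    """Decode the integer of a --relax_html_rules style option."""
    if code < 0 or code & ~int(ALL_RELAXATIONS):
        raise ValueError(f"unknown relaxation bits in {code}")
    return Relax(code)


def relax_from_strict(strict_code: int) -> Relax:
    """Inverted variant: everything is relaxed by default and the
    bits that are passed switch individual relaxations off."""
    strict = relax_from_bitcode(strict_code)
    return Relax(int(ALL_RELAXATIONS) & ~int(strict))


def clean_link(url: str, active: Relax) -> Optional[str]:
    """Return a usable URL, or None if the link is rejected."""
    if "\r" in url or "\n" in url:
        if Relax.LINK_NEWLINES not in active:
            return None  # strict mode: refuse the broken link
        url = url.replace("\r", "").replace("\n", "")
    return url
```

With this sketch, clean_link("http://a/b\nc", relax_from_bitcode(1)) repairs
the link, while relax_from_bitcode(0) makes the same call return None, which
is the point where the user could be told why the link was skipped.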
