Re: [Python-Dev] cpython: #15114: the strict mode of HTMLParser and the HTMLParseError exception are

2012-06-23 Thread Antoine Pitrou
On Sat, 23 Jun 2012 15:28:00 +0200
ezio.melotti python-check...@python.org wrote:
  
 +   .. deprecated-removed:: 3.3 3.5
 +      The *strict* argument and the strict mode have been deprecated.
 +      The parser is now able to accept and parse invalid markup too.
 +

What if people want to accept only valid markup?

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cpython: #15114: the strict mode of HTMLParser and the HTMLParseError exception are

2012-06-23 Thread Ezio Melotti
On Sat, Jun 23, 2012 at 3:29 PM, Antoine Pitrou solip...@pitrou.net wrote:
 On Sat, 23 Jun 2012 15:28:00 +0200
 ezio.melotti python-check...@python.org wrote:

 +   .. deprecated-removed:: 3.3 3.5
 +      The *strict* argument and the strict mode have been deprecated.
 +      The parser is now able to accept and parse invalid markup too.
 +

 What if people want to accept only valid markup?


The problem with the strict mode is that it is not really strict.
Originally the parser tried to work around some common errors
(e.g. missing quotes around attribute values), but gave up when
it encountered other markup errors.  When the non-strict mode was
introduced, the old behavior was named "strict" and left unchanged
for backward compatibility, even though it wasn't strict enough to be
used for validation and it happily parsed some broken markup (but
not all of it).  At the same time, the non-strict mode accepted
some markup errors but not others, and parsing valid markup sometimes
yielded different results in strict and non-strict modes.

Then HTML5 was announced, with specific algorithms to parse both valid
and invalid markup, so I improved the non-strict mode to 1) be able to
parse everything; 2) be as close to the HTML5 standard as
possible (I don't claim HTML5 conformance though).  Now parsing a
valid HTML page should give the same result in strict and non-strict
modes, so the strict mode is only useful if you want
HTMLParseErrors for an arbitrary subset of markup errors.
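
To make the lenient behavior concrete, here is a minimal sketch. It
assumes a Python version where the non-strict behavior is the default
(or only) mode; the EventCollector class and the parse() helper are
hypothetical names used only for illustration, not part of html.parser.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records the parser callbacks as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse(markup):
    """Feed markup to a fresh collector and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered in the parser
    return parser.events


# Missing quotes around an attribute value, one of the common errors
# mentioned above: the value is still reported.
unquoted = parse("<a href=foo>link</a>")

# Unclosed <p> elements: no exception is raised, and the parser simply
# reports the tags it sees without inserting any end tags.
unclosed = parse("<p>one<p>two")
```

Note that the parser only reports what it finds; it does not validate
or repair the document, so anything stricter has to be built on top of
these callbacks.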

As someone already suggested, I should write a blog post explaining
all this, but I'm still working on ironing out the last things in the
code, so the blog post has yet to reach the top of my todo list.

Best Regards,
Ezio Melotti


 Regards

 Antoine.