R. David Murray <rdmur...@bitdance.com> added the comment:

Yes, after considerable discussion those of us working on this stuff decided that 
the goal should be that the parser be able to complete parsing, without error, 
anything the typical browsers can parse (which means pretty much anything, 
though that says nothing about whether the result of the parse is useful in any 
way).  In other words, we've been treating it as a bug when the parser throws 
an error, since one generally uses the library to parse web pages from the 
internet, and having the parse fail leaves you SOL for doing anything useful 
with the bad pages one gets therefrom.  (Note that if the parser were strictly 
adhering to the older RFCs our decision would have been different...but 
it is not.  It has always accepted *some* badly formed documents and rejected 
others.)
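
To make that concrete, here's a minimal sketch (assuming a modern Python 3, 
where the old strict mode is gone); the Collector subclass and the mangled 
markup are invented for illustration:

    from html.parser import HTMLParser

    # Tolerant parse of deliberately broken markup; this should
    # complete without raising on a modern Python 3.
    class Collector(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print("start tag:", tag, attrs)

        def handle_data(self, data):
            print("data:", repr(data))

    p = Collector()
    p.feed("<p <b>unclosed & <i>still parsed</p>")
    p.close()   # no exception; whether the result is *useful* is another matter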

Also note that BeautifulSoup under Python2 used the sgml parser (sgmllib), 
which didn't throw errors, but sgmllib is gone in Python3.  Under Python3 
BeautifulSoup uses the html parser...which is what started us down this road 
to begin with.
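
For the curious, a sketch of that combination under BeautifulSoup 4 (a 
third-party package, assumed installed), selecting the stdlib parser by name:

    from bs4 import BeautifulSoup  # BeautifulSoup 4, third-party

    # "html.parser" names the stdlib parser discussed here; the
    # broken markup is again just an illustration.
    soup = BeautifulSoup("<p>one<p>two & <b>three", "html.parser")
    print(soup.get_text())   # completes despite the malformed input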

----------
nosy: +r.david.murray

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14538>
_______________________________________