[issue32876] HTMLParser raises exception on some inputs

Steven D'Aprano Mon, 19 Feb 2018 15:02:34 -0800

Steven D'Aprano <[email protected]> added the comment:

The stdlib HTML parser requires correct HTML.

To parse broken HTML of the kind you find in the real world, you need a
third-party library such as BeautifulSoup. BeautifulSoup is much more complex
(about 7-8 times as many LOC) but can handle nearly anything a browser can.
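
For reference, here is a minimal sketch of the stdlib side on well-formed
input. TagCollector and collect_tags are hypothetical names used only for
illustration; they do not come from the issue, and the sketch makes no claim
about how the parser behaves on the malformed input reported there.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the name of every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser calls this once per opening tag, with the name lowercased.
        self.tags.append(tag)


def collect_tags(markup):
    """Feed markup to the stdlib parser and return the start tags in order."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered data
    return parser.tags


print(collect_tags("<html><body><p>Hello</p></body></html>"))
# prints ['html', 'body', 'p']
```

With BeautifulSoup installed (an assumption, since it is not part of the
stdlib), the rough equivalent would start from
BeautifulSoup(markup, "html.parser").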

I doubt the stdlib will ever compete with BeautifulSoup.

----------
nosy: +steven.daprano

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue32876>
_______________________________________
