Steven D'Aprano added the comment:

The stdlib HTML parser requires correct HTML.

To parse broken HTML of the kind you find in the real world, you need a
third-party library such as BeautifulSoup. BeautifulSoup is much more complex
(about 7-8 times as many lines of code) but can handle nearly anything a
browser can.
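
To illustrate the gap, here is a minimal sketch, assuming Python 3's
html.parser.HTMLParser. The TagLogger class and tag_events function are
hypothetical names made up for this example. It only shows that the stdlib
parser reports tags in the order they appear in the input and makes no attempt
to repair unclosed or stray tags, which is the work a library like
BeautifulSoup takes on.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records tag events as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag found in the input.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called once per closing tag, whether or not it matches an open tag.
        self.events.append(("end", tag))


def tag_events(markup):
    """Feed markup to the stdlib parser and return the raw tag events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Two <p> tags that are never closed, plus a </b> that was never opened.
broken = "<p>one<p>two</b>"
for event in tag_events(broken):
    print(event)
```

The event list mirrors the input as written: nothing closes the two <p>
elements and the stray </b> is passed through unchanged. Any code that wants a
usable document tree from this has to decide for itself where the missing
closing tags belong.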

I doubt the stdlib will ever compete with BeautifulSoup.

nosy: +steven.daprano

Python tracker <>
Python-bugs-list mailing list
