Bill Janssen, 09.12.2011 19:15:
> I think another thing that might go into "refreshing the batteries" is a
> feature comparison of BeautifulSoup and HTML5lib against the stdlib
> competition, to see what needs to be added/revised. Having to switch to
> an outside package for parsing possibly invalid HTML is a pain.

Such a feature request deserves a separate thread.

Note, however, that html5lib is likely way too big to be added to the
stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML in
Python 3, which would be the target release series for better HTML support.
So, which library or API you would want to use for HTML processing is
currently only a secondary question, as long as Py3 lacks both a real-world
HTML parser and a robust character detection mechanism in the stdlib. I
don't think that can be fixed all that easily.

Stefan
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev