Re: [Python-Dev] Fixing the XML batteries

Stefan Behnel Fri, 09 Dec 2011 23:40:48 -0800

Bill Janssen, 09.12.2011 19:15:

I think another thing that might go into "refreshing the batteries" is a
feature comparison of BeautifulSoup and HTML5lib against the stdlib
competition, to see what needs to be added/revised.  Having to switch to
an outside package for parsing possibly invalid HTML is a pain.


Such a feature request should be worth a separate thread.

Note, however, that html5lib is likely way too big to add it to the stdlib,
and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3,
which would be the target release series for better HTML support. So,
whatever library or API you would want to use for HTML processing is
currently only the second question as long as Py3 lacks a real-world HTML
parser in the stdlib, as well as a robust character detection mechanism. I
don't think that can be fixed all that easily.


Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
