On Fri, 29 Jul 2011 13:34:13 -0700
Brett Cannon <br...@python.org> wrote:
> On Fri, Jul 29, 2011 at 13:16, Glyph Lefkowitz <gl...@twistedmatrix.com> wrote:
>
> > On Jul 29, 2011, at 3:00 PM, Matt wrote:
> >
> > I don't see any real reason to drop a decent piece of code (HTMLParser,
> > that is) in favor of a third party library when only relatively minor
> > updates are needed to bring it up to speed with the latest spec.
> >
> >
> > I am not really one to throw stones here, as Twisted contains a lenient
> > pseudo-XML parser which I still maintain - one which decidedly does *not*
> > agree with html5's requirements for dealing with invalid data, but just
> > a bunch of ad-hoc guesses of my own.
> >
> > My impression of HTML5 is that HTMLParser would require significant
> > modifications and possibly a drastic re-architecture in order to really do
> > HTML5 "right"; especially the parts that the html5lib authors claim makes
> > HTML5 streaming-unfriendly, i.e. subtree reordering when encountering
> > certain types of invalid data.
> >
>
> We could also have the code live side-by-side for a while (or indefinitely
> if that was really desired) by bringing html5lib in as either a separate
> module or having the relevant classes live in htmllib under different names.

Unless html5lib is better in some fundamental ways which are difficult to
fix in htmllib, I'm not sure there's any point in adding it to the stdlib.
We don't really do users a service if we keep adding alternative APIs for
common functionality.

Regards

Antoine.