Re: [Python-Dev] HTMLParser and HTML5

Antoine Pitrou Fri, 29 Jul 2011 14:45:38 -0700

On Fri, 29 Jul 2011 13:34:13 -0700
Brett Cannon <[email protected]> wrote:
> On Fri, Jul 29, 2011 at 13:16, Glyph Lefkowitz <[email protected]> wrote:
> 
> > On Jul 29, 2011, at 3:00 PM, Matt wrote:
> >
> > I don't see any real reason to drop a decent piece of code (HTMLParser,
> > that is) in favor of a third party library when only relatively minor
> > updates are needed to bring it up to speed with the latest spec.
> >
> >
> > I am not really one to throw stones here, as Twisted contains a lenient
> > pseudo-XML parser which I still maintain - one which decidedly does *not*
> > agree with HTML5's requirements for dealing with invalid data, but is
> > just a bunch of ad-hoc guesses of my own.
> >
> > My impression of HTML5 is that HTMLParser would require significant
> > modifications and possibly a drastic re-architecture in order to really do
> > HTML5 "right"; especially the parts that the html5lib authors claim make
> > HTML5 streaming-unfriendly, i.e. subtree reordering when encountering
> > certain types of invalid data.
> >
> 
> We could also have the code live side-by-side for a while (or indefinitely
> if that was really desired) by bringing html5lib in as either a separate
> module or having the relevant classes live in htmllib under different names.


Unless html5lib is better in some fundamental ways which are difficult
to fix in htmllib, I'm not sure there's any point in adding it to the
stdlib.
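
To make the "fundamental ways" question concrete, here is a minimal
sketch (not taken from the thread) of what the streaming concern quoted
above looks like in practice. It assumes the Python 3 spelling
html.parser.HTMLParser; the EventLogger class and the sample markup are
hypothetical names chosen for illustration. The parser reports callbacks
in source order and does not itself repair misnested tags, whereas the
HTML5 parsing algorithm specifies how a tree builder must restructure
nodes in such cases.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: records parser callbacks in the order they fire."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


# Misnested formatting tags: </b> closes before the inner <i> does.
parser = EventLogger()
parser.feed("<b><i>one</b>two</i>")
parser.close()
events = parser.events

for event in events:
    print(event)
```

The callbacks simply mirror the input order, so any tree repair would
have to be layered on top by the caller, after events have already been
delivered. Whether that layering can be done without a re-architecture
is the open question in this thread.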

We don't really do users a service if we keep adding alternative APIs
for common functionality.

Regards

Antoine.


