On Thu, Mar 5, 2009 at 2:39 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> Ivan Krstić wrote:
>> On Mar 4, 2009, at 12:32 PM, James Y Knight wrote:
>>> I think html5lib would be a better candidate for an improved HTML
>>> parser in the stdlib than BeautifulSoup.
>>
>> While we're talking about alternatives, Ian Bicking appears to swear by
>> lxml:
>>
>> <http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/>
>
> I second that. ;)
>
> And, BTW, I wouldn't mind getting lxml into the stdlib either.

No matter how beautiful and fast lxml is, it has one downside when it
comes to including it in the stdlib: it is based on large, complex
third-party libraries, libxml2 and libxslt. Consider the sad example of
BerkeleyDB, which was initially welcomed into the stdlib but more
recently booted out, for reasons having to do with the release cycle of
the external dependency and other issues typical of large external
dependencies. Based on that experience, I think we should be very
careful about including lxml in the standard library. Instead, let's
hope Linux distros pick it up (and if anyone knows how to encourage
that, let us know).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev