I haven't seen much discussion of this (maybe I didn't search hard enough), but what are people's thoughts on including BeautifulSoup in the stdlib? It's small, fast, and well liked by the people who know about it. Someone mentioned that web scraping needs are infrequent. My argument is that people ask about scraping less often because they feel they can easily reinvent the wheel with urllib and regexes. This actually seems similar to the CSV problem from a while back, when everyone was implementing their own parser.

We do have HTMLParser, but it doesn't handle malformed pages well, and it just isn't as nice to use as BeautifulSoup.
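
To make the regex point concrete, here is a minimal sketch of the kind of wheel people tend to reinvent, next to the same job done with the stdlib parser. It assumes Python 3, where the module is spelled html.parser; SAMPLE, NAIVE_HREF, LinkCollector, regex_links and parser_links are names made up for this illustration, and the regex is deliberately the naive kind one writes in a hurry.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup: three links written in three slightly different ways.
SAMPLE = (
    '<p><a href="/one">One</a> '
    "<A HREF='/two'>Two</A> "
    '<a class="y" href="/three">Three</a></p>'
)

# A deliberately naive pattern: lowercase tag, href as the first attribute,
# double quotes only.
NAIVE_HREF = re.compile(r'<a href="([^"]*)"')


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def regex_links(html):
    """Return the hrefs the naive regex manages to find."""
    return NAIVE_HREF.findall(html)


def parser_links(html):
    """Return the hrefs found by walking the start tags with HTMLParser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


print(regex_links(SAMPLE))
print(parser_links(SAMPLE))
```

The regex only sees the first link, because the other two differ in case, quoting, or attribute order. The parser version finds all three, but it takes a subclass and a callback to get there, which is the ergonomics gap I mean.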

In a not-entirely-unrelated vein, has there been any discussion of just putting all of mechanize into the stdlib?

BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
mechanize: http://wwwsearch.sourceforge.net/mechanize/

Regards,
Vaibhav Mallya
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
