At 2:56 PM +0000 3/4/09, Chris Withers wrote:
>Vaibhav Mallya wrote:
>> We do have HTMLParser, but that doesn't handle malformed pages well, and
>> just isn't as nice as BeautifulSoup.
>
>Interesting, given that BeautifulSoup is built on HTMLParser ;-)

In BeautifulSoup >= 3.1, yes. Before that (<= 3.0.7a), it was based on the more robust sgmllib.SGMLParser. The current BeautifulSoup can't handle '<foo a="bc"b="cd">', while the earlier SGMLParser-based versions can. I don't know how common that missing space is in the wild, but I've written HTML with that problem myself. This may be the only problem with using HTMLParser instead of SGMLParser; I don't know. In the meantime, if I need BeautifulSoup in Python 3.x, I'll port sgmllib and use the older BeautifulSoup.
-- 
____________________________________________________________________
TonyN.:' <mailto:tonynel...@georgeanelson.com>
' <http://www.georgeanelson.com/>
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
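To check how a particular interpreter's parser treats the missing-space tag discussed above, a small probe against the standard library's `html.parser` (the Python 3 name of `HTMLParser`) may help. This is a minimal sketch, not part of BeautifulSoup. `AttrRecorder` and `parse_start_tags` are hypothetical helper names. The message reports that the HTMLParser-based BeautifulSoup of this era could not handle the input. Later, more tolerant `html.parser` releases may accept it, so the output tells you about the interpreter you run it on and nothing more.

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):
    """Hypothetical helper: record the name and attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, attrs))


def parse_start_tags(markup):
    """Hypothetical helper: feed markup to AttrRecorder and return what it saw."""
    parser = AttrRecorder()
    parser.feed(markup)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    # The malformed tag from the message: no space between the two attributes.
    for tag, attrs in parse_start_tags('<foo a="bc"b="cd">'):
        print(tag, attrs)
```

If the parser splits the tag correctly, you should see both `a` and `b` listed for `foo`. If it does not, the attribute list is wrong or the tag is rejected.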