"Lawrence D'Oliveiro" <[EMAIL PROTECTED]> writes:

> I've been using HTMLParser to scrape Web sites. The trouble with this
> is, there's a lot of malformed HTML out there. Real browsers have to be
> written to cope gracefully with this, but HTMLParser does not. Not only
> does it raise an exception, but the parser object then gets into a
> confused state after that so you cannot continue using it.
[...]
sgmllib.SGMLParser (or htmllib.HTMLParser) is more tolerant than
HTMLParser.HTMLParser. BeautifulSoup derives from sgmllib.SGMLParser
and adds some extra robustness of its own, of a sort.


John

-- 
http://mail.python.org/mailman/listinfo/python-list
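The modules named above are Python 2 era: sgmllib and htmllib were removed in Python 3.0, and the HTMLParser module became html.parser. Since Python 3.5, html.parser.HTMLParser no longer raises HTMLParseError and is far more forgiving of malformed markup, so a minimal sketch of the same link-scraping job on Python 3 can use it directly. This is an illustration under that assumption, not the poster's code; the LinkCollector class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical scraper: collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs,
        # where value is None for valueless attributes such as <input disabled>
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Deliberately malformed: unquoted attribute, mis-nested </b></a>,
# a stray extra </div>, and <p> elements that are never closed.
BAD_HTML = """
<html><body>
<p>First paragraph <b><a href=one.html>one</b></a>
<div>orphan close</div></div>
<a href="two.html">two
<p>unclosed
</body>
"""

parser = LinkCollector()
parser.feed(BAD_HTML)  # no exception is raised for the broken markup
parser.close()
print(parser.links)  # ['one.html', 'two.html']
```

The parser does not track nesting, so mismatched or orphaned end tags are simply reported to handle_endtag (ignored here) rather than treated as errors.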