Kenneth McDonald wrote:
The problem I'm having with HTMLParser is simple; I don't seem to be
getting the actual text in the HTML document. I've implemented the
do_data method of HTMLParser.HTMLParser in my HTMLParser subclass, but
it never seems to receive any data. Is there another way to
Fredrik Lundh wrote:
the only difference between the libs (*) is that HTMLParser is a bit
stricter
*) the libs referring to htmllib and HTMLParser, not htmllib and sgmllib.
/F
--
http://mail.python.org/mailman/listinfo/python-list
I'm writing a program that will parse HTML and (mostly) convert it to
MediaWiki format. The two Python modules I'm aware of to do this are
HTMLParser and htmllib. However, I'm currently experiencing either real
or conceptual difficulty with both, and was wondering if I could get
some advice.
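# Python 2 module name; in Python 3 this class lives in html.parser
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.TokenList = []

    def handle_data(self, data):
        # called by the parser for each run of text between tags
        data = data.strip()
        if data:  # skip whitespace-only runs
            self.TokenList.append(data)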
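
For what it's worth, here is a minimal self-contained sketch of how a subclass like the one above ends up receiving the document text. The names TokenCollector, collect_text and sample are made up for illustration, and the try/except import is only there so the same snippet loads under either module name. The point it tries to show is that the parser calls handle_data for text; a method named do_data is not part of the HTMLParser interface, so feed() never invokes it.

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 module name


class TokenCollector(HTMLParser):
    """Hypothetical subclass that collects non-blank text runs."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tokens = []

    def handle_data(self, data):
        # The parser calls this for each run of text between tags.
        data = data.strip()
        if data:
            self.tokens.append(data)

    def do_data(self, data):
        # Not an HTMLParser callback, so feed() never reaches this.
        raise AssertionError("do_data is never called by HTMLParser")


def collect_text(html):
    """Feed a complete HTML string and return the collected text runs."""
    parser = TokenCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.tokens


sample = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
print(collect_text(sample))
```

With the sample string, the collected list should come out as the four text runs 'Title', 'Some', 'bold' and 'text.', one per stretch of text between tags. Each run could then be mapped to MediaWiki markup in handle_starttag / handle_endtag, but that part is left out of this sketch.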