h0uk <vardan.pogos...@gmail.com> writes:

> On Jan 8, 08:44, Water Lin <water...@ymail.invalid> wrote:
>> I am a new guy to use Python, but I want to parse a html page now. I
>> tried to use HTMLParse. Here is my sample code:
>> ----------------------
>> from HTMLParser import HTMLParser
>> from urllib2 import urlopen
>>
>> class MyParser(HTMLParser):
>>     title = ""
>>     is_title = ""
>>     def __init__(self, url):
>>         HTMLParser.__init__(self)
>>         req = urlopen(url)
>>         self.feed(req.read())
>>
>>     def handle_starttag(self, tag, attrs):
>>         if tag == 'div' and attrs[0][1] == 'articleTitle':
>>             print "Found link => %s" % attrs[0][1]
>>             self.is_title = 1
>>
>>     def handle_data(self, data):
>>         if self.is_title:
>>             print "here"
>>             self.title = data
>>             print self.title
>>             self.is_title = 0
>> -----------------------
>>
>> For the tag
>> -------
>> <div class="articleTitle">open article title</div>
>> -------
>>
>> I use my code to parse it. I can locate the div tag but I don't know how
>> to get the text for the tag which is "open article title" in my example.
>>
>> How can I get the html content? What's wrong in my handle_data function?
>>
>> Thanks
>>
>> Water Lin
>>
>> --
>> Water Lin's notes and pencils: http://en.waterlin.org
>> Email: water...@ymail.com
>
> I want to say your code works well

But in handle_data I can't print self.title. I don't know why I can't set
self.title in handle_data.

Thanks

Water Lin

--
Water Lin's notes and pencils: http://en.waterlin.org
Email: water...@ymail.com
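
For reference, here is a minimal sketch of one way to collect the text of the div. It is not a diagnosis of the code above. One possible cause, offered only as a guess, is that on a real page the first chunk passed to handle_data after the start tag is whitespace, or that the text sits inside a nested tag, since HTMLParser calls handle_data once per text chunk. The sketch therefore accumulates every chunk until the matching </div>. It looks up the class attribute by name instead of using attrs[0][1], which raises IndexError on a div with no attributes. The names TitleParser and extract_title are made up for illustration, and the module path html.parser is the Python 3 name (in Python 2 the class lives in the HTMLParser module, as in the quoted code).

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <div class="articleTitle">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside the target div
        self.chunks = []    # text pieces seen inside the target div
        self.title = None   # set once the target div is closed

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the title div
        elif (self.title is None and tag == "div"
              and dict(attrs).get("class") == "articleTitle"):
            # Exact match on the class value; a div carrying several
            # classes would need a different test (assumption).
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.title = "".join(self.chunks).strip()

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_title(html):
    """Return the title text, or None if no matching div was found."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


if __name__ == "__main__":
    print(extract_title('<div class="articleTitle">open article title</div>'))
```

To use it on a live page, fetch the page first and pass the decoded text to extract_title; the fetching step is left out here so the parsing logic can be tried on a plain string.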