Re: [Tutor] Value Error solved. Another question

Kent Johnson Mon, 14 Feb 2005 03:12:12 -0800

Ron Nixon wrote:

Ignore my first posting. Here's what I'm trying to do.
I want to extract headlines from a newspaper's website
using this code. It works, but I want to match the
second group in <h2><a href="(.*)">(.*)</p> and print
that out.
Suggestions?

import urllib, re
pattern = re.compile("""<h2><a href="(.*)">(.*)</p>""", re.DOTALL)
page = urllib.urlopen("http://www.startribune.com").read()
for headline in pattern.findall(page):
    print headline


I think you want
for headline, body in pattern.findall(page):
    print body

pattern.findall() returns a list of tuples, one tuple per match and one item per group. Your regex has two groups, so in your code headline is assigned a two-item tuple. In my code the tuple is unpacked into two names, so you can print just the second item.
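To show what findall() hands back and how the unpacking works, here is a small self-contained sketch. The page text is a hypothetical stand-in for the downloaded Star Tribune HTML, which the post does not reproduce. The pattern is also an assumed variant of Ron's: it uses non-greedy `.*?` and stops at `</a>`, because with greedy `.*` and `re.DOTALL` one match tends to run from the first `<h2><a href="` all the way to the last `</p>` on the page. The print(...) calls work in Python 3 and, with a single argument, in Python 2 as well.

```python
import re

# Hypothetical sample markup standing in for the real page (assumption).
page = """
<h2><a href="/news/1">Council passes budget</a></h2>
<p>Summary one.</p>
<h2><a href="/news/2">Storm closes schools</a></h2>
<p>Summary two.</p>
"""

# Assumed pattern: non-greedy groups so each match stops at the nearest </a>.
pattern = re.compile(r'<h2><a href="(.*?)">(.*?)</a>', re.DOTALL)

# With two groups, findall returns a list of 2-tuples: (url, headline text).
matches = pattern.findall(page)
print(matches)

# Unpack each tuple and print only the second group.
for url, headline in matches:
    print(headline)
```

Looping with `for headline in matches` would bind the whole (url, headline) tuple to one name, which is exactly what the original loop was printing.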

PS You might want to look at BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/

Kent



_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
