Ron Nixon wrote:
Ignore my first posting. Here's what I'm trying to do.
I want to extract headlines from a newspaper's website
using this code. It works, but I want to match the
second group in <h2><a href="(.*)">(.*)</p> and print
that out.
Suggestions?


import urllib, re
pattern = re.compile("""<h2><a href="(.*)">(.*)</p>""", re.DOTALL)
page = urllib.urlopen("http://www.startribune.com").read()
for headline in pattern.findall(page):
    print headline

I think you want

for headline, body in pattern.findall(page):
    print body

When the regex contains more than one group, pattern.findall() returns a list of tuples, one tuple per match, with one item per group. You have two groups in your regex, so in your code headline is bound to a tuple with two items. In my code the tuple is unpacked into two names, so you can print just the second item.
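
Here is a small self-contained sketch of the same idea that runs without fetching anything. The sample_page string is made up for illustration and is not the real Star Tribune markup, and I have assumed non-greedy groups (.*?) ending at </a> so that each match stops at the first closing tag; adjust the pattern to whatever the real page contains.

```python
import re

# Hypothetical sample markup standing in for the downloaded page.
sample_page = (
    '<h2><a href="/news/1">First headline</a></h2>\n'
    '<h2><a href="/news/2">Second headline</a></h2>\n'
)

# Two groups: the link target and the link text.
# The non-greedy (.*?) form is an assumption, not the original pattern.
pattern = re.compile(r'<h2><a href="(.*?)">(.*?)</a>', re.DOTALL)

# With two groups, findall() returns a list of 2-item tuples.
matches = pattern.findall(sample_page)

# Unpack each tuple and keep only the second group.
headlines = [text for href, text in matches]

for text in headlines:
    print(text)
```

If you only ever need the second group, you could also index the tuple (match[1]) instead of unpacking it; unpacking just makes the intent easier to read.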

PS You might want to look at BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/

Kent





_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


