Ignore my first posting. Here's what I'm trying to do: I want to extract headlines from a newspaper's website using the code below. It works, but I want to match only the second group in <h2><a href="(.*)">(.*)</p> and print that out. Any suggestions?
import urllib, re

pattern = re.compile("""<h2><a href="(.*)">(.*)</p>""", re.DOTALL)
page = urllib.urlopen("http://www.startribune.com").read()
for headline in pattern.findall(page):
    print headline
I think you want

for headline, body in pattern.findall(page):
    print body
When a regex has more than one group, pattern.findall() returns a list of tuples, one tuple of groups per match. You have two groups in your regex, so in your code headline is being assigned a tuple with two items. In my code the tuple is unpacked into two names and you can print just the second item.
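Here is a small self-contained sketch of the same idea that runs without fetching the page. The sample markup in page is made up for illustration, and the pattern is an assumption too: it ends at </a> rather than </p> and uses non-greedy groups (.*?) so that each match stops at the first closing tag, which may or may not suit the real page.

```python
import re

# Hypothetical sample markup standing in for the downloaded page.
page = (
    '<h2><a href="/a.html">First story</a></h2>\n'
    '<h2><a href="/b.html">Second story</a></h2>\n'
)

# Two groups: the link target and the link text.
# Non-greedy .*? is an assumption here; the original pattern uses greedy .*
pattern = re.compile(r'<h2><a href="(.*?)">(.*?)</a>', re.DOTALL)

# With two groups, findall() gives one (url, text) tuple per match.
matches = pattern.findall(page)

# Unpack each tuple and keep only the second group.
titles = [text for url, text in matches]

for url, text in matches:
    print(text)
```

If only the second group is ever needed, another option is to drop the parentheses around the first group, so that findall() returns plain strings instead of tuples.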
PS You might want to look at BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
Kent
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor