[Tutor] BeautifulSoup confusion

Steve Lyskawa Thu, 09 Apr 2009 16:27:56 -0700

I am not a programmer by trade but I've been using Python for 10+ years,
usually for text file conversion and protocol analysis.  I'm having a
problem with Beautiful Soup.  I can get it to scrape off all the href links
on a web page, but I am having problems selecting specific URIs from the
output supplied by Beautiful Soup.
What exactly is it returning to me, and what command would I use to find that
out?  Do I have to take each line it gives me and put it into a list before I
can, for example, get only the URIs containing a certain string, or use
the results to get the web page that a URI refers to?
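
One way to see what Beautiful Soup hands back is to ask Python for the type of
the result and of its items. The sketch below assumes the bs4 package
(Beautiful Soup 4) and a made-up HTML snippet; the link targets are
hypothetical and not taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, made up for illustration.
html = """
<html><body>
<a href="/reports/env.html">Environment</a>
<a href="/reports/index.html">Index</a>
<a name="top">Anchor without an href</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")
print(type(links))     # the container returned by find_all (list-like)
print(type(links[0]))  # each item is a Tag object, not a plain string

# A Tag allows dictionary-style access to its attributes.
# .get() returns None when the attribute is missing, so the
# third <a> (which has no href) is skipped here.
hrefs = [tag.get("href") for tag in links if tag.get("href") is not None]
print(hrefs)
```

Because the result already behaves like a list of Tag objects, there should be
no need to copy each line into a separate list first; the attribute values can
be pulled out directly.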


The pseudo code for what I am trying to do:

1. Get all URIs from the web page that contain the string "env.html".
2. Open the web page each URI refers to.
3. Scrape selected information off of that page.
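
The steps above could be sketched roughly as follows, assuming the bs4 package
(Beautiful Soup 4) and the standard library's urllib. The function names
(fetch, env_links, scrape) are hypothetical, and since the post does not say
which information should be scraped, the page title stands in as a
placeholder.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch(url):
    """Download a page and return its raw bytes (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def env_links(page_html, base_url, needle="env.html"):
    """Step 1: return absolute URIs of links whose href contains needle."""
    soup = BeautifulSoup(page_html, "html.parser")
    return [
        urljoin(base_url, a["href"])  # resolve relative links
        for a in soup.find_all("a", href=True)  # only <a> tags with an href
        if needle in a["href"]
    ]


def scrape(start_url, fetch=fetch):
    """Steps 2 and 3: open each matching page and pull something from it."""
    results = {}
    for link in env_links(fetch(start_url), start_url):
        page = BeautifulSoup(fetch(link), "html.parser")
        # Placeholder for the "selected information": the page title.
        results[link] = page.title.get_text(strip=True) if page.title else None
    return results
```

Passing the download function in as a parameter is a design choice made for
this sketch only; it lets the logic be tried against canned HTML before
pointing it at a live site.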

I'm having a problem with step #1.  I can get all the URIs, but I can't seem
to get re.compile to work right.  If I could get it to give me only the URI,
without the tags or link description, that would be ideal.
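
A minimal sketch of filtering links with re.compile, assuming the bs4 package
(Beautiful Soup 4): a compiled pattern can be passed as the href argument of
find_all, and the bare URI is then read from each tag's href attribute. The
HTML snippet and its addresses are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page content, made up for illustration.
html = """
<a href="http://example.com/site1/env.html">Site 1 environment</a>
<a href="http://example.com/site1/status.html">Site 1 status</a>
<a href="/site2/env.html?unit=3">Site 2 environment</a>
<a name="top">Anchor without an href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# The dot is escaped so that it matches a literal "." only.
pattern = re.compile(r"env\.html")

# Keep only <a> tags whose href attribute matches the pattern.
matching = soup.find_all("a", href=pattern)

# tag["href"] is the attribute value alone: no tags, no link text.
uris = [tag["href"] for tag in matching]
print(uris)
```

The pattern is applied as a search within the attribute value, so it does not
have to match the whole URI.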

Thanks for your help.

Steve Lyskawa

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
