elca wrote:
> yes i want to extract this text 'CNN Shop' and linked page
> 'http://www.turnerstoreonline.com'.
Well then.
First, we'll get the page using urllib2:
doc=urllib2.urlopen("http://www.cnn.com")
Then we'll feed it into the HTML parser:
soup=BeautifulSoup(doc)
Next, we'll look at all the links in the page:
for a in soup.findAll("a"):
and when a link has the text 'CNN Shop', we have a hit,
and print the URL:
    if a.renderContents()=="CNN Shop":
        print a["href"]
The complete program is thus:
import urllib2
from BeautifulSoup import BeautifulSoup
doc=urllib2.urlopen("http://www.cnn.com")
soup=BeautifulSoup(doc)
for a in soup.findAll("a"):
    if a.renderContents()=="CNN Shop":
        print a["href"]
The example above can be condensed because BeautifulSoup's find function
can also search by text:
print soup.find("a",text="CNN Shop")
and since that's a navigable string, we can ascend to its parent and
display the href attribute:
print soup.find("a",text="CNN Shop").findParent()["href"]
So eventually the whole program could be collapsed into one line:
print BeautifulSoup(urllib2.urlopen("http://www.cnn.com")).find("a",text="CNN Shop").findParent()["href"]
...but I think this is very ugly!
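In case you are on Python 3, where urllib2 no longer exists, here is a
rough sketch of the same link search using only the standard library's
html.parser instead of BeautifulSoup. The names LinkFinder and find_links
and the sample markup are made up for illustration, and unlike the loop
above it strips surrounding whitespace before comparing the link text:

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect the href of every <a> whose text equals the wanted text."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.hits = []       # hrefs of the matching links
        self._in_a = False   # are we currently inside an <a> element?
        self._href = None    # href of that <a>, if it has one
        self._text = []      # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            # Assumption: whitespace around the link text is not significant.
            text = "".join(self._text).strip()
            if text == self.wanted and self._href is not None:
                self.hits.append(self._href)
            self._in_a = False


def find_links(html, wanted):
    """Return the hrefs of all links in html whose text is wanted."""
    finder = LinkFinder(wanted)
    finder.feed(html)
    finder.close()
    return finder.hits


# Made-up sample markup, standing in for the downloaded page.
sample = """
<ul>
  <li><a href="http://www.cnn.com/weather">Weather</a></li>
  <li><a href="http://www.turnerstoreonline.com">CNN Shop</a></li>
</ul>
"""
print(find_links(sample, "CNN Shop"))
```

For the live page you would pass in the downloaded text instead of the
sample, for instance what urllib.request.urlopen(...).read() returns after
decoding it to a string.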
> im very sorry my english.
Your English is quite understandable. The hard part is figuring out what
exactly you wanted to achieve ;-)
I have a question too. Why did you think JavaScript was necessary to
arrive at this result?
Greetings,