elca wrote:

> yes i want to extract this text 'CNN Shop' and linked page
> 'http://www.turnerstoreonline.com'.

Well then.
First, we'll get the page using urllib2:

    doc=urllib2.urlopen("http://www.cnn.com")

Then we'll feed it into the HTML parser:

    soup=BeautifulSoup(doc)

Next, we'll look at all the links in the page:

    for a in soup.findAll("a"):

and when a link has the text 'CNN Shop', we have a hit,
and print the URL:

        if a.renderContents()=="CNN Shop":
            print a["href"]


The complete program is thus:

import urllib2
from BeautifulSoup import BeautifulSoup

doc=urllib2.urlopen("http://www.cnn.com")
soup=BeautifulSoup(doc)
for a in soup.findAll("a"):
    if a.renderContents()=="CNN Shop":
        print a["href"]
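
If you're on Python 3, where urllib2 and the old BeautifulSoup 3 module aren't available, the same loop can be sketched with only the standard library. This is not the program above, just a rough equivalent under some assumptions: SAMPLE_HTML, LinkCollector and find_link are names I made up for illustration, and the page is a tiny inline stand-in rather than the real cnn.com, so it runs without a network connection.

```python
from html.parser import HTMLParser

# Made-up stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
<a href="http://www.cnn.com/weather">Weather</a>
<a href="http://www.turnerstoreonline.com">CNN Shop</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished (href, text) pairs
        self._inside = False  # True while we are between <a> and </a>
        self._href = None     # href of the <a> we are currently inside
        self._text = []       # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            # Unlike the exact comparison above, surrounding whitespace is stripped.
            self.links.append((self._href, "".join(self._text).strip()))
            self._inside = False


def find_link(html, wanted_text):
    """Return the href of the first link whose text equals wanted_text, or None."""
    collector = LinkCollector()
    collector.feed(html)
    for href, text in collector.links:
        if text == wanted_text:
            return href
    return None


print(find_link(SAMPLE_HTML, "CNN Shop"))
```

To run it against a live page instead, you would pass the decoded page source to find_link in place of SAMPLE_HTML.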


The example above can be condensed because BeautifulSoup's find function can also search by text:

    print soup.find("a",text="CNN Shop")

and since that's a navigable string, we can ascend to its parent and display the href attribute:

    print soup.find("a",text="CNN Shop").findParent()["href"]

So eventually the whole program could be collapsed into one line:

print BeautifulSoup(urllib2.urlopen("http://www.cnn.com")).find("a",text="CNN Shop").findParent()["href"]

...but I think this is very ugly!


> im very sorry my english.

Your English is quite understandable. The hard part is figuring out what exactly you wanted to achieve ;-)

I have a question too. Why did you think JavaScript was necessary to arrive at this result?

Greetings,
--
http://mail.python.org/mailman/listinfo/python-list
