Hello, thanks for your reply. Actually, the site I want to parse is in a different language; I only quoted a common English website to make the example easier to understand. :) By the way, is it possible to make PAMIE and BeautifulSoup work together? Thanks a lot.
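To show what I have in mind, here is a very rough sketch (untested, and just my assumption about how the two could fit together). I am not sure of PAMIE's exact method names, so this drives Internet Explorer through win32com directly, which as far as I know is the same InternetExplorer.Application COM object that PAMIE wraps, and then hands the rendered HTML to BeautifulSoup:

    import time
    import win32com.client
    from BeautifulSoup import BeautifulSoup

    # Drive Internet Explorer through COM; PAMIE wraps this same object.
    ie = win32com.client.Dispatch("InternetExplorer.Application")
    ie.Visible = 0
    ie.Navigate("http://www.cnn.com")

    # Wait until IE has finished loading (and running any JavaScript).
    while ie.Busy or ie.ReadyState != 4:
        time.sleep(0.5)

    # Hand the rendered HTML to BeautifulSoup and search it as before.
    soup = BeautifulSoup(ie.Document.documentElement.outerHTML)
    for a in soup.findAll("a"):
        if a.renderContents() == "CNN Shop":
            print a["href"]

    ie.Quit()

Would that be a sensible way to combine them, or does PAMIE give a cleaner way to get at the finished document?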
motoom wrote:
> elca wrote:
>> yes i want to extract this text 'CNN Shop' and linked page
>> 'http://www.turnerstoreonline.com'.
>
> Well then.
> First, we'll get the page using urllib2:
>
>     doc = urllib2.urlopen("http://www.cnn.com")
>
> Then we'll feed it into the HTML parser:
>
>     soup = BeautifulSoup(doc)
>
> Next, we'll look at all the links in the page:
>
>     for a in soup.findAll("a"):
>
> and when a link has the text 'CNN Shop', we have a hit,
> and print the URL:
>
>         if a.renderContents() == "CNN Shop":
>             print a["href"]
>
> The complete program is thus:
>
>     import urllib2
>     from BeautifulSoup import BeautifulSoup
>
>     doc = urllib2.urlopen("http://www.cnn.com")
>     soup = BeautifulSoup(doc)
>     for a in soup.findAll("a"):
>         if a.renderContents() == "CNN Shop":
>             print a["href"]
>
> The example above can be condensed, because BeautifulSoup's find function
> can also look for texts:
>
>     print soup.find("a", text="CNN Shop")
>
> and since that's a navigable string, we can ascend to its parent and
> display the href attribute:
>
>     print soup.find("a", text="CNN Shop").findParent()["href"]
>
> So eventually the whole program could be collapsed into one line:
>
>     print BeautifulSoup(urllib2.urlopen("http://www.cnn.com")).find("a", text="CNN Shop").findParent()["href"]
>
> ...but I think this is very ugly!
>
>> im very sorry my english.
>
> Your English is quite understandable. The hard part is figuring out what
> exactly you wanted to achieve ;-)
>
> I have a question too. Why did you think JavaScript was necessary to
> arrive at this result?
>
> Greetings,