On Wed, 29 Oct 2008 09:45:31 -0700 (PDT), luca72 <[EMAIL PROTECTED]> wrote:
> Hello
> I try to use beautifulsoup
> i have this:
> sito = urllib.urlopen('http://www.prova.com/')
> esamino = BeautifulSoup(sito)
> luca = esamino.findAll('tr', align='center')
>
> print luca[0]
>
[The following long string has been wrapped.]
>>><tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a></th><td width="10%">44.4MB</td>
> <td width="90%" align="left">
> <font color="orange"> Pc-prova.rar </font></td></tr>
>
> I need to get the following information:
> 1)Only|G|BoT|05
> 2)#1
> 3)44.4MB
> 4)Pc-prova.rar
> with: print luca[0].a.string i get #1
> with print luca[0].td.string i get 44.4MB
> can you explain me how to get the others two value

Like you, I struggle with BeautifulSoup, but perhaps this will help
while waiting for somebody smarter to join the thread:

>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup(
...     """<tr align="center"><th width="5%">"""
...     """<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a>"""
...     """</th><td width="10%">44.4MB</td><td width="90%" align="left">"""
...     """<font color="orange"> Pc-prova.rar </font></td></tr>""" )
>>> tr = soup.findAll( 'tr' )
>>> tr[0].findAll( text = True )
[u'#1', u'44.4MB', u' Pc-prova.rar ']
>>> c = tr[0].findChild( attrs={"onclick": True} )
>>> print c[ "onclick" ]
t('Only|G|BoT|05','#1');
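
The onclick value and the padded text still need a little post-processing to give the four items asked for. What follows is only a rough sketch using the standard library, on the assumption that the onclick string always has the shape t('...','...'); the helper name first_onclick_arg is made up for illustration, and the two input values are simply copied from the session above.

```python
import re

# Assumption: the onclick attribute always looks like t('<first>','<second>');
ONCLICK_ARG = re.compile(r"t\('([^']*)'")

def first_onclick_arg(onclick):
    """Return the first quoted argument of the t(...) call, or None if absent."""
    match = ONCLICK_ARG.search(onclick)
    return match.group(1) if match else None

# Values copied from the session above: c["onclick"] and findAll(text=True).
onclick = "t('Only|G|BoT|05','#1');"
texts = ['#1', '44.4MB', ' Pc-prova.rar ']

# Strip the surrounding whitespace that the <font> text carries.
fields = [first_onclick_arg(onclick)] + [t.strip() for t in texts]
print(fields)
```

If the real pages ever put a different function name or escaped quotes in the onclick attribute, the pattern would have to be adjusted.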