Re: Fetching data from a HTML file

Jon Clements Fri, 23 Mar 2012 22:18:23 -0700

On Friday, 23 March 2012 13:52:05 UTC, Sangeet  wrote:
> Hi,
> 
> I've got to fetch data from the snippet below and have been trying to match 
> the digits in it to specific groups. But I can't seem to 
> figure out how to go about stripping the tags! :(
> 
> <tr><td align="center"><b>Sum</b></td><td></td><td align='center' 
> class="green">245</td><td align='center' class="red">11</td><td 
> align='center'>0</td><td align='center' >256</td><td align='center' >1.496 
> [min]</td></tr>
> </table>
> 
> Actually, I'm working on Robot Framework, and haven't been able to figure out 
> how to read data from HTML tables. Reading from the source is the best (read: 
> rudimentary) way I could come up with. Any suggestions are welcome!
> 
> Thanks,
> Sangeet


I would personally use lxml - a quick example:

# -*- coding: utf-8 -*-
import lxml.html

text = """
<tr><td align="center"><b>Sum</b></td><td></td><td align='center' 
class="green">245</td><td align='center' class="red">11</td><td 
align='center'>0</td><td align='center' >256</td><td align='center' >1.496 
[min]</td></tr>
</table>
"""

table = lxml.html.fromstring(text)
for tr in table.xpath('//tr'):
    # Pair each cell's class attribute ('' if absent) with its text content.
    print([(el.get('class', ''), el.text_content())
           for el in tr.iterfind('td')])

[('', 'Sum'), ('', ''), ('green', '245'), ('red', '11'), ('', '0'), ('', 
'256'), ('', '1.496 [min]')]

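It does a reasonable job, but if it doesn't work quite right, then there's a 
.fromstring(parser=...) option for passing in a different parser. lxml also 
ships a BeautifulSoup-based parser (lxml.html.soupparser, formerly ElementSoup) 
with its own fromstring(), so you can try your luck from there.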
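
Once you have the (class, text) pairs from the loop above, matching the digits 
to groups is plain Python. A minimal sketch, assuming the pairs look like the 
output shown; the helper name numbers_from_cells and the idea of keying on the 
class attribute are my own illustration, not anything lxml provides:

```python
import re

# The (class, text) pairs printed by the lxml loop, copied in by hand.
cells = [('', 'Sum'), ('', ''), ('green', '245'), ('red', '11'),
         ('', '0'), ('', '256'), ('', '1.496 [min]')]

# Matches an integer or a simple decimal such as 1.496.
NUMBER = re.compile(r'\d+(?:\.\d+)?')


def numbers_from_cells(cells):
    """Return (class, value) for every cell that contains a number.

    Hypothetical helper: only the first number in each cell is used,
    and cells without any digits (labels, empty cells) are skipped.
    """
    values = []
    for css_class, text in cells:
        match = NUMBER.search(text)
        if match:
            values.append((css_class, float(match.group())))
    return values


values = numbers_from_cells(cells)

# Cells that carry a class can be looked up by name; the rest keep
# their position in the row.
by_class = dict((cls, val) for cls, val in values if cls)

print(values)
print(by_class)
```

Only the 'green' and 'red' cells end up in by_class; the unclassed columns 
have to be picked out of values by position.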

hth,

Jon.
-- 
http://mail.python.org/mailman/listinfo/python-list
