It's an extremely bad idea to use regex for HTML. As soon as you want to
change one tiny little thing, you have to write the regex all over again.
If it's a throwaway script, then go ahead.
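
For anything longer-lived, a real parser holds up better. Here is a rough
sketch using only the standard library's html.parser (Python 3). The class
name CellCollector, the helper td_cells, and the shortened string s are my
own stand-ins, not anything from the original post, and I am assuming the
numbers you want are always the 3rd, 5th and 7th <td> cells in the row:

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # text may arrive in several chunks


def td_cells(html):
    """Return the stripped text of each <td> cell found in html."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Shortened stand-in for the string in the question (assumption: the
# real row has the same cell order).
s = """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td>
<td>43.100 </td>
<td>7,450,447 </td>
</tr>"""

cells = td_cells(s)
wanted = [cells[i] for i in (2, 4, 6)]  # assumed positions of the wanted cells
print(wanted)  # ['43.150', '43.130', '43.100']
```

If the site reorders the columns, only the index tuple has to change, not
the parsing code.
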
2010/3/20 Luis M. González <luis...@gmail.com>

> On Mar 20, 12:04 am, Jimbo <nill...@yahoo.com> wrote:
> > Hello
> >
> > I am trying to grab some numbers from a string containing HTML text.
> > Can you suggest any good functions that I could use to do this? What
> > would be the easiest way to extract the following numbers from this
> > string...
> >
> > My String has this layout & I have commented what I want to grab:
> > [CODE] """</th>
> >     <td class="last">43.200 </td>
> >     <td class="change indicator" nowrap>0.040 </td>
> >     <td>43.150 </td> # I need to grab this number only
> >     <td>43.200 </td>
> >     <td>43.130 </td> # I need to grab this number only
> >     <td>43.290 </td>
> >     <td>43.100 </td> # I need to grab this number only
> >     <td>7,450,447 </td>
> >     <td class="middle"><a href="/asx/markets/optionPrices.do?by=underlyingCode&underlyingCode=BHP&expiryDate=&optionType=">Options</a></td>
> >     <td class="middle"><a href="/asx/markets/warrantPrices.do?by=underlyingAsxCode&underlyingCode=BHP">Warrants & Structured Products</a></td>
> >     <td class="middle"><a href="/asx/markets/cfdPrices.do?by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
> >     <td class="middle"><a href="http://hfgapps.hubb.com/asxtools/Charts.aspx?TimeFrame=D6&compare=comp_index&indicies=XJO&pma1=20&pma2=20&asxCode=BHP"><img src="/images/chart.gif" border="0" height="15" width="15"></a>
> >     </td>
> >     <td><a href="/research/announcements/status_notes.htm#XD">XD</a>
> >     </td>
> >     <td><a href="/asx/statistics/announcements.do?by=asxCode&asxCode=BHP&timeframe=D&period=W">Recent</a>
> >     </td>
> > </tr>"""[/CODE]
>
>
> You should use BeautifulSoup or perhaps regular expressions.
> Or if you are not very smart, like me, just try a brute force approach:
>
> >>> for i in s.split('>'):
> ...     for e in i.split():
> ...         if '.' in e and e[0].isdigit():
> ...             print(e)
> ...
> 43.200
> 0.040
> 43.150
> 43.200
> 43.130
> 43.290
> 43.100
> >>>
> --
> http://mail.python.org/mailman/listinfo/python-list
>