This is very useful. I'm building a new aggregator right now and this will
come in handy for scraping.
As I see it, this would be a sort of jQuery for HTML in
Python. :))))

On 24 May, 22:25, mdipierro <[email protected]> wrote:
> I liked your suggestion and I used it to make
> gluon.html.web2pyHTMLParser, take a look and let me know what you
> think.
>
> On May 23, 2:20 pm, RobertVa <[email protected]> wrote:
>
> > I did.
>
> > It has an xmlescape function, but the reverse function (unescape) is
> > not defined.
>
> > On 23 May, 20:59, Yarko Tymciurak <[email protected]> wrote:
>
> > > Have you looked at the XML() helper?
> > > http://www.web2py.com/book/default/section/5/2?search=XML
>
> > > On May 23, 1:41 pm, RobertVa <[email protected]> wrote:
>
> > > > Hi.
>
> > > > I found a function to unescape HTML data, which I believe would be
> > > > very useful to include in the framework itself.
>
> > > > import re
> > > > from htmlentitydefs import name2codepoint
>
> > > > def replace_entities(match):
> > > >     # Turn one matched entity into its character; leave unknown ones as-is.
> > > >     try:
> > > >         ent = match.group(1)
> > > >         if ent[0] == "#":
> > > >             # Numeric reference: hexadecimal (&#x..;) or decimal (&#..;)
> > > >             if ent[1] in ('x', 'X'):
> > > >                 return unichr(int(ent[2:], 16))
> > > >             else:
> > > >                 return unichr(int(ent[1:], 10))
> > > >         # Named entity such as &amp; or &eacute;
> > > >         return unichr(name2codepoint[ent])
> > > >     except (KeyError, ValueError, OverflowError):
> > > >         return match.group()
>
> > > > entity_re = re.compile(r'&(#?[A-Za-z0-9]+?);')
>
> > > > def html_unescape(data):
> > > >     return entity_re.sub(replace_entities, data)
>
> > > > Thanks to the author:
> > > > http://blog.client9.com/2008/10/html-unescape-in-python.html
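
The function quoted above targets Python 2 (htmlentitydefs, unichr). A rough
sketch of the same approach on Python 3 follows, assuming only the standard
renames (htmlentitydefs became html.entities, unichr became chr). It is a port
for illustration, not code from the thread or from web2py. The standard
library's html.unescape covers the same ground, so it is printed alongside for
comparison.

```python
import html
import re
from html.entities import name2codepoint

# Same pattern as in the thread: "&", optional "#", alphanumerics, then ";".
entity_re = re.compile(r'&(#?[A-Za-z0-9]+?);')


def replace_entities(match):
    """Return the character for one matched entity, or the text unchanged."""
    ent = match.group(1)
    try:
        if ent[0] == '#':
            # Numeric reference: hexadecimal (&#x..;) or decimal (&#..;)
            if ent[1] in ('x', 'X'):
                return chr(int(ent[2:], 16))
            return chr(int(ent[1:], 10))
        # Named entity such as &amp; or &eacute;
        return chr(name2codepoint[ent])
    except (KeyError, ValueError, OverflowError):
        # Unknown name or invalid code point: leave the entity as written.
        return match.group()


def html_unescape(data):
    return entity_re.sub(replace_entities, data)


if __name__ == "__main__":
    sample = "&lt;b&gt;caf&eacute; &#38; &#x263A;"
    print(html_unescape(sample))
    print(html.unescape(sample))
```

Unknown entities are passed through untouched rather than raising, which is
usually what you want when scraping pages you do not control.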
