[web2py] Re: html unescape - if anyone needs it

mdipierro Tue, 25 May 2010 11:23:57 -0700

Yes. If you just do str(TAG(text)), this will unescape the text as you
suggest (but to a utf8 str, not unicode).


On May 25, 12:58 pm, RobertVa <[email protected]> wrote:
> This is very useful. I'm just making a new aggregator and this will come
> in handy for scraping purposes.
> As I see it, this would be some sort of jQuery for HTML in
> Python. :))))
>
> On 24 May, 22:25, mdipierro <[email protected]> wrote:
>
> > I liked your suggestion and I used it to make
> > gluon.html.web2pyHTMLParser, take a look and let me know what you
> > think.
>
> > On May 23, 2:20 pm, RobertVa <[email protected]> wrote:
>
> > > I did.
>
> > > > It has an xmlescape function, but the reverse function (unescape) is
> > > > not defined.
>
> > > > On 23 May, 20:59, Yarko Tymciurak <[email protected]> wrote:
>
> > > > > Have you looked at the XML() helper?
> > > > http://www.web2py.com/book/default/section/5/2?search=XML
>
> > > > On May 23, 1:41 pm, RobertVa <[email protected]> wrote:
>
> > > > > Hi.
>
> > > > > > I found a function to unescape HTML data, which I believe would be
> > > > > > worth putting into the framework itself.
>
> > > > > > import re
> > > > > > from htmlentitydefs import name2codepoint
> > > > > def replace_entities(match):
> > > > >     try:
> > > > >         ent = match.group(1)
> > > > >         if ent[0] == "#":
> > > > >             if ent[1] == 'x' or ent[1] == 'X':
> > > > >                 return unichr(int(ent[2:], 16))
> > > > >             else:
> > > > >                 return unichr(int(ent[1:], 10))
> > > > >         return unichr(name2codepoint[ent])
> > > > >     except:
> > > > >         return match.group()
>
> > > > > entity_re = re.compile(r'&(#?[A-Za-z0-9]+?);')
>
> > > > > def html_unescape(data):
> > > > >     return entity_re.sub(replace_entities, data)
>
> > > > > > Thanks to the author:
> > > > > > http://blog.client9.com/2008/10/html-unescape-in-python.html
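
For anyone reading this on Python 3: the snippet quoted above is Python 2 only (htmlentitydefs and unichr are gone). Below is a rough port, assuming Python 3.4 or later; it is a sketch of the same regex approach, not what web2py itself does. The name replace_entity and the narrowed exception list are my own choices. Note that the standard library already ships html.unescape(), which is probably the better option in new code.

```python
import re
from html.entities import name2codepoint

# Same pattern as the original post: named (&amp;), decimal (&#233;)
# and hexadecimal (&#xE9;) entities.
ENTITY_RE = re.compile(r'&(#?[A-Za-z0-9]+?);')


def replace_entity(match):
    """Return the character for one matched entity, or the entity unchanged."""
    ent = match.group(1)
    try:
        if ent.startswith('#'):
            if ent[1:2] in ('x', 'X'):
                return chr(int(ent[2:], 16))   # hexadecimal reference
            return chr(int(ent[1:], 10))       # decimal reference
        return chr(name2codepoint[ent])        # named entity
    except (KeyError, ValueError, OverflowError):
        # Unknown name, malformed number, or code point out of range:
        # leave the text as it was.
        return match.group(0)


def html_unescape(data):
    """Replace HTML entities in data with the characters they stand for."""
    return ENTITY_RE.sub(replace_entity, data)
```

Calling html_unescape('&lt;b&gt; &amp; &#233;') should give back the tag, the ampersand and the accented e as plain characters, while anything it does not recognise (for example '&bogus;') is passed through untouched.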
