Yes. If you just do str(TAG(text)), this will unescape the text as you suggest (but to a UTF-8 encoded string, not unicode).
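
For anyone who wants to try the unescape idea outside the framework, here is a rough Python 3 sketch of the function quoted further down, using only the standard library. It is not what web2py does internally. It assumes Python 3, where htmlentitydefs is html.entities and unichr is chr, and the sample string at the end is made up for illustration. It also shows the unicode vs. UTF-8 difference: the function returns text, and you encode it yourself if you want bytes.

```python
import re
from html.entities import name2codepoint

# Matches named (&amp;), decimal (&#233;) and hex (&#xE9;) entities.
entity_re = re.compile(r'&(#?[A-Za-z0-9]+?);')

def replace_entities(match):
    """Return the character for one matched entity, or the entity unchanged."""
    ent = match.group(1)
    try:
        if ent[0] == '#':
            if ent[1] in 'xX':
                return chr(int(ent[2:], 16))   # hexadecimal reference
            return chr(int(ent[1:], 10))       # decimal reference
        return chr(name2codepoint[ent])        # named entity
    except (KeyError, ValueError, IndexError, OverflowError):
        # Unknown name or malformed number: leave the text as it was.
        return match.group()

def html_unescape(data):
    """Replace HTML entities in data with the characters they stand for."""
    return entity_re.sub(replace_entities, data)

# Hypothetical sample input, just for illustration.
text = html_unescape('caf&eacute; &amp; &#x263A; &bogus;')  # unicode text
utf8_bytes = text.encode('utf-8')                           # UTF-8 bytes
```

Unknown entities such as &bogus; are left alone rather than raising. On Python 3.4 and later the standard library also has html.unescape(), which is probably the better choice if you do not need custom handling.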
On May 25, 12:58 pm, RobertVa <[email protected]> wrote:
> This is very useful. I'm just making a new aggregator and this will come
> in handy. For scraping purposes.
> As I see it, this would be some sort of jQuery for HTML in
> python. :))))
>
> On 24 May, 22:25, mdipierro <[email protected]> wrote:
>
> > I liked your suggestion and I used it to make
> > gluon.html.web2pyHTMLParser, take a look and let me know what you
> > think.
>
> > On May 23, 2:20 pm, RobertVa <[email protected]> wrote:
>
> > > I did.
>
> > > It has an xmlescape function, but the reverse function (unescape) is not
> > > defined.
>
> > > On 23 May, 20:59, Yarko Tymciurak <[email protected]> wrote:
>
> > > > Have you looked at the XML() helper?
>
> > > > http://www.web2py.com/book/default/section/5/2?search=XML
>
> > > > On May 23, 1:41 pm, RobertVa <[email protected]> wrote:
>
> > > > > Hi.
>
> > > > > I found a function to unescape html data, which I believe would be very
> > > > > prudent to put into the framework itself.
>
> > > > > import re
> > > > > from htmlentitydefs import name2codepoint
> > > > > def replace_entities(match):
> > > > >     try:
> > > > >         ent = match.group(1)
> > > > >         if ent[0] == "#":
> > > > >             if ent[1] == 'x' or ent[1] == 'X':
> > > > >                 return unichr(int(ent[2:], 16))
> > > > >             else:
> > > > >                 return unichr(int(ent[1:], 10))
> > > > >         return unichr(name2codepoint[ent])
> > > > >     except:
> > > > >         return match.group()
>
> > > > > entity_re = re.compile(r'&(#?[A-Za-z0-9]+?);')
>
> > > > > def html_unescape(data):
> > > > >     return entity_re.sub(replace_entities, data)
>
> > > > > Tnx to the author.
> > > > > http://blog.client9.com/2008/10/html-unescape-in-python.html

