Hi.

I found a function to unescape HTML entities, which I believe would be
worth including in the framework itself.

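import re
from htmlentitydefs import name2codepoint

def replace_entities(match):
    # Return the character for one matched entity, or the original
    # text if the entity is unknown or invalid.
    try:
        ent = match.group(1)
        if ent[0] == "#":
            if ent[1] in 'xX':
                # hexadecimal numeric reference, e.g. &#x41;
                return unichr(int(ent[2:], 16))
            # decimal numeric reference, e.g. &#65;
            return unichr(int(ent[1:], 10))
        # named entity, e.g. &amp;
        return unichr(name2codepoint[ent])
    except (KeyError, ValueError, OverflowError):
        return match.group()

entity_re = re.compile(r'&(#?[A-Za-z0-9]+?);')

def html_unescape(data):
    return entity_re.sub(replace_entities, data)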
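
For anyone on Python 3, where htmlentitydefs and unichr no longer exist, here is a rough port of the same function. It is only a sketch: it assumes html.entities.name2codepoint and chr as drop-in replacements, and the sample string is made up for illustration. The standard library's html.unescape (Python 3.4+) covers the same ground and is probably the simpler choice, so it is printed alongside for comparison.

```python
import html
import re
from html.entities import name2codepoint

entity_re = re.compile(r'&(#?[A-Za-z0-9]+?);')


def replace_entities(match):
    """Return the character for one matched entity, or the text unchanged."""
    ent = match.group(1)
    try:
        if ent[0] == "#":
            if ent[1] in "xX":
                # hexadecimal numeric reference, e.g. &#x41;
                return chr(int(ent[2:], 16))
            # decimal numeric reference, e.g. &#65;
            return chr(int(ent[1:], 10))
        # named entity, e.g. &amp;
        return chr(name2codepoint[ent])
    except (KeyError, ValueError, OverflowError):
        # unknown name or invalid code point: leave the text as it was
        return match.group()


def html_unescape(data):
    return entity_re.sub(replace_entities, data)


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the original post.
    sample = "&lt;b&gt;caf&eacute; &#65;&#x42; &bogus;&lt;/b&gt;"
    print(html_unescape(sample))   # the ported function
    print(html.unescape(sample))   # the standard library equivalent
```

The port leaves unrecognised entities such as &bogus; untouched instead of raising, which matches the behaviour of the original function.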


Thanks to the author:
http://blog.client9.com/2008/10/html-unescape-in-python.html
