Senthil,

That worked like a charm. Thank you for the help! Now my Snipts are actually legible :)

On Wed, Mar 4, 2009 at 12:01 AM, Senthil Kumaran <orsent...@gmail.com> wrote:

> On Wed, Mar 4, 2009 at 11:13 AM, Eric Dorsey <dors...@gmail.com> wrote:
> > I know, for example, that the &gt; code means >, but what I don't know is
> > how to convert it in all my data to show properly? I
>
> Feedparser returns its output as HTML only, so expect HTML tags and
> entities in the output.
> What you want is to unescape HTML entities
> ( http://effbot.org/zone/re-sub.htm#unescape-html )
>
> import feedparser
> import re, htmlentitydefs
>
> def unescape(text):
>     def fixup(m):
>         text = m.group(0)
>         if text[:2] == "&#":
>             # character reference
>             try:
>                 if text[:3] == "&#x":
>                     return unichr(int(text[3:-1], 16))
>                 else:
>                     return unichr(int(text[2:-1]))
>             except ValueError:
>                 pass
>         else:
>             # named entity
>             try:
>                 text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
>             except KeyError:
>                 pass
>         return text # leave as is
>     return re.sub("&#?\w+;", fixup, text)
>
>
> d = feedparser.parse('http://snipt.net/dorseye/feed')
>
> x=0
> for i in d['entries']:
>     print unescape(d['entries'][x].title)
>     print unescape(d['entries'][x].summary)
>     print
>     x+=1
>
>
> HTH,
> Senthil
>
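
For anyone reading this thread on Python 3: the quoted helper relies on `htmlentitydefs` and `unichr`, which are Python 2 names. Below is a minimal sketch of a Python 3 port, assuming the same regex-based approach; it is not taken from the original message. The `sample` string is a made-up stand-in for a feed entry's summary, so the example runs without feedparser or network access. The standard library's `html.unescape` is shown alongside as a likely simpler alternative.

```python
import html
import re
from html.entities import name2codepoint


def unescape(text):
    """Python 3 port of the quoted helper: replace HTML entities with characters."""
    def fixup(m):
        ref = m.group(0)
        if ref[:2] == "&#":
            # numeric character reference, decimal or hexadecimal
            try:
                if ref[:3] == "&#x":
                    return chr(int(ref[3:-1], 16))
                return chr(int(ref[2:-1]))
            except ValueError:
                pass
        else:
            # named entity such as &gt; or &amp;
            try:
                return chr(name2codepoint[ref[1:-1]])
            except KeyError:
                pass
        return ref  # unknown or malformed reference: leave as is
    return re.sub(r"&#?\w+;", fixup, text)


# Hypothetical sample text standing in for d['entries'][n].summary.
sample = "if a &gt; b &amp;&amp; c &lt; d: print(&quot;ok&quot;) &#169; &#x41;"

print(unescape(sample))       # regex-based port of the quoted function
print(html.unescape(sample))  # standard-library equivalent
```

In a real script the same call would wrap each entry's `title` and `summary`, as in the loop quoted above. Unless unknown entities need special handling, `html.unescape` alone is probably enough.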
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor