Re: decode Numeric Character References to unicode

Duncan Booth Mon, 18 Feb 2008 04:11:28 -0800

7stud <[EMAIL PROTECTED]> wrote:

> On Feb 18, 4:53 am, 7stud <[EMAIL PROTECTED]> wrote:
>> On Feb 18, 3:20 am, William Heymann <[EMAIL PROTECTED]> wrote:
>>
>> > How do I decode a string back to useful unicode that has xml
>> > numeric character references in it?
>>
>> > Things like &#21344;  #which is: &_#21344_; (without the
>> > underscores) 
>>
>> BeautifulSoup can handle two of the three formats for html entities.
>> For instance, an 'o' with umlaut can be represented in three
>> different ways:
>>
>> &_ouml_;
>> ö
>> ö
>>
> 
> lol.  It's hard to even make posts about this stuff because html
> entities get converted by the forum software. Here are the three
> different formats for an 'o with umlaut' with some underscores added
> to keep the forum software from rendering the characters:
> 
> &_ouml_;
> &_#246_;
> &_#xf6_;


FWIW, your original post was fine; it was only the quoted text in your
followup that was wrong.

I guess that is yet another reason to use a real newsreader or the mailing 
list rather than Google Groups.
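
As for the original question, here is a minimal sketch. It assumes a
Python 3 interpreter whose standard library provides html.unescape, which
is newer than the interpreters in common use when this thread was written.
The helper name decode_references is only for illustration.

```python
import html


def decode_references(text):
    """Return text with named, decimal and hex references decoded.

    Assumes Python 3.4 or later, where html.unescape is available.
    """
    return html.unescape(text)


# The three spellings of 'o with umlaut' discussed above.
samples = ["&ouml;", "&#246;", "&#xf6;"]
decoded = [decode_references(s) for s in samples]

# The decimal reference from the original question.
cjk = decode_references("&#21344;")
```

All three entries in decoded should be the same single character, U+00F6,
and cjk should be the single character U+5360.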
-- 
http://mail.python.org/mailman/listinfo/python-list
