Michael Butscher wrote:
> Is this a bug or is SGMLParser not meant to be used for unicode strings 
> (it should be documented then)?

In a sense, SGML itself is not meant to be used with Unicode. In SGML,
the document character set is defined by the SGML application, so which
specific character a numeric character reference denotes is also up to
the SGML application.
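
To illustrate the point, here is a minimal sketch, assuming Python 3.
The helper resolve_charref is a hypothetical name of mine and is not
part of sgmllib; it only shows that the same reference number can
denote different characters depending on the document character set
the application has chosen.

```python
def resolve_charref(number, document_charset):
    """Map a numeric character reference to a character.

    Hypothetical helper for illustration only; not part of sgmllib.
    """
    if document_charset == "unicode":
        # The number is taken as a Unicode code point.
        return chr(number)
    # Otherwise treat the number as a byte value in a single-byte
    # character set (raises ValueError for numbers above 255).
    return bytes([number]).decode(document_charset)


# The same reference, &#128;, under two document character sets.
print(repr(resolve_charref(128, "unicode")))  # a C1 control character
print(repr(resolve_charref(128, "cp1252")))   # the euro sign
```

A parser cannot know which of these results is wanted unless the
application tells it, which is why the conversion is left to hooks the
application can override.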

This entire issue is already documented; see the discussion of
convert_charref and convert_codepoint in

http://docs.python.org/lib/module-sgmllib.html

Regards,
Martin
