Michael Butscher wrote:
> Is this a bug or is SGMLParser not meant to be used for unicode strings
> (it should be documented then)?

In a sense, SGML itself is not meant to be used for Unicode. In SGML, the document character set is defined by the SGML application, so which specific character a character reference refers to also depends on the SGML application.

This issue is already documented; see the discussion of convert_charref and convert_codepoint in
http://docs.python.org/lib/module-sgmllib.html

Regards,
Martin
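
To illustrate the point about the document character set, here is a minimal sketch in Python 3. It does not use sgmllib (which is not available in Python 3), and resolve_charref is a hypothetical helper invented for this example, not a function of any library. It assumes an application that either treats the reference number as a Unicode code point or as a byte value in a single-byte encoding.

```python
# Hypothetical helper for illustration only; not part of sgmllib.
# It shows why the meaning of a numeric character reference such as
# "&#233;" depends on the document character set chosen by the application.

def resolve_charref(number, document_charset="latin-1"):
    """Return the character that &#number; names under document_charset."""
    if document_charset.lower() == "unicode":
        # Assumed convention: the number is a Unicode code point.
        return chr(number)
    if not 0 <= number <= 255:
        raise ValueError("reference %d does not fit a single-byte charset" % number)
    # Single-byte charset: the number is a byte value in that encoding.
    return bytes([number]).decode(document_charset)


if __name__ == "__main__":
    # The same reference number names different characters
    # under different document character sets.
    for charset in ("latin-1", "koi8-r", "unicode"):
        print(charset, repr(resolve_charref(233, charset)))
```

In sgmllib itself, the analogous decision is made by overriding convert_charref or convert_codepoint in an SGMLParser subclass, as described in the documentation linked above.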