Ezio Melotti <ezio.melo...@gmail.com> added the comment:

The problem is that the standard allows some character references to end without a ';', but not all of them.
So both "&Eacute;ric" and "&Eacuteric" will be parsed as "Éric", but only "&alpha;centauri" will result in "αcentauri"; "&alphacentauri" will be returned unchanged.

I'm now working on #15156 to use this dict in HTMLParser, and detecting the ';'-less entities is not easy. A possible solution is to keep the names that are accepted without ';' in a separate (private) dict and expose a function like HTMLParser.unescape that implements all the necessary logic.

Regarding ChainMap, the html5 dict should be a superset of the html4 one.

Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11113>
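The following is a minimal sketch of the separate-dict approach described above. It is written in plain Python and is not a patch to HTMLParser. The tables `_ENTITIES` and `_ENTITIES_NO_SEMICOLON` and the `unescape` function are hypothetical, and the tables hold only a few entries (the real HTML5 table has more than 2000 names). The sketch ignores numeric references and the special rules for attribute values. For input without a ';', it assumes a longest-prefix match against the names that are allowed without ';'.

```python
import re

# Hypothetical, tiny excerpt of the HTML5 named character reference table.
# The real table (html.entities.html5 in Python 3.3+) is much larger.
_ENTITIES = {
    "Eacute": "\u00c9",  # É
    "alpha": "\u03b1",   # α
    "amp": "&",
    "lt": "<",
}

# Hypothetical private subset: names also accepted without a trailing ';'.
# "alpha" is deliberately absent, so "&alpha" alone is not a reference.
_ENTITIES_NO_SEMICOLON = {
    name: _ENTITIES[name] for name in ("Eacute", "amp", "lt")
}

# '&', then an alphanumeric name, then an optional ';'.
_CHARREF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def _replace(match):
    name, semicolon = match.group(1), match.group(2)
    # Case 1: a complete, ';'-terminated reference such as "&alpha;".
    if semicolon and name in _ENTITIES:
        return _ENTITIES[name]
    # Case 2: find the longest prefix of the name that may appear
    # without ';', e.g. "Eacute" inside "&Eacuteric".
    for end in range(len(name), 0, -1):
        prefix = name[:end]
        if prefix in _ENTITIES_NO_SEMICOLON:
            return _ENTITIES_NO_SEMICOLON[prefix] + name[end:] + semicolon
    # Case 3: no valid reference, so leave the text unchanged.
    return match.group(0)


def unescape(text):
    """Replace named character references in text content (sketch only)."""
    return _CHARREF.sub(_replace, text)
```

With these tables, `unescape("&Eacuteric")` gives "Éric", while `unescape("&alphacentauri")` is returned unchanged. This matches the behaviour described in the comment. Keeping the ';'-less names in their own dict limits the prefix search to the small set of legacy names.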