[issue11113] html.entities mapping dicts need updating?

Ezio Melotti Sat, 23 Jun 2012 20:11:42 -0700

Ezio Melotti <ezio.melo...@gmail.com> added the comment:

The problem is that the standard allows some character references to end 
without a ';', but not all of them.


So both "&Eacuteric" and "&Eacute;ric" will be parsed as "Éric", but only 
"&alpha;centauri" will result in "αcentauri" -- "&alphacentauri" will be 
returned unchanged.
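
For illustration, the four cases above can be checked with html.unescape. This assumes Python 3.4 or later, where that function is available in the standard library; it is only a quick demonstration of the behaviour described here, not part of any patch on this issue.

```python
import html

# The four inputs discussed above: with and without the trailing ';'.
samples = ["&Eacuteric", "&Eacute;ric", "&alpha;centauri", "&alphacentauri"]

for text in samples:
    # "Eacute" is accepted without ';', while "alpha" is only accepted with it.
    print(repr(text), "->", repr(html.unescape(text)))
```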

I'm now working on #15156 to use this dict in HTMLParser, and detecting the 
';'-less entities is not easy.  A possible solution is to keep the names that 
are accepted without ';' in a separate (private) dict and expose a function 
like HTMLParser.unescape that implements all the necessary logic.

Regarding ChainMap, the html5 dict should be a superset of the html4 one.
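
One way to check the superset claim, assuming the html4 dict is html.entities.name2codepoint (names without ';') and the html5 dict stores the same names with a trailing ';'. The variables missing and differing are names made up for this sketch.

```python
from html.entities import html5, name2codepoint

# HTML 4 names with no ';'-terminated counterpart in the html5 dict.
missing = sorted(name for name in name2codepoint
                 if name + ';' not in html5)

# Names present in both dicts but mapped to a different character.
differing = sorted(name for name in name2codepoint
                   if name + ';' in html5
                   and html5[name + ';'] != chr(name2codepoint[name]))

print("missing from html5:", missing)
print("different value in html5:", differing)
```

If the first list is empty, a ChainMap over the two dicts would add no names that html5 does not already have.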

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11113>
_______________________________________