On Wed, Sep 26, 2012 at 6:02 PM, Walter Dörwald <wal...@livinglogic.de> wrote:
> On 26.09.12 16:43, ezio.melotti wrote:
>
>> http://hg.python.org/cpython/rev/36f61661f71e
>> changeset:   79194:36f61661f71e
>> user:        Ezio Melotti <ezio.melo...@gmail.com>
>> date:        Wed Sep 26 17:43:23 2012 +0300
>> summary:
>>   Add a few entries to whatsnew/3.3.rst.
>> [...]
>>
>> +
>> +A new :data:`~html.entities.html5` dictionary that maps HTML5 named character
>> +references to the equivalent Unicode character(s) (e.g. ``html5['gt;'] == '>'``)
>> +has been added to the :mod:`html.entities` module.  The dictionary is now also
>> +used by :class:`~html.parser.HTMLParser`.
>
>
> Is there a reason why the trailing ';' is included in the entity names?
>
Yes, to quote <http://bugs.python.org/issue11113#msg163706>:
"""
The problem is that the standard allows some charref to end without a
';', but not all of them.  So both "&Eacute;ric" and "&Eacuteric" will be
parsed as "Éric", but only "&alpha;centauri" will result in "αcentauri"
-- "&alphacentauri" will be returned unchanged.
"""

To preserve this distinction, I included every name with the trailing ';'
and, where the standard also allows it, without the ';', in the same way
they are listed at
<http://www.w3.org/TR/html5/named-character-references.html>.
This is also explained at
<http://docs.python.org/dev/library/html.entities.html#html.entities.html5>.

Best Regards,
Ezio Melotti

> Servus,
> Walter

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
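A quick way to see the distinction from an interpreter (a sketch, assuming a later Python 3: `html.unescape()` only appeared in 3.4, so it is not part of the 3.3 change discussed here, but it relies on the same `html5` dictionary):

```python
import html
from html.entities import html5

# Legacy entities such as "Eacute" are listed both with and without ';'.
print('Eacute;' in html5, 'Eacute' in html5)   # True True
# Newer entities such as "alpha" are listed only with the trailing ';'.
print('alpha;' in html5, 'alpha' in html5)     # True False
print(html5['gt;'])                            # >

# html.unescape() (Python 3.4+) accepts "&Eacute" without ';' because
# "Eacute" is a key, but leaves "&alphacentauri" alone because "alpha" is not.
for text in ['&Eacute;ric', '&Eacuteric', '&alpha;centauri', '&alphacentauri']:
    print(text, '->', html.unescape(text))
```

Keeping both spellings as separate keys lets a parser decide per entity whether a missing ';' is acceptable, instead of hard-coding that list elsewhere.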