Re: [Python-Dev] [Python-checkins] cpython: Add a few entries to whatsnew/3.3.rst.
Ezio Melotti, 26.09.2012 18:30:
> The problem is that the standard allows some charref to end without a
> ';', but not all of them. So both "&Eacuteric" and "&Eacute;ric" will be
> parsed as "Éric", but only "&alpha;centauri" will result in
> "αcentauri" -- "&alphacentauri" will be returned unchanged.
> To preserve this I included them both, in the same way they are listed at
> http://www.w3.org/TR/html5/named-character-references.html.

Interesting. Seems to be missing on dailywtf, though. Maybe just an
oversight.

Stefan
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Python-checkins] cpython: Add a few entries to whatsnew/3.3.rst.
On 26.09.12 16:43, ezio.melotti wrote:
> http://hg.python.org/cpython/rev/36f61661f71e
> changeset: 79194:36f61661f71e
> user: Ezio Melotti ezio.melo...@gmail.com
> date: Wed Sep 26 17:43:23 2012 +0300
> summary:
>   Add a few entries to whatsnew/3.3.rst.
> [...]
> +
> +A new :data:`~html.entities.html5` dictionary that maps HTML5 named character
> +references to the equivalent Unicode character(s) (e.g. ``html5['gt;'] == '>'``)
> +has been added to the :mod:`html.entities` module. The dictionary is now also
> +used by :class:`~html.parser.HTMLParser`.

Is there a reason why the trailing ';' is included in the entity names?

Servus,
Walter
Re: [Python-Dev] [Python-checkins] cpython: Add a few entries to whatsnew/3.3.rst.
On Wed, Sep 26, 2012 at 6:02 PM, Walter Dörwald wal...@livinglogic.de wrote:
> On 26.09.12 16:43, ezio.melotti wrote:
>> http://hg.python.org/cpython/rev/36f61661f71e
>> changeset: 79194:36f61661f71e
>> user: Ezio Melotti ezio.melo...@gmail.com
>> date: Wed Sep 26 17:43:23 2012 +0300
>> summary:
>>   Add a few entries to whatsnew/3.3.rst.
>> [...]
>> +
>> +A new :data:`~html.entities.html5` dictionary that maps HTML5 named character
>> +references to the equivalent Unicode character(s) (e.g. ``html5['gt;'] == '>'``)
>> +has been added to the :mod:`html.entities` module. The dictionary is now also
>> +used by :class:`~html.parser.HTMLParser`.
>
> Is there a reason why the trailing ';' is included in the entity names?

Yes, to quote http://bugs.python.org/issue3#msg163706:

  The problem is that the standard allows some charref to end without a
  ';', but not all of them. So both "&Eacuteric" and "&Eacute;ric" will be
  parsed as "Éric", but only "&alpha;centauri" will result in
  "αcentauri" -- "&alphacentauri" will be returned unchanged.
  To preserve this I included them both, in the same way they are listed at
  http://www.w3.org/TR/html5/named-character-references.html.

This is also explained at
http://docs.python.org/dev/library/html.entities.html#html.entities.html5.

Best Regards,
Ezio Melotti

> Servus,
> Walter
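
For readers who want to check the behaviour described in this thread, here is a minimal sketch. It assumes Python 3.4 or later, because it uses html.unescape(), which was added after this discussion and is not mentioned in the thread itself; the sample strings are the ones from the quoted tracker message. It only illustrates how the keys with and without the trailing ';' differ, and is not meant as a description of how HTMLParser is implemented.

```python
from html import unescape          # assumption: Python 3.4+, postdates this thread
from html.entities import html5

# Some names are listed both with and without the trailing ';' ...
print('Eacute' in html5, 'Eacute;' in html5)   # True True
# ... while others are only listed with it.
print('alpha' in html5, 'alpha;' in html5)     # False True

# The example from the whatsnew entry.
print(html5['gt;'] == '>')                     # True

# Decoding the four sample strings from the quoted message.
samples = ['&Eacuteric', '&Eacute;ric', '&alpha;centauri', '&alphacentauri']
decoded = {s: unescape(s) for s in samples}
for source, result in decoded.items():
    print(repr(source), '->', repr(result))
```

The first three strings decode to 'Éric', 'Éric' and 'αcentauri'; the last one comes back unchanged because 'alpha' without the ';' is not a key in the dictionary.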