On Wed, Sep 26, 2012 at 6:02 PM, Walter Dörwald <wal...@livinglogic.de> wrote:
> On 26.09.12 16:43, ezio.melotti wrote:
>
>> http://hg.python.org/cpython/rev/36f61661f71e
>> changeset:   79194:36f61661f71e
>> user:        Ezio Melotti <ezio.melo...@gmail.com>
>> date:        Wed Sep 26 17:43:23 2012 +0300
>> summary:
>>    Add a few entries to whatsnew/3.3.rst.
>> [...]
>>
>> +
>> +A new :data:`~html.entities.html5` dictionary that maps HTML5 named character
>> +references to the equivalent Unicode character(s) (e.g. ``html5['gt;'] == '>'``)
>> +has been added to the :mod:`html.entities` module.  The dictionary is now also
>> +used by :class:`~html.parser.HTMLParser`.
>
>
> Is there a reason why the trailing ';' is included in the entity names?
>

Yes, to quote <http://bugs.python.org/issue11113#msg163706>:

"""
The problem is that the standard allows some charref to end without a
';', but not all of them.

So both "&Eacuteric" and "&Eacute;ric" will be parsed as "Éric", but
only "&alpha;centauri" will result in "αcentauri" -- "&alphacentauri"
will be returned unchanged.
"""

To preserve this I included them both, in the same way they are listed
at <http://www.w3.org/TR/html5/named-character-references.html>.
This is also explained at
<http://docs.python.org/dev/library/html.entities.html#html.entities.html5>.
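
A minimal sketch of what this means in practice. It assumes Python 3.4 or
later, because it uses html.unescape(), which is not part of this changeset
and was added after 3.3; the variable names are only illustrative. The
dictionary lookups themselves should work on 3.3 as well.

```python
from html import unescape            # assumption: Python 3.4+
from html.entities import html5      # available since Python 3.3

# "Eacute" is one of the references allowed without a trailing ';',
# so it is listed under both spellings.
eacute_keys = [key for key in ('Eacute', 'Eacute;') if key in html5]

# "alpha" must always end with ';', so only that spelling is listed.
alpha_keys = [key for key in ('alpha', 'alpha;') if key in html5]

# The same distinction shows up when decoding text.
samples = ('&Eacuteric', '&Eacute;ric', '&alpha;centauri', '&alphacentauri')
parsed = [unescape(sample) for sample in samples]

print(eacute_keys)
print(alpha_keys)
print(parsed)
```

Keeping the ';' in the keys lets a single membership test answer both
"is this a known reference?" and "may it appear without the semicolon?".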

Best Regards,
Ezio Melotti

> Servus,
>    Walter
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev