Re: [Python-Dev] [Python-checkins] cpython: Add a few entries to whatsnew/3.3.rst.

2012-09-27 Thread Stefan Behnel
Ezio Melotti, 26.09.2012 18:30:
 
> The problem is that the standard allows some charref to end without a
> ';', but not all of them.
>
> So both "&Eacuteric" and "&Eacute;ric" will be parsed as "Éric", but
> only "&alpha;centauri" will result in "αcentauri" -- "&alphacentauri"
> will be returned unchanged.
>
> To preserve this I included them both, in the same way they are listed
> at http://www.w3.org/TR/html5/named-character-references.html.

Interesting. Seems to be missing on dailywtf, though. Maybe just an oversight.

Stefan


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython: Add a few entries to whatsnew/3.3.rst.

2012-09-26 Thread Walter Dörwald

On 26.09.12 16:43, ezio.melotti wrote:


http://hg.python.org/cpython/rev/36f61661f71e
changeset:   79194:36f61661f71e
user:        Ezio Melotti ezio.melo...@gmail.com
date:        Wed Sep 26 17:43:23 2012 +0300
summary:
   Add a few entries to whatsnew/3.3.rst.
[...]
+
+A new :data:`~html.entities.html5` dictionary that maps HTML5 named character
+references to the equivalent Unicode character(s) (e.g. ``html5['gt;'] == '>'``)
+has been added to the :mod:`html.entities` module.  The dictionary is now also
+used by :class:`~html.parser.HTMLParser`.


Is there a reason why the trailing ';' is included in the entity names?

Servus,
   Walter




Re: [Python-Dev] [Python-checkins] cpython: Add a few entries to whatsnew/3.3.rst.

2012-09-26 Thread Ezio Melotti
On Wed, Sep 26, 2012 at 6:02 PM, Walter Dörwald wal...@livinglogic.de wrote:
> On 26.09.12 16:43, ezio.melotti wrote:
>
> http://hg.python.org/cpython/rev/36f61661f71e
> changeset:   79194:36f61661f71e
> user:        Ezio Melotti ezio.melo...@gmail.com
> date:        Wed Sep 26 17:43:23 2012 +0300
> summary:
>   Add a few entries to whatsnew/3.3.rst.
> [...]
>
> +
> +A new :data:`~html.entities.html5` dictionary that maps HTML5 named character
> +references to the equivalent Unicode character(s) (e.g. ``html5['gt;'] == '>'``)
> +has been added to the :mod:`html.entities` module.  The dictionary is now also
> +used by :class:`~html.parser.HTMLParser`.
>
> Is there a reason why the trailing ';' is included in the entity names?


Yes, to quote http://bugs.python.org/issue3#msg163706:


The problem is that the standard allows some charref to end without a
';', but not all of them.

So both "&Eacuteric" and "&Eacute;ric" will be parsed as "Éric", but
only "&alpha;centauri" will result in "αcentauri" -- "&alphacentauri"
will be returned unchanged.


To preserve this I included them both, in the same way they are listed
at http://www.w3.org/TR/html5/named-character-references.html.
This is also explained at
http://docs.python.org/dev/library/html.entities.html#html.entities.html5.
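
For anyone who wants to check the behaviour described above, here is a
minimal sketch. It assumes a Python new enough to have html.unescape()
(added in 3.4, so later than the 3.3 release this thread is about); the
names samples, decoded, has_eacute_both and has_alpha_bare are made up
for illustration only.

```python
from html import unescape            # assumption: Python 3.4 or later
from html.entities import html5

# Legacy names are listed twice in the table: with and without the ';'.
has_eacute_both = "Eacute" in html5 and "Eacute;" in html5

# Newer names such as alpha are listed only with the trailing ';'.
has_alpha_bare = "alpha" in html5

# The four strings from the explanation above.
samples = ["&Eacuteric", "&Eacute;ric", "&alpha;centauri", "&alphacentauri"]
decoded = {s: unescape(s) for s in samples}

print("Eacute listed both ways:", has_eacute_both)
print("alpha listed without ';':", has_alpha_bare)
for source, result in decoded.items():
    print(f"{source!r} -> {result!r}")
```

Keeping the ';' as part of the key lets a plain dictionary lookup tell
the two kinds of names apart, without a separate list of the legacy
names that may omit it.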

Best Regards,
Ezio Melotti

> Servus,
>    Walter