Re: [Python-Dev] Python and the Unicode Character Database

Terry Reedy Tue, 30 Nov 2010 15:21:42 -0800

On 11/30/2010 10:05 AM, Alexander Belopolsky wrote:

My general answers to the questions you have raised are as follows:

1. Each new feature release should use the latest version of the UCD as of the first beta release (or perhaps a week or so before). New chars are new features and the beta period can be used to (hopefully) iron out any bugs introduced by a new UCD version.

2. The language specification should not be UCD version specific. Martin pointed out that the definition of identifiers was intentionally written not to be, by referring to 'current version' or some such. On the other hand, the UCD version used should be programmatically discoverable, perhaps as an attribute of sys or str.

3. The UCD should not change in bugfix releases. New chars are new features. Adding them in bugfix releases will introduce gratuitous incompatibilities between releases. People who want the latest Unicode should either upgrade to the latest Python version or patch an older version (but not expect core support for any problems that creates).
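
On the discoverability point in item 2, the standard unicodedata module exposes the database version as the string unicodedata.unidata_version. A minimal sketch of reading it follows; the helper name ucd_version_tuple is only illustrative and not part of any library.

```python
import sys
import unicodedata


def ucd_version_tuple():
    """Return the interpreter's UCD version as a tuple of ints.

    ucd_version_tuple is a hypothetical helper for this example;
    unicodedata.unidata_version is the real stdlib attribute.
    """
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


print("Python", sys.version_info[:3], "uses UCD", unicodedata.unidata_version)
```

Comparing the tuple, e.g. ucd_version_tuple() >= (6, 0), lets code check for a minimum UCD version instead of guessing from the Python version.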

Given that 2.7 will be maintained for 5 years and arguably Unicode
Consortium takes backward compatibility very seriously, wouldn't it
make sense to consider a backport at some point?

I am sure we will soon see a bug report that the following does not
work in 2.7: :-)

ord('\N{CAT FACE WITH WRY SMILE}')

3 (cont). 2.7 is no different in that regard. It is feature frozen just like all other x.y releases. And that is the answer to any such report. If that code became valid in 2.7.2, for instance, it would still not work in 2.7 and 2.7.1. Not working is not a bug; working is a new feature introduced after 2.7 was released.
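
Code that has to run on interpreters with different UCD versions can look the name up at run time rather than rely on a \N{...} literal, which fails at compile time when the name is unknown. The sketch below assumes Python 3.3 or later; codepoint_for_name is a hypothetical helper, while unicodedata.lookup and its KeyError on unknown names are standard.

```python
import unicodedata


def codepoint_for_name(name):
    """Return the code point for a character name, or None if this
    interpreter's UCD does not define that name (hypothetical helper)."""
    try:
        return ord(unicodedata.lookup(name))
    except KeyError:
        # unicodedata.lookup raises KeyError for names the UCD lacks.
        return None


cp = codepoint_for_name("CAT FACE WITH WRY SMILE")
if cp is None:
    print("name not known to UCD", unicodedata.unidata_version)
else:
    print("U+%04X" % cp)
```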

- How specific should library reference manual be in defining methods
affected by UCD such as str.upper()?


It should specify what this actually does in Unicode terminology
(probably in addition to a layman's rephrase of that)


I opened an issue for this:

http://bugs.python.org/issue10587


1,2 (cont). Good idea in general.

I was more concerned about wide and narrow unicode CPython builds.  Is
it a bug that '\UXXXXXXXX'.isalpha() may disagree even when the two
implementations are based on the same version of UCD?

4. While the difference between narrow/wide builds of (CPython) x.y (which should have one constant UCD) cannot be completely masked, I appreciate and generally agree with your efforts to minimize them. In some cases, there will be a conflict/tradeoff between eliminating this difference versus that.
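
For illustration, the build type can be detected through sys.maxunicode, and a non-BMP letter shows where the two builds diverge. This is a sketch written for Python 3; the choice of U+10400 is just an example, and on CPython 3.3 and later the narrow/wide distinction no longer exists, so only the wide behaviour will be observed there.

```python
import sys

# Narrow builds report sys.maxunicode == 0xFFFF; wide builds report 0x10FFFF.
IS_WIDE = sys.maxunicode > 0xFFFF

# DESERET CAPITAL LETTER LONG I, a letter outside the BMP.
ch = "\U00010400"

print("wide build:", IS_WIDE)
# On a narrow build the character is stored as a surrogate pair, so
# len(ch) is 2 and isalpha() tests the surrogates rather than the letter.
print("len:", len(ch))
print("isalpha:", ch.isalpha())
```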


--
Terry Jan Reedy

