Re: [Python-Dev] PEP 393 Summer of Code Project

Raymond Hettinger Fri, 26 Aug 2011 23:00:12 -0700

On Aug 26, 2011, at 8:51 PM, Terry Reedy wrote:

> 
> 
> On 8/26/2011 8:42 PM, Guido van Rossum wrote:
>> On Fri, Aug 26, 2011 at 3:57 PM, Terry Reedy wrote:
> 
>>> My impression is that a UTF-16 implementation, to be properly called such,
>>> must do len and [] in terms of code points, which is why Python's narrow
>>> builds are called UCS-2 and not UTF-16.
>> 
>> I don't think anyone else has that impression. Please cite chapter and
>> verse if you really think this is important. IIUC, UCS-2 does not
>> allow surrogate pairs, whereas Python (and Java, and .NET, and
>> Windows) 16-bit strings all do support surrogate pairs. And they all
> 
> For that reason, I think UTF-16 is a better term than UCS-2 for narrow builds 
> (whether or not the above impression is true).


I agree.  It's weird to call something UCS-2 if code points above 65535 are 
representable.
The naming convention for codecs is that the UTF prefix is used for lossless 
encodings that cover the entire range of Unicode.
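A small illustration of "lossless" here, offered as a sketch and assuming a Python 3.3+ interpreter (where PEP 393 makes len() count code points): the built-in "utf-16-le" codec round-trips a character outside the BMP by spending two 16-bit code units on it. On a narrow build of this era, len(text) would instead report the code-unit count.

```python
# Sketch: the "utf-16-le" codec covers the full Unicode range losslessly.
# A code point above U+FFFF costs two 16-bit code units (a surrogate pair).
text = "A\U0001F600"  # 'A' (inside the BMP) followed by U+1F600 (outside it)

encoded = text.encode("utf-16-le")  # little-endian, no byte-order mark
code_units = len(encoded) // 2      # each UTF-16 code unit is 2 bytes

print(len(text))    # code points on Python 3.3+: 2
print(code_units)   # UTF-16 code units: 3
assert encoded.decode("utf-16-le") == text  # lossless round trip
```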

"The first amendment to the original edition of the UCS defined UTF-16, an 
extension of UCS-2, to represent code points outside the BMP."
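The extension mentioned above works by reserving two ranges of 16-bit values as surrogates: a code point in U+10000..U+10FFFF has 0x10000 subtracted, and the remaining 20 bits are split into a high (lead) surrogate and a low (trail) surrogate. A minimal sketch follows; the function names are hypothetical and not part of any Python API, and the result is cross-checked against Python's own codec.

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


hi, lo = to_surrogate_pair(0x1F600)
print(hex(hi), hex(lo))  # 0xd83d 0xde00

# Cross-check against the built-in big-endian UTF-16 codec.
assert hi.to_bytes(2, "big") + lo.to_bytes(2, "big") == "\U0001F600".encode("utf-16-be")
assert from_surrogate_pair(hi, lo) == 0x1F600
```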

Raymond

_______________________________________________
Python-Dev mailing list
http://mail.python.org/mailman/listinfo/python-dev
