Re: [Python-Dev] PEP 393 Summer of Code Project

Terry Reedy Fri, 09 Sep 2011 10:19:14 -0700

On 9/9/2011 12:12 PM, [email protected] wrote:

On Thu, Sep 8, 2011 at 10:39 PM, Terry Reedy <[email protected]> wrote:

On 9/8/2011 6:15 PM, [email protected] wrote:


Oops, forgot to add the link for the gory details for Java and >2-byte
Unicode:

http://java.sun.com/developer/technicalArticles/Intl/Supplementary/


This is dated 2004. Basically, they considered several options, tried out four,
and ended up keeping char[] sequences as UTF-16, with char as a 16-bit code
unit, and using 32-bit int values to represent code points in low-level APIs
such as the static methods of the Character class.
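
To make the UTF-16 point concrete, here is a minimal Python sketch of how a
code point above U+FFFF is split into a surrogate pair of two 16-bit code
units, which is why a single 16-bit char cannot hold every code point. The
function name `to_utf16_units` is hypothetical, for illustration only.

```python
from __future__ import annotations


def to_utf16_units(code_point: int) -> list[int]:
    """Encode one Unicode code point as a list of UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point < 0x10000:
        # BMP character: fits in a single 16-bit code unit.
        return [code_point]
    # Supplementary character: spread the remaining 20 bits over two units.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return [high, low]
```

For example, U+1F600 becomes the pair 0xD83D, 0xDE00, while U+0041 stays a
single unit.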

I did not see the indexing problem mentioned. I get the impression that they
encourage forward and backward iteration over sequences (cursor-based access)
rather than random-access indexing.
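
The cursor-based style can be sketched in Python as a forward walk over 16-bit
code units that combines surrogate pairs into code points, similar in spirit
to Java's Character.toCodePoint(char, char). The name `iter_code_points` is
hypothetical, and passing unpaired surrogates through unchanged is a choice
made for this sketch.

```python
from __future__ import annotations

from typing import Iterator, Sequence


def iter_code_points(units: Sequence[int]) -> Iterator[int]:
    """Walk UTF-16 code units forward, yielding one code point at a time."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        # A high surrogate followed by a low surrogate forms one code point.
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP unit, or an unpaired surrogate passed through as-is.
            yield u
            i += 1
```

With `units = [0x41, 0xD83D, 0xDE00, 0x42]`, the walk yields three code points
from four code units, and the third code point ("B") sits at code-unit offset
3. So `units[2]` is not code point 2, which is exactly the indexing problem.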

Hmmm, sorry for the irrelevant link; my lack of expertise here is
showing. What I do know is that we (meaning Jim Baker) are taking
great pains to always use code points, even for random access, in our
Unicode code. I can't speak to the performance implications without
some deeper study into what Jim has done.

I am curious how you index by code point rather than code unit with 16-bit code units and how it compares with the method I posted. Is there anything I can read? Reply off list if you want.
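
One generic way to get code-point random access on top of 16-bit code units,
sketched here in Python, is to record the code-point positions of all
surrogate pairs once and binary-search them on each lookup. This is an
assumption for illustration, not necessarily Jython's approach or the method
referred to above; the class name `CodePointIndex` is hypothetical. It costs
one extra integer per supplementary character and makes each lookup O(log n).

```python
from __future__ import annotations

from bisect import bisect_left
from typing import Sequence


def _is_pair(units: Sequence[int], i: int) -> bool:
    """True if units[i], units[i + 1] form a valid surrogate pair."""
    return (0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF)


class CodePointIndex:
    """Hypothetical helper: index UTF-16 code units by code point."""

    def __init__(self, units: Sequence[int]) -> None:
        self.units = units
        self.pair_starts: list[int] = []  # code-point indices of surrogate pairs
        cp = 0
        i = 0
        while i < len(units):
            if _is_pair(units, i):
                self.pair_starts.append(cp)
                i += 2
            else:
                i += 1
            cp += 1
        self.length = cp  # number of code points

    def unit_offset(self, k: int) -> int:
        """Return the code-unit offset where code point k starts."""
        if not 0 <= k < self.length:
            raise IndexError(k)
        # Every surrogate pair before code point k shifts it right by one unit.
        return k + bisect_left(self.pair_starts, k)

    def __getitem__(self, k: int) -> int:
        """Return code point k, decoding a surrogate pair if present."""
        i = self.unit_offset(k)
        if _is_pair(self.units, i):
            return 0x10000 + ((self.units[i] - 0xD800) << 10) + (self.units[i + 1] - 0xDC00)
        return self.units[i]
```

Strings with no supplementary characters get an empty `pair_starts` list, so
`unit_offset(k)` reduces to `k` and the common BMP-only case stays cheap.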


--
Terry Jan Reedy

_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
