Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:

On Fri, Dec 10, 2010 at 6:09 PM, Daniel Stutzbach
<rep...@bugs.python.org> wrote:
..
> The second check for surrogates in Py_UNICODE_PUT_NEXT is necessary, unless
> you can prove that Py_UNICODE_SOME_TRANSFORMATION will never transform
> characters < 0x10000 into characters >= 0x10000 or vice versa.
>
> Can we prove that this will always be the case, for current and future
> versions of Unicode, for all or almost all of the transformations we care
> about?
>
Certainly not for all, but for some important transformations I
believe the Unicode Standard does promise that the transformation maps
BMP characters to BMP characters and supplementary characters to
supplementary characters.  Case folding and normalization are two
important examples.

> Answering that question and figuring out what to do about it are probably
> more trouble than it's worth.  If a particular point proves to be a
> bottleneck, we can always specialize the code there later.

Agree.  It is even more likely that applications that have to deal
with lots of supplementary characters will be better off using a wide
Unicode build than relying on such specialization.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________