[issue10542] Py_UNICODE_NEXT and other macros for surrogates

Marc-Andre Lemburg Fri, 03 Dec 2010 12:15:25 -0800

Marc-Andre Lemburg <[email protected]> added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <[email protected]> added the comment:
> 
> On Sat, Nov 27, 2010 at 6:38 PM, Raymond Hettinger
> <[email protected]> wrote:
> ..
>> I suggest Py_UNICODE_ADVANCE() to avoid false suggestion that the iterator 
>> protocol is being used.
>>
> 
> As a data point, ICU defines U16_NEXT() for similar purpose.  I also
> like ICU terminology for surrogates ("lead" and "trail") better than
> the backward "high" and "low".


"High" and "low" are Unicode standard terms, so we should use
those.
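For reference, the standard's terms map directly onto the UTF-16 arithmetic. A high surrogate lies in U+D800..U+DBFF and carries the top 10 bits of (code point - 0x10000). A low surrogate lies in U+DC00..U+DFFF and carries the bottom 10 bits. Below is a minimal sketch in C. The helper names (is_high_surrogate, is_low_surrogate, join_surrogates, split_surrogates) are hypothetical and are not part of the Python C API. uint32_t is used in place of Py_UCS4 so the sketch compiles without Python.h.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only (not CPython API).
   Unicode assigns high surrogates to U+D800..U+DBFF and
   low surrogates to U+DC00..U+DFFF. */
static int is_high_surrogate(uint32_t ch)
{
    return ch >= 0xD800 && ch <= 0xDBFF;
}

static int is_low_surrogate(uint32_t ch)
{
    return ch >= 0xDC00 && ch <= 0xDFFF;
}

/* Combine a high/low pair into one supplementary code point:
   the high surrogate supplies the top 10 bits and the low surrogate
   the bottom 10 bits of (code point - 0x10000). */
static uint32_t join_surrogates(uint32_t high, uint32_t low)
{
    return 0x10000 + (((high - 0xD800) << 10) | (low - 0xDC00));
}

/* Inverse of join_surrogates(); cp must be in U+10000..U+10FFFF. */
static void split_surrogates(uint32_t cp, uint32_t *high, uint32_t *low)
{
    uint32_t v = cp - 0x10000;      /* 20-bit offset */
    *high = 0xD800 + (v >> 10);     /* top 10 bits */
    *low = 0xDC00 + (v & 0x3FF);    /* bottom 10 bits */
}
```

For example, U+1F600 splits into the pair 0xD83D 0xDE00, and joining that pair yields 0x1F600 again.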

Regarding Py_UCS4_READ_CODE_POINT: you're right that surrogates
are code points, so how about Py_UCS4_READ_NEXT()?

Regarding Py_UCS4_READ_NEXT() vs. Py_UNICODE_READ_NEXT(): the return
value of the macro is a Py_UCS4 value, not a Py_UNICODE value. The
first argument of the macro can point into any array: not just a
Py_UNICODE*, but also a Py_UCS4* or even an int*.
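To make the typing argument concrete, here is a minimal sketch of a macro along the lines of the proposed Py_UCS4_READ_NEXT(). It is an illustration under assumptions, not CPython code. The name UCS4_READ_NEXT, its (ptr, end) argument order, the helper macros, and the ucs4_t typedef (a stand-in for Py_UCS4) are all hypothetical. What it shows is that the element type of the input array does not matter, while the result is always a UCS4 value.

```c
#include <stddef.h>
#include <stdint.h>

/* uint32_t stands in for Py_UCS4 so the sketch compiles without Python.h. */
typedef uint32_t ucs4_t;

/* Surrogate ranges from the Unicode standard; casting to ucs4_t lets the
   checks accept an element of any integer type. */
#define UCS4_IS_HIGH_SURROGATE(ch) \
    ((ucs4_t)(ch) >= 0xD800u && (ucs4_t)(ch) <= 0xDBFFu)
#define UCS4_IS_LOW_SURROGATE(ch) \
    ((ucs4_t)(ch) >= 0xDC00u && (ucs4_t)(ch) <= 0xDFFFu)
#define UCS4_JOIN_SURROGATES(hi, lo) \
    ((ucs4_t)(0x10000u + ((((ucs4_t)(hi) - 0xD800u) << 10) | \
                          ((ucs4_t)(lo) - 0xDC00u))))

/* Hypothetical UCS4_READ_NEXT(ptr, end): evaluates to the code point at
   ptr as a ucs4_t and advances ptr past it (two elements for a valid
   surrogate pair, one otherwise). ptr may point into an array of any
   integer type and must satisfy ptr < end. ptr and end are evaluated
   more than once, so they must be side-effect-free expressions. */
#define UCS4_READ_NEXT(ptr, end)                                      \
    (((ptr) + 1 < (end) && UCS4_IS_HIGH_SURROGATE((ptr)[0])           \
      && UCS4_IS_LOW_SURROGATE((ptr)[1]))                             \
        ? ((ptr) += 2, UCS4_JOIN_SURROGATES((ptr)[-2], (ptr)[-1]))    \
        : (ucs4_t)*(ptr)++)

/* Example use: count code points in a UTF-16 buffer of n elements. */
static size_t count_code_points_u16(const uint16_t *p, size_t n)
{
    const uint16_t *end = p + n;
    size_t count = 0;
    while (p < end) {
        (void)UCS4_READ_NEXT(p, end);
        count++;
    }
    return count;
}
```

The same macro works on a uint16_t*, a ucs4_t*, or an int* array. A lone surrogate is returned unchanged as a single code point. As with any function-like macro of this shape, the caller must pass plain variables rather than expressions with side effects.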

Py_UCS2_READ_NEXT() would be plain wrong :-) Also note that Python
does have a Py_UCS4 type; it doesn't have a Py_UCS2 type.

That's why we should use *Py_UCS4*_READ_NEXT().

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue10542>
_______________________________________
_______________________________________________
Python-bugs-list mailing list