Ezio Melotti <ezio.melo...@gmail.com> added the comment:

AFAIU the macro returns lone surrogates as they are. This means that:
  1) if the string contains only surrogate pairs, Py_UNICODE_NEXT will iterate
over scalar values [0];
  2) if the string contains only lone surrogates, it will iterate over
code points [1] (lone surrogates are code points but not scalar values);
  3) if it contains both, the result will be half and half (i.e. scalar values
where the surrogates form a pair, falling back on code points where they don't).
(For strings without surrogates, iterating over scalar values or over code
points is the same.)
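To make the three cases concrete, here is a minimal sketch of the lenient behaviour described above. It assumes a UTF-16 buffer of 16-bit units (as on a narrow build). The function next_lenient and the is_*_surrogate helpers are hypothetical names for illustration; this is not the actual Py_UNICODE_NEXT definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not CPython APIs): classify a UTF-16 code unit. */
static int is_high_surrogate(uint32_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint32_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical lenient step: a high surrogate immediately followed by a low
 * surrogate is joined into one scalar value; any other unit, including a
 * lone surrogate, is returned unchanged. Advances *pp past what it consumed. */
static uint32_t next_lenient(const uint16_t **pp, const uint16_t *end)
{
    const uint16_t *p = *pp;
    assert(p < end);  /* caller must not read past the end of the buffer */
    if (is_high_surrogate(p[0]) && p + 1 < end && is_low_surrogate(p[1])) {
        *pp = p + 2;
        return 0x10000 + (((uint32_t)(p[0] - 0xD800) << 10)
                          | (uint32_t)(p[1] - 0xDC00));
    }
    *pp = p + 1;
    return p[0];
}
```

For the units {0xD83D, 0xDE00, 0xD800, 0x0041}, successive calls yield 0x1F600 (a joined pair, i.e. a scalar value), 0xD800 (a lone surrogate passed through as a code point) and 0x41: the "half and half" case 3.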

Are these semantics correct for all (or at least most) of the places where the
macro will be used?
Would a stricter version (one that rejects lone surrogates and iterates over
scalar values only) be useful in addition to, or as an alternative to,
Py_UNICODE_NEXT?
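A stricter variant could be sketched roughly as follows, again assuming a UTF-16 buffer. The name next_strict and its error convention (return -1 and leave the pointer unchanged on a lone surrogate) are hypothetical choices for illustration, not an existing CPython API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical strict step: only Unicode scalar values are produced.
 * On success, stores the value in *out, advances *pp and returns 0.
 * If the next unit is a lone surrogate, returns -1 and leaves *pp unchanged. */
static int next_strict(const uint16_t **pp, const uint16_t *end, uint32_t *out)
{
    const uint16_t *p = *pp;
    assert(p < end);  /* caller must not read past the end of the buffer */
    uint32_t u = p[0];
    if (u >= 0xD800 && u <= 0xDBFF) {                 /* high surrogate */
        if (p + 1 < end && p[1] >= 0xDC00 && p[1] <= 0xDFFF) {
            *out = 0x10000 + (((u - 0xD800) << 10) | (uint32_t)(p[1] - 0xDC00));
            *pp = p + 2;
            return 0;
        }
        return -1;                                    /* unpaired high surrogate */
    }
    if (u >= 0xDC00 && u <= 0xDFFF)
        return -1;                                    /* unpaired low surrogate */
    *out = u;
    *pp = p + 1;
    return 0;
}
```

Leaving the pointer in place on error lets the caller decide how to handle the bad unit (raise, replace, or skip), which is where the strict and lenient semantics actually differ.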

[0]: http://unicode.org/glossary/#unicode_scalar_value
[1]: http://unicode.org/glossary/#code_point

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________
_______________________________________________
Python-bugs-list mailing list