Ezio Melotti <ezio.melo...@gmail.com> added the comment:

I think that methods like str.isalpha can and should be fixed. Since 
_PyUnicode_IsAlpha now accepts a Py_UCS4, the body of unicode_isalpha can be 
changed to convert normal chars and surrogate pairs to a Py_UCS4 before 
calling Py_UNICODE_ISALPHA.
The attached patch is a proof of concept of this approach and returns True for 
'\N{OLD ITALIC LETTER A}'.isalpha() on a narrow build.
It still has a number of issues that should be addressed (check for narrow 
builds, check for lone surrogates, check for a high surrogate at the end of a 
string, fix compiler warnings ...) but it should be good enough as a PoC.
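
As a rough illustration of the logic only (this is not the attached C patch), 
the conversion can be sketched in Python by working on a list of UTF-16 code 
units, which is how a narrow build stores a string. The names isalpha_utf16 
and utf16_units are hypothetical, and the letter test assumes that isalpha 
corresponds to the Unicode general categories starting with "L".

```python
import unicodedata

def utf16_units(s):
    """Return the UTF-16 code units of s, as a narrow build would store them."""
    data = s.encode('utf-16-le', 'surrogatepass')
    return [int.from_bytes(data[i:i + 2], 'little')
            for i in range(0, len(data), 2)]

def isalpha_utf16(units):
    """Hypothetical sketch: True if units is non-empty and every code point
    it encodes is a letter (general category L*)."""
    if not units:
        return False
    i, n = 0, len(units)
    while i < n:
        cp = units[i]
        i += 1
        # A high surrogate followed by a low surrogate forms one code point.
        # The "i < n" check covers a high surrogate at the end of the string.
        if 0xD800 <= cp <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        # Lone surrogates have category "Cs", so they fail the letter test.
        if not unicodedata.category(chr(cp)).startswith('L'):
            return False
    return True
```

With this sketch, isalpha_utf16(utf16_units('\N{OLD ITALIC LETTER A}')) is 
true, while a lone high surrogate such as [0xD800] is rejected.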

I would also suggest introducing a set of macros to handle surrogates (e.g. 
detect, combine) and using them in all the functions that need to work with 
surrogates.
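
To make the idea concrete, here is a minimal sketch of what such helpers 
would compute, written as Python functions rather than C macros. All of the 
names are hypothetical and do not correspond to any existing CPython macro; 
the arithmetic is the standard UTF-16 surrogate encoding.

```python
def is_high_surrogate(unit):
    """Detect a high (lead) surrogate: U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit):
    """Detect a low (trail) surrogate: U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF

def combine_surrogates(high, low):
    """Combine a valid surrogate pair into a single code point."""
    if not (is_high_surrogate(high) and is_low_surrogate(low)):
        raise ValueError('not a valid surrogate pair')
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def split_code_point(cp):
    """Inverse of combine_surrogates for code points above U+FFFF."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError('code point is not outside the BMP')
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)
```

For example, combine_surrogates(0xD800, 0xDF00) gives 0x10300, the code point 
of OLD ITALIC LETTER A, and split_code_point(0x10300) gives the pair back.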

----------
keywords: +patch
Added file: http://bugs.python.org/file19809/issue10521-isalpha.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10521>
_______________________________________