Ezio Melotti <ezio.melo...@gmail.com> added the comment: I think that methods like str.isalpha can and should be fixed. Since _PyUnicode_IsAlpha now accepts a Py_UCS4, the body of unicode_isalpha can be changed to convert normal chars and surrogates pairs to a Py_UCS4 before calling Py_UNICODE_ISALPHA. The attached patch is a proof of concept of this approach and returns True for '\N{OLD ITALIC LETTER A}'.isalpha() on a narrow build. It still has a number of issues that should be addressed (check for narrow builds, check for lone surrogates, check for high surrogate at the end of a string, fix compiler warnings ...) but it should be good enough as a PoC.
I would also suggest to introduce a set of macros to handle surrogates (e.g. detect, combine) and use it in all the functions that need to work with them. ---------- keywords: +patch Added file: http://bugs.python.org/file19809/issue10521-isalpha.diff _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue10521> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com