On Thu, Dec 6, 2012 at 2:39 PM, Albert-Jan Roskam <[email protected]> wrote:
>
> http://pastecode.org/index.php/view/29608996
>
> import ctypes
> s = u'\u0627\u0644\u0633\u0644\u0627\u0645'
> v = ctypes.c_wchar_p(s)
> print v  # prints c_wchar_p(u'\u0627\u0644\u0633\u0644\u0627\u0645')
> v.value  # prints u'\u0627\u0644\u0633\u0644\u0627\u0645'

Your decorator could end up encoding str or decoding unicode. Typically this gets routed through the default encoding (i.e. ASCII) and probably triggers a UnicodeDecodeError or UnicodeEncodeError. I'd limit encoding to unicode and decoding to bytes/str.

On the subject of wchar_t, here's a funny rant:

http://losingfight.com/blog/2006/07/28/wchar_t-unsafe-at-any-size/

The base type for Unicode in CPython isn't wchar_t on all platforms/builds. It depends on Py_UNICODE_SIZE (2 or 4 bytes) vs sizeof(wchar_t) (also on whether wchar_t is unsigned, but that's not relevant here). 3.3 is in its own flexible universe.

I recently came across a bug in create_unicode_buffer on Windows Python 3.3. The new flexible string implementation uses Py_UCS4 instead of creating surrogate pairs on Windows. However, given that the size of c_wchar is 2 [bytes] on Windows, create_unicode_buffer still needs to factor in the surrogate pairs by calculating the target size = len(init) + sum(ord(c) > 0xffff for c in init) + 1. Naively it uses size = len(init) + 1, which fails if the string has multiple non-BMP characters.

Here's another ctypes related issue. On a narrow build prior to 3.2, PyUnicode_AsWideChar returns a wide-character string that may contain surrogate pairs even if wchar_t is 32-bit. That isn't well-formed UTF-32. This was fixed in 3.2 as part of fixing a ctypes bug. ctypes u_set (type 'u' is c_wchar) was modified to use an updated PyUnicode_AsWideChar and Z_set (type 'Z' is c_wchar_p) was modified to use the new PyUnicode_AsWideCharString.

3.2.3 links:

u_set:
http://hg.python.org/cpython/file/3d0686d90f55/Modules/_ctypes/cfield.c#l1202

Z_set:
http://hg.python.org/cpython/file/3d0686d90f55/Modules/_ctypes/cfield.c#l1401

The new PyUnicode_AsWideChar and PyUnicode_AsWideCharString call the helper function unicode_aswidechar.
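
To make the create_unicode_buffer size calculation mentioned earlier concrete, here is a minimal sketch. The function name wchar16_buffer_size is made up for illustration and is not part of ctypes. It assumes Python 3.3+, where len() counts code points, and a 2-byte c_wchar as on Windows:

```python
def wchar16_buffer_size(init):
    """Number of 16-bit wchar_t slots needed to hold `init` plus a NUL.

    Hypothetical helper, not a ctypes function. Assumes len(init) counts
    code points (Python 3.3+) and that each character above U+FFFF
    occupies two slots (a surrogate pair).
    """
    non_bmp = sum(ord(c) > 0xFFFF for c in init)  # one extra slot per non-BMP char
    return len(init) + non_bmp + 1                # +1 for the terminating NUL


def naive_buffer_size(init):
    """The naive calculation, which ignores surrogate pairs."""
    return len(init) + 1


s = u'a\U00010000\U00010001'
print(wchar16_buffer_size(s), naive_buffer_size(s))
```

For a string with two non-BMP characters, the naive size comes out two slots short of what the surrogate pairs plus the terminator require.
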
The helper was added in 3.2 to handle the different cases of Py_UNICODE_SIZE more carefully:

http://hg.python.org/cpython/file/3d0686d90f55/Objects/unicodeobject.c#l1187

    1. Py_UNICODE_SIZE == SIZEOF_WCHAR_T
    2. Py_UNICODE_SIZE == 2 && SIZEOF_WCHAR_T == 4
    3. Py_UNICODE_SIZE == 4 && SIZEOF_WCHAR_T == 2

The 2nd case takes advantage of the larger wchar_t to recombine surrogate pairs. The 3rd case creates surrogate pairs instead of truncating the character code. (Note: this helper was updated in 3.3 to use the new function PyUnicode_AsUnicodeAndSize.)

Prior to 3.2, PyUnicode_AsWideChar wasn't nearly as careful. See the version in 3.1.5:

http://hg.python.org/cpython/file/7395330e495e/Objects/unicodeobject.c#l1085
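
The recombining and splitting done in the 2nd and 3rd cases can be illustrated in pure Python. This is only a sketch of the surrogate-pair arithmetic on lists of integers, not a translation of the C helper; the function names are made up for this example:

```python
def split_to_16bit(code_points):
    """Like the 3rd case: 32-bit code points -> 16-bit units.

    Code points above U+FFFF become a surrogate pair rather than
    being truncated.
    """
    units = []
    for cp in code_points:
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))    # high (lead) surrogate
            units.append(0xDC00 | (cp & 0x3FF))  # low (trail) surrogate
        else:
            units.append(cp)
    return units


def recombine_to_32bit(units):
    """Like the 2nd case: 16-bit units -> 32-bit code points.

    A high surrogate followed by a low surrogate is merged into one
    code point; anything else (including a lone surrogate) is copied.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


units = split_to_16bit([0x41, 0x1F600])
print([hex(u) for u in units])
print([hex(cp) for cp in recombine_to_32bit(units)])
```

The 1st case needs no conversion at all, since the two types are the same size and the data can be copied as-is.
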
