STINNER Victor victor.stin...@haypocalc.com added the comment:
r85172 changes PyUnicode_AsWideCharString() (it no longer counts the trailing nul
character in the output size) and adds unit tests.
r85173 patches unicode_aswidechar() to support non-BMP characters for all
known wchar_t/Py_UNICODE size
STINNER Victor victor.stin...@haypocalc.com added the comment:
r85174+r85177: ctypes.c_wchar now supports non-BMP characters with a 32-bit wchar_t,
which fixes this issue.
(I also committed an unwanted change to _testcapi to fix r85172 in r85174:
r85175 reverts this change, and r85176 fixes the _testcapi
STINNER Victor victor.stin...@haypocalc.com added the comment:
r85173 patches unicode_aswidechar() to support non-BMP characters
for all known wchar_t/Py_UNICODE size combinations (2/2, 2/4 and 4/2).
Oh, and 4/4 ;-)
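As a rough illustration of the 2-byte Py_UNICODE to 4-byte wchar_t case, the following Python sketch joins UTF-16 surrogate pairs into single code points. It is not the code from r85173; the function name join_surrogates is made up for this example, and lone surrogates are simply copied through, which is an assumption about the desired behaviour.

```python
def join_surrogates(units):
    """Convert a list of 16-bit code units into a list of code points.

    A high surrogate (0xD800..0xDBFF) followed by a low surrogate
    (0xDC00..0xDFFF) is merged into one code point above 0xFFFF.
    Anything else, including a lone surrogate, is copied unchanged
    (an assumption made for this sketch).
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out
```

With this sketch, the two units 0xD83D, 0xDE00 come out as the single code point 0x1F600, so the output can be shorter than the input.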
--
___
Python tracker
Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:
Thanks for working on this!
Since this was a bugfix, it should be merged back into 2.7, yes?
--
stage: unit test needed -> committed/rejected
STINNER Victor victor.stin...@haypocalc.com added the comment:
Since this was a bugfix, it should be merged back into 2.7, yes?
Mmmh, the fix requires changing the PyUnicode_AsWideChar() function (to support
non-BMP characters and surrogate pairs) (and maybe also to create
Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:
Since I noticed the bug through source code inspection and no one has reported
it occurring in practice, that sounds reasonable to me.
--
versions: -Python 2.7
STINNER Victor victor.stin...@haypocalc.com added the comment:
Update the patch for the new PyUnicode_AsWideCharString() function:
- use Py_UNICODE_SIZE and SIZEOF_WCHAR_T in the preprocessor tests
- faster loop: use only pointers for the stop condition, instead of a counter
plus a pointer
Changes by STINNER Victor victor.stin...@haypocalc.com:
Removed file:
http://bugs.python.org/file17322/pyunicode_aswidechar_surrogates-py3k.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8670
STINNER Victor victor.stin...@haypocalc.com added the comment:
Patch version 3:
- fix unicode_aswidechar() when Py_UNICODE_SIZE == SIZEOF_WCHAR_T and w == NULL
(return the number of characters; don't write into w!)
- improve unicode_aswidechar() comment
--
Added file:
STINNER Victor victor.stin...@haypocalc.com added the comment:
I don't know how to test the case Py_UNICODE_SIZE == 4 && SIZEOF_WCHAR_T == 2. On
Windows, sizeof(wchar_t) is 2, but it looks like Python is not prepared to have
Py_UNICODE != wchar_t for its Windows implementation.
wchar_t is 32 bits long
Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:
I, too, can't think of any platforms where Py_UNICODE_SIZE == 4 &&
SIZEOF_WCHAR_T == 2, and I'm not sure what the previous policy has been. Have
you noticed any other code that would set a precedent?
If no one else chimes in,
Marc-Andre Lemburg m...@egenix.com added the comment:
STINNER Victor wrote:
STINNER Victor victor.stin...@haypocalc.com added the comment:
I don't know how to test if Py_UNICODE_SIZE == 4 && SIZEOF_WCHAR_T == 2. On
Windows, sizeof(wchar_t) is 2, but it looks like Python is not prepared to
Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:
You can tweak the Windows pyconfig.h to use UCS4, AFAIK, if you want to
test drive this case.
I seem to recall seeing some other code that assumed Windows implied UCS2.
Proceed with caution. ;-)
But it's probably easier
Marc-Andre Lemburg m...@egenix.com added the comment:
Daniel Stutzbach wrote:
Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:
You can tweak the Windows pyconfig.h to use UCS4, AFAIK, if you want to
test drive this case.
I seem to recall seeing some other code that
STINNER Victor victor.stin...@haypocalc.com added the comment:
Patch version 4:
- implement unicode_aswidechar() for 16-bit wchar_t and 32-bit Py_UNICODE
- PyUnicode_AsWideCharString() returns the number of wide characters
excluding the nul character, as does PyUnicode_AsWideChar()
For 16
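A minimal Python sketch of the 16-bit wchar_t / 32-bit Py_UNICODE direction described in the patch notes: code points above 0xFFFF are written as two 16-bit units, and the reported length counts units without any terminating nul. The names split_to_utf16_units and wide_length are hypothetical; this is not the actual unicode_aswidechar() implementation.

```python
def split_to_utf16_units(code_points):
    """Convert code points into 16-bit units, using surrogate pairs
    for anything above the BMP (code points greater than 0xFFFF)."""
    units = []
    for cp in code_points:
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))    # high surrogate
            units.append(0xDC00 | (cp & 0x3FF))  # low surrogate
        else:
            units.append(cp)
    return units


def wide_length(code_points):
    """Number of 16-bit units needed, excluding any terminating nul.

    This mirrors the idea of querying the size first (the w == NULL
    case) without writing any output.
    """
    return sum(2 if cp > 0xFFFF else 1 for cp in code_points)
```

Computing the length separately from the conversion lets a caller allocate the buffer once, then fill it.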
Changes by STINNER Victor victor.stin...@haypocalc.com:
Removed file: http://bugs.python.org/file19082/aswidechar_nonbmp-2.patch
Changes by STINNER Victor victor.stin...@haypocalc.com:
Removed file: http://bugs.python.org/file19083/aswidechar_nonbmp-3.patch
STINNER Victor victor.stin...@haypocalc.com added the comment:
Oops, I lost my patch to fix the initial (ctypes) issue. Here is an updated
patch: ctypes_nonbmp.patch (which needs aswidechar_nonbmp-4.patch).
--
Added file: http://bugs.python.org/file19101/ctypes_nonbmp.patch
STINNER Victor victor.stin...@haypocalc.com added the comment:
#9979 proposes to create a new PyUnicode_AsWideCharString() function.
Daniel Stutzbach dan...@stutzbachenterprises.com added the comment:
I know enough about Unicode to have reported this bug, but I don't feel
knowledgeable enough about Python's Unicode implementation to comment on your
suggested solution.
I'm adding the other people listed in
STINNER Victor victor.stin...@haypocalc.com added the comment:
Support for characters outside the Unicode BMP (code 0x) is incomplete
in narrow builds (sizeof(Py_UNICODE) == 2) of Python 2:
$ ./python
Python 2.7b2+ (trunk:81139M, May 13 2010, 18:45:37)
x=u'\U0001'
x[0], x[1]
STINNER Victor victor.stin...@haypocalc.com added the comment:
Patch for Python3:
- Fix PyUnicode_AsWideChar() to support surrogates (Py_UNICODE: 2 bytes,
wchar_t: 4 bytes)
- u_set() of _ctypes uses PyUnicode_AsWideChar()
- add a test (skipped if sizeof(wchar_t) is smaller than 4 bytes)
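A sketch of how such a test might look with unittest, assuming Python 3 string literals. The class name NonBMPWcharTest and the test character U+12345 are chosen for illustration and are not taken from the patch.

```python
import ctypes
import unittest

# Size of the platform's wchar_t as seen by ctypes (2 on Windows,
# usually 4 on Linux and macOS).
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)


class NonBMPWcharTest(unittest.TestCase):
    @unittest.skipIf(WCHAR_SIZE < 4,
                     "sizeof(wchar_t) is smaller than 4 bytes")
    def test_nonbmp(self):
        # A single character outside the BMP must fit in one wchar_t
        # and round-trip unchanged.
        ch = "\U00012345"
        w = ctypes.c_wchar(ch)
        self.assertEqual(w.value, ch)
```

The skip condition keeps the test meaningful only where one wchar_t can hold a full code point.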
Changes by STINNER Victor victor.stin...@haypocalc.com:
--
components: +Unicode
Changes by STINNER Victor victor.stin...@haypocalc.com:
--
nosy: +haypo
New submission from Daniel Stutzbach dan...@stutzbachenterprises.com:
Using a UCS2 Python on a platform with a 32-bit wchar_t, the following code
throws an exception (but should not):
ctypes.c_wchar('\u1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: one