Eryk Sun added the comment:
Using cast() to convert a Python string directly to a pointer is undocumented.
Even when it seems to work, it's risky to depend on, because there's no source
ctypes data object to reference. The result has neither a _b_base_ nor anything
in _objects that keeps the string alive. If the string has since been
deallocated, the pointer is invalid.
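The difference can be seen by casting a ctypes buffer and a plain bytes object side by side. This is a minimal sketch, assuming a current Python 3 ctypes; the names buf, vp_from_buf and vp_from_bytes are only illustrative. When the source is a ctypes data object, cast() records it in the result's _objects. When it is a plain Python object, there is nothing to record.

```python
import ctypes

# A ctypes-owned buffer: cast() stores the source object in the result's
# _objects dict, so the buffer stays alive as long as the pointer does.
buf = ctypes.create_string_buffer(b"helloworld")
vp_from_buf = ctypes.cast(buf, ctypes.c_void_p)
kept = any(o is buf for o in vp_from_buf._objects.values())

# A plain bytes object is not a ctypes data object, so cast() has nothing
# to record: both _b_base_ and _objects are None on the result.
vp_from_bytes = ctypes.cast(b"helloworld", ctypes.c_void_p)

print(kept)                    # True
print(vp_from_bytes._b_base_)  # None
print(vp_from_bytes._objects)  # None
```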
What you've uncovered is an implementation detail. Windows has a 16-bit
unsigned wchar_t type, so HAVE_USABLE_WCHAR_T is defined for the default narrow
build of Python 2. In this case ctypes can use PyUnicode_AS_UNICODE, which is
why you get the base address of the unicode object's internal buffer on
Windows.
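The wchar_t width that drives this platform difference can be checked from Python with ctypes.sizeof. A small sketch: the expected sizes (2 bytes on Windows, 4 on typical Linux and macOS builds) are an assumption covering mainstream platforms only, and signedness is not visible this way.

```python
import ctypes
import sys

# Size of the C wchar_t type as ctypes sees it, in bytes.
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# Assumed sizes: 16-bit on Windows, 32-bit on typical Linux/macOS builds.
expected = 2 if sys.platform == "win32" else 4
print(sys.platform, wchar_size, wchar_size == expected)
```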
Linux systems define wchar_t as a 4-byte signed value (IIRC it's a typedef for
int). Because wchar_t is signed in this case, HAVE_USABLE_WCHAR_T is not defined
even for a wide build, so ctypes has to temporarily copy the string via
PyUnicode_AsWideChar and keeps that memory alive by referencing it from a
capsule object. You can see this by constructing a c_wchar_p instance, for
example:
>>> p = ctypes.c_wchar_p(u'helloworld')
>>> p._objects
<capsule object "_ctypes/cfield.c wchar_t buffer from unicode" at 0x7fedb67d5f90>
In your case, by the time you actually look at the address, the capsule has
been deallocated, and the memory is no longer valid. For example:
>>> addr = ctypes.cast(u'helloworld', ctypes.c_void_p).value
>>> ctypes.wstring_at(addr, 10)
u'\U0150ccf0\x00\U0150cc00\x00oworld'
It works as expected if one instead casts a c_wchar_p instance, which
references the capsule to keep the memory alive:
>>> addr = ctypes.cast(p, ctypes.c_void_p).value
>>> ctypes.wstring_at(addr, 10)
u'helloworld'
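The same keep-alive pattern under Python 3, as a sketch (p and vp are illustrative names). It assumes current Python 3 ctypes, where c_wchar_p copies a str into a temporary wchar_t buffer whose owner is stored in _objects. The point it shows is that the cast result itself shares that reference, so the copy survives even after the original c_wchar_p is deleted.

```python
import ctypes

# c_wchar_p copies the str into a wchar_t buffer; the object owning that
# buffer is stored in p._objects.
p = ctypes.c_wchar_p("helloworld")

# cast() gives the result the same _objects, so the copy stays alive as
# long as vp does, even once p itself is gone.
vp = ctypes.cast(p, ctypes.c_void_p)
del p

text = ctypes.wstring_at(vp.value, 10)
print(text)  # helloworld
```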
However, that's not what you want since we know it's a copy. I think your only
option is to use the C API via ctypes.pythonapi. For example:
>>> ctypes.pythonapi.PyUnicodeUCS4_AsUnicode.argtypes = (ctypes.py_object,)
>>> ctypes.pythonapi.PyUnicodeUCS4_AsUnicode.restype = ctypes.c_void_p
>>> s = u'helloworld'
>>> addr = ctypes.pythonapi.PyUnicodeUCS4_AsUnicode(s)
>>> ctypes.wstring_at(addr, 10)
u'helloworld'
On narrow builds this function is exported as PyUnicodeUCS2_AsUnicode.
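Because the exported name depends on the build, a helper can pick it from sys.maxunicode, which is 0xFFFF on narrow Python 2 builds. This is a hypothetical sketch: as_unicode_symbol and load_as_unicode are illustrative names, not part of ctypes. The lookup can only succeed on Python 2, since Python 3.3 and later no longer export either symbol, so load_as_unicode returns None there.

```python
import ctypes
import sys


def as_unicode_symbol(maxunicode=sys.maxunicode):
    """Hypothetical helper: exported name of PyUnicode_AsUnicode on Python 2.

    Narrow (UCS2) builds have sys.maxunicode == 0xFFFF and export the
    PyUnicodeUCS2_* names; wide (UCS4) builds export PyUnicodeUCS4_*.
    """
    if maxunicode > 0xFFFF:
        return "PyUnicodeUCS4_AsUnicode"
    return "PyUnicodeUCS2_AsUnicode"


def load_as_unicode():
    """Hypothetical helper: return the configured C function, or None if
    the running interpreter does not export it (e.g. Python 3.3+)."""
    try:
        func = getattr(ctypes.pythonapi, as_unicode_symbol())
    except AttributeError:
        return None
    func.argtypes = (ctypes.py_object,)
    func.restype = ctypes.c_void_p
    return func
```

On Python 2, calling load_as_unicode()(s) and passing the address to ctypes.wstring_at(addr, len(s)) reads the string in place; the pointer is only valid while s is alive.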
----------
nosy: +eryksun
resolution: -> not a bug
stage: -> resolved
status: open -> closed
_______________________________________
Python tracker
<http://bugs.python.org/issue30634>
_______________________________________