Hello,
I'm sending this here because it involves both cffi and PyPy.
For the last few years I have been trying to find the most efficient way
to pass UTF-8 strings between PyPy and C code using cffi.
Right now, when PyPy receives a UTF-8 string from a C function, it has to
make 2 copies:
1. convert the cdata string to a PyPy byte string via ffi.string
2. decode that byte string into a unicode string
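The receive path above can be sketched as follows. There is no real C library here, so the string a C function would return is simulated with `ffi.new`; `from_c` is a hypothetical helper name used only for illustration.

```python
from cffi import FFI

ffi = FFI()

def from_c(p):
    """Turn a NUL-terminated UTF-8 `char *` into a Python str."""
    raw = ffi.string(p)          # copy 1: C memory -> bytes object
    return raw.decode("utf-8")   # copy 2: bytes -> unicode string

# Stand-in for a `char *` returned by some C function.
c_str = ffi.new("char[]", "héllo".encode("utf-8"))
text = from_c(c_str)
```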
When PyPy sends a UTF-8 string, it also makes 2 copies:
1. encode the PyPy unicode string into a UTF-8 byte string
2. copy the byte string into a cdata string
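The send path, sketched the same way; `to_c` is again a hypothetical helper name, and `ffi.new("char[]", ...)` is used here as one explicit way to make the second copy.

```python
from cffi import FFI

ffi = FFI()

def to_c(text):
    """Turn a Python str into a NUL-terminated UTF-8 `char[]` owned by cffi."""
    data = text.encode("utf-8")       # copy 1: unicode -> bytes object
    return ffi.new("char[]", data)    # copy 2: bytes -> C memory (+ trailing NUL)

buf = to_c("héllo")
```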
From what I understand, cffi has an optimization for Windows Unicode
(via set_unicode): on Windows, when using the native Windows Unicode
(wchar_t) strings, cffi avoids one of the copies in both of the above
cases.
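For comparison, with wchar_t strings cffi does the Unicode conversion itself in a single call, in both directions. This sketch uses a plain `wchar_t[]` rather than `ffi.set_unicode()` so that it runs on any platform; whether a copy is actually saved internally depends on the platform's wchar_t size and on the Python implementation.

```python
from cffi import FFI

ffi = FFI()

# str -> wchar_t[]: cffi converts directly, with no intermediate bytes object.
wbuf = ffi.new("wchar_t[]", "héllo")

# wchar_t[] -> str: ffi.string() returns a unicode string for wchar_t data.
back = ffi.string(wbuf)
```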
On Linux, where the default string encoding for C libraries nowadays is
UTF-8, there is no such optimization, so both copies happen on every
string transfer.
At some point PyPy was moving towards using UTF-8 strings internally, but
I don't know whether this is still the plan. Using UTF-8 internally would
eliminate one of the two copies on Linux (UTF-8 encoding/decoding would
become a no-op).
All of the above is the current state of cffi and PyPy string handling
as I understand it. To reduce the string copies to a minimum, I propose
the following:
1. If PyPy doesn't move towards using UTF-8 strings internally, maybe we
need a special C type that marks a string as UTF-8, so that PyPy/cffi
converts to and from it automatically: something like "wchar_t" on
Windows, but denoting a UTF-8 string. cffi could define a special type
("__utf8char_t"?) for these strings.
Alternatively, an encoding parameter could be added to ffi.string, so
that it does both the cdata and the encoding conversions in one step.
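A minimal user-level sketch of what such an encoding parameter might do. `string_decoded` is a hypothetical name, not part of cffi, and the single-step path assumes the byte length is already known (many C APIs return pointer + length): `ffi.buffer()` exposes the C memory without copying, and `str(buffer, encoding)` decodes straight from it, skipping the intermediate bytes object.

```python
from cffi import FFI

ffi = FFI()

def string_decoded(p, length=None, encoding="utf-8"):
    """Hypothetical helper: ffi.string() plus decoding in one call."""
    if length is None:
        # Length unknown: fall back to the two-step path (two copies).
        return ffi.string(p).decode(encoding)
    # ffi.buffer() wraps the C memory without copying; str() decodes it directly.
    return str(ffi.buffer(p, length), encoding)

data = "héllo".encode("utf-8")
c_str = ffi.new("char[]", data)
a = string_decoded(c_str)
b = string_decoded(c_str, len(data))
```

The "__utf8char_t" type idea would need support inside cffi itself, so it cannot be approximated this way.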
2. If PyPy does move towards using UTF-8 strings internally, then it could
pass PyPy strings directly to C functions that neither mutate them nor
store pointers to them. This could be done with a cffi annotation marking
such non-mutating C functions.
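The annotation in point 2 is hypothetical, but the send side can already be brought down to one copy with `ffi.from_buffer()`, which returns a `char[]` cdata pointing at a bytes object's data without copying it. This sketch assumes the C function only reads the data and takes an explicit length, since the array has no guaranteed NUL terminator of its own; `to_c_readonly` is a hypothetical helper name.

```python
from cffi import FFI

ffi = FFI()

def to_c_readonly(text):
    """Encode once, then expose the bytes to C without a second copy.

    The C side must not write to this memory or keep the pointer after
    the call; the returned cdata keeps `data` alive while it exists.
    """
    data = text.encode("utf-8")              # the only copy
    return ffi.from_buffer(data), len(data)  # char[] aliasing `data`

ptr, n = to_c_readonly("héllo")
```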
The ideas above are based on my understanding of the current status and
future direction of PyPy. If I have misunderstood something, I would be
glad to be corrected :).
Kind regards,
l.
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev