On 01.02.2021 17:10, Victor Stinner wrote:
> On Mon, Feb 1, 2021 at 4:47 PM M.-A. Lemburg <m...@egenix.com> wrote:
>> At the very least, we should have such APIs for going from wchar_t*
>> to a Python object.
>>
>> The alternatives you provide all require creating an intermediate
>> Python object for this purpose.
>
> We cannot optimize all use cases. IMO we should only optimize
> conversions between char* and Python object.
>
> I don't see the need for two conversions (char* => Python and then
> Python => wchar_t*) as an issue if you need wchar_t*.

The C code is already there, but it got hidden away in the Python 3.3
change to new internals. All that needs to be done is to remove the
intermediate Python Unicode object creation and have those encoder
APIs interface to the native C code again.

> Objects/unicodeobject.c is already very complex with specialization
> for ASCII, Py_UCS1 (latin1), Py_UCS2 and Py_UCS4 kinds: 16k lines of C
> code. I would prefer to make it simpler than more complex.
>
> Internally, functions like PyUnicode_EncodeLatin1() already do the two
> conversions. So it's not like the PEP has any impact on performance.

Before Python 3.3, all those APIs interfaced directly to the C codec
functions. The intermediate Python Unicode object was only introduced
as a quick work-around, even though it was not really needed, since
Python 3.3 did not remove the C code of the encoders.

>> That would keep extensions working after a recompile, since
>> Py_UNICODE is already a typedef to wchar_t.
>
> Extensions should not use Py_UNICODE*/wchar_t*.

They should not use Py_UNICODE. wchar_t is standard C and is in
widespread use in C code for storing Unicode data. This was one of
the main reasons for introducing UCS4 Python versions for Linux in
the mid 2000s, since Linux apps used 4-byte wchar_t as their native
storage format.

My point is that extensions would just need a recompile with the
change from Py_UNICODE to wchar_t, since Py_UNICODE and wchar_t are
already the same thing in Python 3.3+.

> Can you explain where wchar_t* type is appropriate and how two
> conversions is a performance bottleneck?

If an extension has a wchar_t* string, it should be easy to convert
this into a Python bytes object for use in Python. Just like it
should be easy to go from a char* string to a Python str object.

The PEP breaks this symmetry by removing access to the encoder
implementations.
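
To make the "direct" path concrete, here is a minimal sketch in plain C
of an encoder that goes from a wchar_t* buffer straight to Latin-1
bytes, without any intermediate object. The function name
wchar_to_latin1 and its signature are hypothetical and chosen for
illustration only; this is not CPython's encoder and not an existing
C API function. With the current C API, the equivalent two-step route
would be PyUnicode_FromWideChar() followed by
PyUnicode_AsLatin1String().

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of CPython): encode `len` wide
   characters from `src` as Latin-1 into `dst`, which has room for
   `dst_size` bytes.
   Returns the number of bytes written, or -1 if the output buffer is
   too small or a character lies outside the Latin-1 range. */
static ptrdiff_t
wchar_to_latin1(const wchar_t *src, size_t len,
                unsigned char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL);

    /* Latin-1 needs exactly one byte per character. */
    if (len > dst_size) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        /* Cast to an unsigned type so the range check also works on
           platforms where wchar_t is signed. */
        unsigned long ch = (unsigned long)src[i];
        if (ch > 0xFF) {
            return -1;  /* not representable in Latin-1 */
        }
        /* Latin-1 maps code points 0..255 to the same byte value. */
        dst[i] = (unsigned char)ch;
    }
    return (ptrdiff_t)len;
}
```

A caller would pass its own buffer, for example
wchar_to_latin1(L"abc", 3, buf, sizeof buf), and then build a bytes
object from the result. A real encoder would also have to deal with
error handlers and with surrogate pairs on platforms with a 16-bit
wchar_t, both of which this sketch ignores.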

-- 
Marc-Andre Lemburg
eGenix.com

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FSUPT6B26VJT7S6UCW4RYWRQ3LYLUINU/