[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-04 Inada Naoki
On Tue, Feb 2, 2021 at 8:40 PM Inada Naoki wrote: > > On Tue, Feb 2, 2021 at 7:37 PM M.-A. Lemburg wrote: > > > > BTW: I don't understand this comment: > > "They are inefficient on platforms wchar_t* is UTF-16. It is because > > built-in codecs supports only UCS-1, UCS-2, and UCS-4 input." > > >
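Some background on the terms: since Python 3.3 (PEP 393), CPython stores a str with 1, 2 or 4 bytes per code point (UCS-1, UCS-2 or UCS-4). A wchar_t* buffer instead uses the platform's wchar_t width. The sketch below is illustrative only. `pep393_kind()` and `wchar_units()` are hypothetical helpers that simulate in pure Python what each representation would hold. With a 16-bit wchar_t, a character outside the BMP becomes a surrogate pair, so the buffer is UTF-16 rather than UCS-2.

```python
import struct

def pep393_kind(s):
    """Bytes per code point CPython would use for s (PEP 393: UCS-1/2/4)."""
    max_cp = max(map(ord, s), default=0)
    if max_cp <= 0xFF:
        return 1
    if max_cp <= 0xFFFF:
        return 2
    return 4

def wchar_units(s, wchar_size):
    """Hypothetical helper: the code units a wchar_t* buffer would hold."""
    if wchar_size == 4:
        # 32-bit wchar_t (e.g. Linux): one unit per code point.
        return [ord(c) for c in s]
    if wchar_size == 2:
        # 16-bit wchar_t (e.g. Windows): non-BMP code points become surrogate pairs.
        data = s.encode("utf-16-le")
        return list(struct.unpack(f"<{len(data) // 2}H", data))
    raise ValueError("wchar_t is expected to be 2 or 4 bytes")

text = "a\U0001F600"  # 'a' followed by a character outside the BMP
print(pep393_kind(text))                        # 4
print([hex(u) for u in wchar_units(text, 4)])   # ['0x61', '0x1f600']
print([hex(u) for u in wchar_units(text, 2)])   # ['0x61', '0xd83d', '0xde00']
```

A code path that treats the 16-bit buffer as UCS-2 would see 0xD83D and 0xDE00 as two separate code points. This is the case the thread describes as requiring a temporary Unicode object.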

2021-02-02 Victor Stinner
On Tue, Feb 2, 2021 at 11:47 PM Inada Naoki wrote: > So if we support add UTF-16 support to ucs2_utf8_encoder(), it means > we need to add code and maintain only for PyUnicode_EncodeUTF8 (encode > from wchar_t* into char*). > > I don't think it is a good deal. As described in the PEP, encoder
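To make "adding UTF-16 support" concrete, the following minimal pure-Python sketch shows the extra step a UTF-16-aware encoder needs: each high/low surrogate pair must be combined into one code point before UTF-8 is emitted. This is not CPython's ucs2_utf8_encoder(), whose source is not quoted in the thread. The function name `utf16_units_to_utf8()` is hypothetical, and for brevity it delegates the final byte emission to Python's own UTF-8 codec.

```python
def utf16_units_to_utf8(units):
    """Encode a list of UTF-16 code units (ints) as UTF-8 bytes.

    Well-formed surrogate pairs are combined into one code point; a lone
    surrogate raises ValueError, roughly like the "strict" error handler.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # High surrogate followed by low surrogate: rebuild the code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError(f"lone surrogate 0x{u:04X} at index {i}")
        else:
            cp = u
            i += 1
        # Byte-level UTF-8 emission is delegated to Python's codec for brevity.
        out += chr(cp).encode("utf-8")
    return bytes(out)

units = [0x61, 0xD83D, 0xDE00]        # "a" + U+1F600 as UTF-16 code units
print(utf16_units_to_utf8(units))     # b'a\xf0\x9f\x98\x80'

# Treating each 16-bit unit as its own code point leaves two lone surrogates.
# Strict UTF-8 rejects them; "surrogatepass" shows the (invalid UTF-8) bytes
# that a naive per-unit encoding would produce:
print("a\ud83d\ude00".encode("utf-8", "surrogatepass"))  # b'a\xed\xa0\xbd\xed\xb8\x80'
```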

2021-02-02 Inada Naoki
On Tue, Feb 2, 2021 at 9:40 PM Emily Bowman wrote: > > On Tue, Feb 2, 2021 at 3:47 AM Inada Naoki wrote: >> >> But when wchar_t* is UTF-16, ucs2_utf8_encoder() can not handle >> surrogate escape. >> We need to use a temporary Unicode object. That is what "inefficient" means. > > > Since real
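This sketch assumes that "surrogate escape" here refers to Python's `surrogateescape` error handler (PEP 383). Using only the standard codecs, it shows what that handler does: undecodable bytes pass through a str as lone low surrogates (U+DC80 to U+DCFF), and encoding turns them back into the original bytes.

```python
raw = b"caf\xe9"  # Latin-1 bytes for "café"; 0xE9 alone is not valid UTF-8

# Decoding: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'caf\udce9'

# Encoding with the same handler restores the original byte exactly.
print(text.encode("utf-8", "surrogateescape") == raw)  # True

# As 16-bit wchar_t units this text is [0x63, 0x61, 0x66, 0xDCE9]. Whether
# 0xDCE9 is an escaped byte or the second half of a surrogate pair depends on
# the unit before it, so an encoder working directly on such a buffer must
# inspect neighbouring units before it can apply the error handler.
```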

2021-02-02 Emily Bowman
On Tue, Feb 2, 2021 at 3:47 AM Inada Naoki wrote: > But when wchar_t* is UTF-16, ucs2_utf8_encoder() can not handle > surrogate escape. > We need to use a temporary Unicode object. That is what "inefficient" > means. > Since real UCS-2 is effectively dead, maybe it should be flipped around:

2021-02-02 Inada Naoki
On Tue, Feb 2, 2021 at 7:37 PM M.-A. Lemburg wrote: > > >> That would keep extensions working after a recompile, since > >> Py_UNICODE is already a typedef to wchar_t. > >> > > > > That idea is written in the PEP already. > >

2021-02-02 M.-A. Lemburg
On 02.02.2021 00:33, Inada Naoki wrote: > On Tue, Feb 2, 2021 at 12:43 AM M.-A. Lemburg wrote: >> >> Hi Inada-san, >> >> thank you for adding some comments, but they are not really capturing >> what I think is missing: >> >> """ >> Removing these APIs removes ability to use codec without

2021-02-01 Inada Naoki
On Tue, Feb 2, 2021 at 4:28 AM Steve Dower wrote: > > > I'm not defending the choice of wchar_t over UTF-8 (but I can: most of > these systems chose Unicode before UTF-8 was invented and never took the > backwards-incompatible change because they were so popular), but if we > want to

2021-02-01 Inada Naoki
On Tue, Feb 2, 2021 at 12:43 AM M.-A. Lemburg wrote: > > Hi Inada-san, > > thank you for adding some comments, but they are not really capturing > what I think is missing: > > """ > Removing these APIs removes ability to use codec without temporary Unicode. > > Codecs can not encode Unicode

2021-02-01 Steve Dower
On 2/1/2021 5:16 PM, Christian Heimes wrote: On 01/02/2021 17.39, M.-A. Lemburg wrote: Can you explain where wchar_t* type is appropriate and how two conversions is a performance bottleneck? If an extension has a wchar_t* string, it should be easy to convert this in to a Python bytes object

2021-02-01 Paul Moore
On Mon, 1 Feb 2021 at 17:19, Christian Heimes wrote: > How much software actually uses wchar_t these days and interfaces with > Python? Do you have examples for software that uses wchar_t and would > benefit from wchar_t support in Python? This is very much a drive-by comment (I haven't been

2021-02-01 Christian Heimes
On 01/02/2021 17.39, M.-A. Lemburg wrote: >> Can you explain where wchar_t* type is appropriate and how two >> conversions is a performance bottleneck? > > If an extension has a wchar_t* string, it should be easy > to convert this in to a Python bytes object for use in Python. How much software

2021-02-01 Victor Stinner
On Mon, Feb 1, 2021 at 5:58 PM M.-A. Lemburg wrote: > The fix is pretty simple, doesn't add a lot more code and gets > us the symmetry back that I had put into the Unicode C API when > I created this back in 2000. This sounds like a completely different PEP than PEP 624 (which aims to remove

2021-02-01 M.-A. Lemburg
On 01.02.2021 17:51, Victor Stinner wrote: > On Mon, Feb 1, 2021 at 5:39 PM M.-A. Lemburg wrote: >> The C code is already there, but it got hidden away in the >> Python 3.3 change to new internals. > > Well, we are not in agreement and it's ok. Your objection is written > in the PEP. IMO it's

2021-02-01 Antoine Pitrou
On Mon, 1 Feb 2021 17:39:16 +0100 "M.-A. Lemburg" wrote: > > They should not use Py_UNICODE. > > wchar_t is standard C and is in wide spread use in C code for > storing Unicode data. Do you have any data points about "wide spread use"? I work in C++ daily and don't see any "wide spread use"

2021-02-01 Victor Stinner
On Mon, Feb 1, 2021 at 5:39 PM M.-A. Lemburg wrote: > The C code is already there, but it got hidden away in the > Python 3.3 change to new internals. Well, we are not in agreement and it's ok. Your objection is written in the PEP. IMO it's now up to the Steering Council to decide if the overall

2021-02-01 M.-A. Lemburg
On 01.02.2021 17:10, Victor Stinner wrote: > On Mon, Feb 1, 2021 at 4:47 PM M.-A. Lemburg wrote: >> At the very least, we should have such APIs for going from wchar_t* >> to a Python object. >> >> The alternatives you provide all require creating an intermediate >> Python object for this purpose.

2021-02-01 Victor Stinner
On Mon, Feb 1, 2021 at 4:47 PM M.-A. Lemburg wrote: > At the very least, we should have such APIs for going from wchar_t* > to a Python object. > > The alternatives you provide all require creating an intermediate > Python object for this purpose. We cannot optimize all use cases. IMO we should

2021-02-01 M.-A. Lemburg
Hi Inada-san, thank you for adding some comments, but they are not really capturing what I think is missing: """ Removing these APIs removes ability to use codec without temporary Unicode. Codecs can not encode Unicode buffer directly without temporary Unicode object since Python 3.3. All
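Lemburg's point is that, once these APIs are removed, encoding a wchar_t* buffer takes two steps with a temporary str in between. The following minimal sketch drives such a two-step path from Python through ctypes. It assumes CPython, where `ctypes.pythonapi` exposes the interpreter's C API. PyUnicode_FromWideChar() creates the temporary object, and PyUnicode_AsEncodedString() encodes it to bytes. The helper name `encode_wchar_buffer()` is hypothetical. In C, the caller would also have to Py_DECREF() the temporary object; here ctypes manages the references.

```python
import ctypes

api = ctypes.pythonapi  # CPython only: the running interpreter's C API

# PyObject *PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
api.PyUnicode_FromWideChar.argtypes = [ctypes.c_wchar_p, ctypes.c_ssize_t]
api.PyUnicode_FromWideChar.restype = ctypes.py_object

# PyObject *PyUnicode_AsEncodedString(PyObject *unicode,
#                                     const char *encoding, const char *errors)
api.PyUnicode_AsEncodedString.argtypes = [ctypes.py_object, ctypes.c_char_p, ctypes.c_char_p]
api.PyUnicode_AsEncodedString.restype = ctypes.py_object

def encode_wchar_buffer(buf, encoding="utf-8", errors="strict"):
    """Encode a NUL-terminated wchar_t buffer via a temporary str object."""
    # Step 1: wchar_t* -> temporary str (size -1 means "use wcslen()").
    tmp = api.PyUnicode_FromWideChar(buf, -1)
    # Step 2: temporary str -> bytes using the named codec.
    return api.PyUnicode_AsEncodedString(
        tmp, encoding.encode("ascii"), errors.encode("ascii"))

buf = ctypes.create_unicode_buffer("na\u00efve \U0001F600")  # a wchar_t[] array
print(encode_wchar_buffer(buf))  # b'na\xc3\xafve \xf0\x9f\x98\x80'
```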

2021-01-21 Inada Naoki
Hi, Lemburg. I want to send the PEP to SC. I think I wrote all your points in the PEP. Would you review it? Regards, On Tue, Aug 4, 2020 at 5:04 PM Inada Naoki wrote: > > On Tue, Aug 4, 2020 at 3:31 PM M.-A. Lemburg wrote: > > > > Hi Inada-san, > > > > thanks for attending EuroPython. I won't

2020-08-04 Inada Naoki
On Tue, Aug 4, 2020 at 3:31 PM M.-A. Lemburg wrote: > > Hi Inada-san, > > thanks for attending EuroPython. I won't be back online until > next Wednesday. Would it be possible to wait until then to continue > the discussion ? > Of course. The PEP is for Python 3.11. We have a lot of time. Bests,

2020-08-04 M.-A. Lemburg
Hi Inada-san, thanks for attending EuroPython. I won't be back online until next Wednesday. Would it be possible to wait until then to continue the discussion ? Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts >>> Python Projects, Coaching and

2020-08-03 Inada Naoki
Hi, Lemburg. Thank you for organizing the EuroPython 2020. I enjoyed watching some sessions from home. I think current PEP 624 covers all your points and ready for Steering Council discussion. Would you like to review the PEP before it? Regards, On Thu, Jul 9, 2020 at 8:19 AM Inada Naoki

2020-07-09 Inada Naoki
On Thu, Jul 9, 2020 at 10:13 PM Jim J. Jewett wrote: > > Unless I'm missing something, part of M.-A. Lemburg's objection is: > > 1. The wchar_t type is itself an important interoperability story in C. > (I'm not sure if this includes the ability, at compile time, to define > wchar_t as either

2020-07-09 Jim J. Jewett
Unless I'm missing something, part of M.-A. Lemburg's objection is: 1. The wchar_t type is itself an important interoperability story in C. (I'm not sure if this includes the ability, at compile time, to define wchar_t as either of two widths.) 2. The ability to work directly with wchar_t
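Jim's first point can be checked from Python. The width of wchar_t is set by the platform ABI: typically 2 bytes on Windows and 4 on Linux and macOS. GCC's `-fshort-wchar` option can change it at compile time. Py_UNICODE is a typedef of wchar_t. The minimal ctypes sketch below assumes that a 2-byte wchar_t holds UTF-16 and a 4-byte wchar_t holds UTF-32, both in native byte order. `wchar_encoding()` is a hypothetical helper.

```python
import ctypes
import sys

def wchar_encoding():
    """Codec matching this platform's wchar_t layout (assumed UTF-16 or UTF-32)."""
    size = ctypes.sizeof(ctypes.c_wchar)  # sizeof(wchar_t) for this interpreter's ABI
    base = {2: "utf-16", 4: "utf-32"}[size]
    # The "-le"/"-be" variants encode without a BOM, matching raw memory.
    return base + ("-le" if sys.byteorder == "little" else "-be")

s = "a\U0001F600"
buf = ctypes.create_unicode_buffer(s)  # wchar_t[] including the trailing NUL
raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))

print(ctypes.sizeof(ctypes.c_wchar), wchar_encoding())  # e.g. 4 utf-32-le on x86-64 Linux
print(raw == (s + "\0").encode(wchar_encoding()))        # True
```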

2020-07-08 Inada Naoki
On Thu, Jul 9, 2020 at 5:46 AM M.-A. Lemburg wrote: > - the fact that the encode APIs encoding from a Unicode buffer > to a bytes object; this is an important fact, since the removal > removes access to this codec functionality for extensions > > - PyUnicode_AsEncodedString() is not a proper

2020-07-08 M.-A. Lemburg
Hi Inada-san, I am currently too busy with EuroPython to participate in longer discussions. FWIW: I intend to continue after EuroPython. In any case, thanks for writing up the PEP. Could you please add my points about: - the fact that the encode APIs encoding from a Unicode buffer to a bytes

2020-07-07 Victor Stinner
Le mar. 7 juil. 2020 à 17:21, Inada Naoki a écrit : > This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in > Python 3.11: Overall, I like the plan. IMHO 3.11 is a reasonable target version, since on the top 4000 projects, only 2 are affected and it is easy to fix them. >