[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-01 Thread Inada Naoki
On Tue, Feb 2, 2021 at 12:23 AM Victor Stinner  wrote:
>
> Hi Inada-san,
>
> I followed the discussions on your different PEP and I like overall
> your latest PEP :-) I have some minor remarks.
>
> On Mon, Feb 1, 2021 at 6:55 AM Inada Naoki  wrote:
> > The warning is disabled by default. A new ``-X warn_encoding``
> > command-line option and a ``PYTHONWARNENCODING`` environment variable
> > are used to enable the warnings.
>
> Maybe "warn implicit encoding" or "warn omit encoding" (not sure if
> it makes sense written like that in English ;-)) would be more
> explicit.
>

Yes, it's explicit. That's why I used `PYTHONWARNDEFAULTENCODING` at first.
But I felt it was unreadable, so I shortened the option name.

I'll wait for more feedback about the naming.

>
> > Options to enable the warning
> > -----------------------------
> >
> > The ``-X warn_encoding`` option and the ``PYTHONWARNENCODING``
> > environment variable are added. They are used to enable
> > ``EncodingWarning``.
> >
> > ``sys.flags.encoding_warning`` is also added. The flag indicates
> > whether ``EncodingWarning`` is enabled.
>
> Nitpick: I would prefer using the same name for the -X option and the
> sys.flags attribute (ex: sys.flags.warn_encoding).
>

OK, I will rename the flag to match the option name.

>
> > ``encoding="locale"`` option
> > ----------------------------
> >
> > ``io.TextIOWrapper`` accepts the ``encoding="locale"`` option. It means
> > the same as the current ``encoding=None``, but ``io.TextIOWrapper``
> > doesn't emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
>
> Can you please specify whether os.device_encoding(fd) is called when
> encoding="locale" is used? It seems so, but it's not obvious from the
> PEP.
>

OK.

>
> In Python 3.10, I added the _locale._get_locale_encoding() function, which
> returns exactly the encoding used by open() when no encoding is
> specified (encoding=None) and when os.device_encoding(fd) returns
> None. See _Py_GetLocaleEncoding() for the C implementation
> (Python/fileutils.c).
>
> Maybe we should add a public locale.get_locale_encoding() function? On
> Unix, this function uses nl_langinfo(CODESET) *without* setting the
> LC_CTYPE locale to the user's preferred locale.
>

I cannot imagine any use case for it. Wouldn't it just be confusing?


> I understand that encoding=locale.get_locale_encoding() would be
> different from encoding="locale":
> encoding=locale.get_locale_encoding() doesn't call
> os.device_encoding(), right?
>

Yes.
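A minimal sketch of the difference confirmed above, under stated assumptions: the helper names are hypothetical, and `locale.getpreferredencoding(False)` stands in for the private `_locale._get_locale_encoding()` (an approximation, not CPython's code). The first helper consults `os.device_encoding()` before falling back to the locale encoding; the second skips that step, as a public `locale.get_locale_encoding()` would.

```python
import locale
import os


def resolve_default_encoding(fd: int) -> str:
    """Approximate how a text file picks its encoding when none is given.

    Hypothetical helper: mirrors the lookup order discussed in this thread,
    not CPython's exact implementation.
    """
    # For a terminal, os.device_encoding() may report the console encoding;
    # for regular files and pipes it returns None.
    encoding = os.device_encoding(fd)
    if encoding is None:
        # Stand-in for the locale encoding that open() falls back to.
        encoding = locale.getpreferredencoding(False)
    return encoding


def resolve_locale_only() -> str:
    # What a locale.get_locale_encoding()-style helper would return:
    # no os.device_encoding() call, so a terminal gets no special treatment.
    return locale.getpreferredencoding(False)
```

For a regular file or pipe, `os.device_encoding()` returns `None`, so both helpers agree; they can only differ when the descriptor is a terminal.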

>
> Maybe the PEP should also explain (in a "How to teach this" section?)
> when encoding="locale" is better than a specific encoding, like
> encoding="utf-8" or encoding="cp1252". In my experience, it's mostly
> for interoperability with other applications which also use the
> current locale encoding.
>

This option is for experts who are publishing cross-platform
libraries, frameworks, etc.

For students, I am suggesting another idea that makes UTF-8 mode more accessible.

>
> > Add an ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
> > be used to avoid a confusing ``LookupError: unknown encoding: locale``
> > error when the code is accidentally run on an old Python version.
>
> I'm not sure that it is useful. I like a simple "locale" literal
> string. If there is a constant in io, people may start to think that
> it's special and will add "import io" just to get the string
> "locale".
>
> I don't think that we should care too much about the error message
> raised by old Python versions.
>

This constant is not only a replacement for the "locale" literal. As the
example code in the PEP shows, it can be used to test whether
TextIOWrapper supports `encoding="locale"`.

`open(fn, encoding=getattr(io, "LOCALE_ENCODING", None))` works on both
Python 3.9 and earlier and Python 3.10 and later.
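A minimal sketch of that version-portable pattern. Note that `io.LOCALE_ENCODING` is the constant proposed in this draft of the PEP and may not exist in the Python you run; in that case `getattr()` falls back to `None`, which is the same as omitting `encoding`. The helper `read_text` is hypothetical.

```python
import io

# Proposed in this PEP draft; absent on interpreters that don't define it.
# getattr() then falls back to None, i.e. "use the default encoding".
LOCALE_ENCODING = getattr(io, "LOCALE_ENCODING", None)


def read_text(path):
    """Read a text file with the locale encoding on old and new Pythons."""
    with open(path, encoding=LOCALE_ENCODING) as f:
        return f.read()
```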


>
>
> > Opt-in warning
> > --------------
> >
> > Although ``DeprecationWarning`` is suppressed by default, always emitting
> > ``DeprecationWarning`` when the ``encoding`` option is omitted
> > would be too noisy.
>
> The PEP is not very clear. Does "-X warn_encoding" only emit the
> warning, or does it also display it by default? Does it add a warning
> filter for EncodingWarning?
>

This section is not the spec. This section is the rationale for adding
EncodingWarning instead of using DeprecationWarning.

As the spec says, EncodingWarning is a subclass of Warning, so it is
displayed by default. But it is not emitted by default.

When -X warn_encoding (or -X warn_default_encoding) is used, the
warning is emitted and shown unless the user suppresses warnings.
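The opt-in behaviour can be sketched in plain Python. `EncodingWarningSketch` and the `warn_enabled` switch are hypothetical stand-ins for `EncodingWarning` and the command-line option. The class derives from `Warning` rather than `DeprecationWarning`, so once emitted it is shown under the default filters, while emission itself only happens when the user opts in.

```python
import warnings


class EncodingWarningSketch(Warning):
    """Hypothetical stand-in for EncodingWarning (a plain Warning subclass)."""


def open_text(path, encoding=None, *, warn_enabled=False):
    """Open a text file, warning about an omitted encoding only on opt-in."""
    if encoding is None and warn_enabled:
        warnings.warn(
            "'encoding' argument not specified",
            EncodingWarningSketch,
            stacklevel=2,  # attribute the warning to the caller
        )
    return open(path, encoding=encoding)
```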

>
> The PEP has no "Backward compatibility" section. Is it possible to
> monkey-patch Python to implement this PEP (maybe only partially) on
> old Python versions? I'm asking to prepare existing projects for
> future EncodingWarning.
>

This PEP doesn't have a "Backward compatibility" section because it
doesn't break backward compatibility.
Unless the option is enabled, the PEP emits no warnings, just like the
`-b` option and BytesWarning.
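The `-b` analogy can be checked from Python. This sketch (the helper `run_child` is hypothetical) runs `str(b'data')` in a child interpreter: without a flag no `BytesWarning` is emitted and the child exits normally, while `-bb` both enables the warning and turns it into an error, so the child exits with a non-zero status.

```python
import subprocess
import sys

SNIPPET = "str(b'data')"  # emits BytesWarning only when -b/-bb is given


def run_child(*flags):
    """Run SNIPPET in a fresh interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
    )
    return result.returncode
```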

And if developers want to support Python ~3.9 

[Python-Dev] Re: Constructing expected_opinfo_* lists in test_dis.py

2021-02-01 Thread Mark Shannon

Hi Skip,

On 01/02/2021 9:50 pm, Skip Montanaro wrote:

> Guido> Maybe these lines in test_dis.py?
> ...
> Skip> Thanks, I'll take a look. I was expecting there'd be a standalone
> Skip> script somewhere. Hadn't considered that comments would be hiding
> Skip> code.
>
> Indeed, that did the trick, however... I'm a bit uncomfortable with
> the methodology. It seems test_dis is using the same method
> (dis.get_instructions) to both generate the expected output and verify
> that dis.get_instructions works as expected. For the most part, you
> see the test case fails, rerun the code to generate the list,
> substitute, et voila! The test (magically) passes. Somewhere along the
> way, it seems there should be a way to alert the user that perhaps
> dis.get_instructions is broken and its output is not to be trusted
> completely.


The problem is not that dis.get_instructions can't be trusted, but that 
the test isn't testing the dis module at all. It is testing whether the 
output from the compiler has changed.

A lot of the tests in test_dis do that.
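One way to act on this observation is to keep the two concerns apart: check properties of `dis.get_instructions()` that hold whatever bytecode the compiler emits, and treat exact opcode lists as a separate compiler-regression check. The sketch below shows only the first kind; the invariants and the names `sample` and `check_instruction_invariants` are our own illustration, not what test_dis.py does.

```python
import dis


def sample(x):
    if x:
        return x + 1
    return 0


def check_instruction_invariants(func):
    """Compiler-agnostic sanity checks on dis.get_instructions() output."""
    instrs = list(dis.get_instructions(func))
    assert instrs, "expected at least one instruction"
    offsets = [ins.offset for ins in instrs]
    # Offsets must be strictly increasing, whatever the opcode layout.
    assert offsets == sorted(set(offsets))
    for ins in instrs:
        # The reported name must agree with the opcode table.
        assert ins.opname == dis.opname[ins.opcode]
    # A function with an explicit return must contain some return opcode.
    assert any("RETURN" in ins.opname for ins in instrs)
    return len(instrs)
```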

Cheers,
Mark.



Skip
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FAI7XYMYO3CGKJDU3WBD2AJ6Z6SEDPYD/
Code of Conduct: http://python.org/psf/codeofconduct/


___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/DDLDPU4SYEGM3QACYEEU6BN6HZGMIDM2/


[Python-Dev] Re: Understanding "is not safe" in typeobject.c

2021-02-01 Thread Greg Ewing

On 2/02/21 12:13 am, Phil Thompson via Python-Dev wrote:

> TypeError: object.__new__(B) is not safe, use B.__new__()


It's not safe because object.__new__ doesn't know about any
C-level initialisation that A or B need.

At the C level, there is always a *single* inheritance hierarchy.
The right thing is for B's tp_new to directly call A's tp_new,
which calls object's tp_new.

Don't worry about Python-level multiple inheritance; the
interpreter won't let you create an inheritance structure
that would mess this up.
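The same check can be triggered from pure Python with a built-in type, which may help when experimenting. Here `dict` plays the role of the C-level base, and the classes `Mixin` and `MyDict` are hypothetical. Calling `object.__new__` directly skips `dict`'s own `tp_new` and is rejected, whereas delegating through `super()` lets the lookup reach `dict.__new__`, a Python-level analogue of B's tp_new calling A's.

```python
class Mixin:
    """A Python-level mixin with no __new__ of its own."""


class MyDict(Mixin, dict):
    def __new__(cls, *args, **kwargs):
        # super() follows the MRO (MyDict -> Mixin -> dict -> object), so
        # this reaches dict.__new__, which does dict's C-level setup.
        return super().__new__(cls, *args, **kwargs)


def unsafe_new_message():
    """Return the TypeError message for bypassing dict's own __new__."""
    try:
        object.__new__(dict)
    except TypeError as exc:
        return str(exc)
    return None
```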

--
Greg
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/IW3OX6Y324VSF4WLQHGA7EFHJQ6XEBH4/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Inada Naoki
On Tue, Feb 2, 2021 at 4:28 AM Steve Dower  wrote:
>
>
> I'm not defending the choice of wchar_t over UTF-8 (but I can: most of
> these systems chose Unicode before UTF-8 was invented and never took the
> backwards-incompatible change because they were so popular), but if we
> want to pragmatically weigh the needs of our users above our desire for
> purity, then we should try and support both equally wherever possible.
>

Note that we don't have a direct "UTF-8 (char*) to Python bytes object"
encoder API either.
If PEP 624 is accepted, UTF-8 and wchar_t* will be treated equally.

So please don't think that PEP 624 neglects only wchar_t*.

Regards,
-- 
Inada Naoki  
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/ZZLY6AFXYEQQ7PI6IXRNU3FWQ23MXPZU/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Inada Naoki
On Tue, Feb 2, 2021 at 12:43 AM M.-A. Lemburg  wrote:
>
> Hi Inada-san,
>
> thank you for adding some comments, but they are not really capturing
> what I think is missing:
>
> """
> Removing these APIs removes the ability to use a codec without a temporary
> Unicode object.
>
> Since Python 3.3, codecs cannot encode a Unicode buffer directly without a
> temporary Unicode object. All these APIs currently create a temporary
> Unicode object, so removing them doesn't reduce any abilities.
> """
>
> The point is that while the decoders allow going from a C object
> to a Python object directly, we are missing a way to do the same
> for the encoders, since the Python 3.3 change in the Unicode internals.
>
> At the very least, we should have such APIs for going from wchar_t*
> to a Python object.

We already have PyUnicode_FromWideChar(). So I assume you mean
"wchar_t* to Python bytes object".

>
> The alternatives you provide all require creating an intermediate
> Python object for this purpose. The APIs you want to remove do that
> as well, but that's not the point. The point is to expose the codecs'
> encode mechanism which is available in the C code, but currently
> not exposed via C APIs, e.g. ucs4lib_utf8_encode().
>
> It would be a breaking change, but those APIs in your list could
> simply be changed from using Py_UNICODE to using wchar_t instead
> and then interface directly to the internal functions we have for
> the encoders.
>

OK, I see codecs.h has three encoders.

* utf8_encode
* utf16_encode
* utf32_encode

But there are 13 encoders in my PEP:

* PyUnicode_Encode()
* PyUnicode_EncodeASCII()
* PyUnicode_EncodeLatin1()
* PyUnicode_EncodeUTF7()
* PyUnicode_EncodeUTF8()
* PyUnicode_EncodeUTF16()
* PyUnicode_EncodeUTF32()
* PyUnicode_EncodeUnicodeEscape()
* PyUnicode_EncodeRawUnicodeEscape()
* PyUnicode_EncodeCharmap()
* PyUnicode_TranslateCharmap()
* PyUnicode_EncodeDecimal()
* PyUnicode_TransformDecimalToASCII()

Do you want to keep all of these encoders, or only those 3?
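For comparison, most of these C functions have a Python-level counterpart that goes through a codec name on `str.encode()`, and that path always starts from a `str` object. The mapping below is our own illustration (it covers only the named codecs, not the charmap/decimal helpers) and simply round-trips a sample string through each codec.

```python
# Illustrative mapping from C encoder functions to Python codec names.
ENCODER_CODECS = {
    "PyUnicode_EncodeASCII": "ascii",
    "PyUnicode_EncodeLatin1": "latin-1",
    "PyUnicode_EncodeUTF7": "utf-7",
    "PyUnicode_EncodeUTF8": "utf-8",
    "PyUnicode_EncodeUTF16": "utf-16",
    "PyUnicode_EncodeUTF32": "utf-32",
    "PyUnicode_EncodeUnicodeEscape": "unicode_escape",
    "PyUnicode_EncodeRawUnicodeEscape": "raw_unicode_escape",
}


def roundtrip(text):
    """Encode and decode text with each codec; return the functions that failed."""
    failed = []
    for func, codec in ENCODER_CODECS.items():
        try:
            data = text.encode(codec)
        except UnicodeEncodeError:
            failed.append(func)
            continue
        assert data.decode(codec) == text
    return failed
```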


> That would keep extensions working after a recompile, since
> Py_UNICODE is already a typedef to wchar_t.
>

That idea is written in the PEP already.
https://www.python.org/dev/peps/pep-0624/#replace-py-unicode-with-wchar-t

Regards,
-- 
Inada Naoki  
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/USUH2YDEXW64NQYGJPG2OOLEJS3NJLXG/


[Python-Dev] Re: Constructing expected_opinfo_* lists in test_dis.py

2021-02-01 Thread Skip Montanaro
Guido> Maybe these lines in test_dis.py?
...
Skip> Thanks, I'll take a look. I was expecting there'd be a standalone
Skip> script somewhere. Hadn't considered that comments would be hiding
Skip> code.

Indeed, that did the trick, however... I'm a bit uncomfortable with
the methodology. It seems test_dis is using the same method
(dis.get_instructions) to both generate the expected output and verify
that dis.get_instructions works as expected. For the most part, you
see the test case fails, rerun the code to generate the list,
substitute, et voila! The test (magically) passes. Somewhere along the
way, it seems there should be a way to alert the user that perhaps
dis.get_instructions is broken and its output is not to be trusted
completely.

Skip
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/FAI7XYMYO3CGKJDU3WBD2AJ6Z6SEDPYD/


[Python-Dev] Re: [python-committers] [ Release ] Python 3.10a5 and release blockers

2021-02-01 Thread Senthil Kumaran
Hi Pablo,

Looks like alpha 5 was scheduled for today. I am willing to take care of
this issue - https://bugs.python.org/issue42967
The patch is reasonable, but the changes are backwards incompatible.
Since it involves an underlying parsing library, the decision here is
tricky one way or the other.
I will let you know if I decide to include it in 3.10, which is an easier
decision than a backport. I just have to see how other libraries handled
this issue.

But if for some reason we can't include it in alpha 5, we should plan
for alpha 6.

Thanks,
Senthil





On Mon, Feb 1, 2021 at 11:48 AM Pablo Galindo Salgado 
wrote:

> Hi everyone,
>
> I am prepared to start the release process for Python 3.10 a5 but there
> are several issues marked as release blockers that affect 3.10:
>
> * https://bugs.python.org/issue38302
> * https://bugs.python.org/issue42634
> * https://bugs.python.org/issue41490
> * https://bugs.python.org/issue42967
> * https://bugs.python.org/issue42899
>
> Although release blockers mainly apply to important releases, two of
> them are security issues and some of the rest involve changes in
> bytecode or the tracing machinery, so I would prefer these issues to be
> addressed before making a new release, as many maintainers are waiting
> for the next alpha to re-test the bugs that they reported.
>
> Please, if you are involved in any of these issues, try to see if it is
> possible to address them, or if you think it is OK to release the alpha
> without a fix, please drop me an email stating so.
>
> Thanks,
>
> Regards from cloudy London,
> Pablo Galindo Salgado
> ___
> python-committers mailing list -- python-committ...@python.org
> To unsubscribe send an email to python-committers-le...@python.org
> https://mail.python.org/mailman3/lists/python-committers.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-committ...@python.org/message/ZVOFYB5K6UZZLQXQCCCWAJNLTMBF5Z63/
> Code of Conduct: https://www.python.org/psf/codeofconduct/
>
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/CMHC46SZZGJ37UAYWLPG6OAFVXAK3WWF/


[Python-Dev] [ Release ] Python 3.10a5 and release blockers

2021-02-01 Thread Pablo Galindo Salgado
Hi everyone,

I am prepared to start the release process for Python 3.10 a5 but there are
several issues marked as release blockers that affect 3.10:

* https://bugs.python.org/issue38302
* https://bugs.python.org/issue42634
* https://bugs.python.org/issue41490
* https://bugs.python.org/issue42967
* https://bugs.python.org/issue42899

Although release blockers mainly apply to important releases, two of them
are security issues and some of the rest involve changes in bytecode or the
tracing machinery, so I would prefer these issues to be addressed before
making a new release, as many maintainers are waiting for the next alpha to
re-test the bugs that they reported.

Please, if you are involved in any of these issues, try to see if it is
possible to address them, or if you think it is OK to release the alpha
without a fix, please drop me an email stating so.

Thanks,

Regards from cloudy London,
Pablo Galindo Salgado
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/ZVOFYB5K6UZZLQXQCCCWAJNLTMBF5Z63/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Steve Dower

On 2/1/2021 5:16 PM, Christian Heimes wrote:

> On 01/02/2021 17.39, M.-A. Lemburg wrote:
>
>>> Can you explain where wchar_t* type is appropriate and how two
>>> conversions is a performance bottleneck?
>>
>>
>> If an extension has a wchar_t* string, it should be easy
>> to convert this into a Python bytes object for use in Python.
>
>
> How much software actually uses wchar_t these days and interfaces with
> Python? Do you have examples for software that uses wchar_t and would
> benefit from wchar_t support in Python?
>
> I did a quick search for wcslen in all shared libraries and binaries on
> my system


Yeah, you searched the wrong kind of system ;)

Pick up a Windows machine, cross-platform code that originated on 
Windows, anything that interoperates with Java or .NET as well, or uses 
wxWidgets.


I'm not defending the choice of wchar_t over UTF-8 (but I can: most of 
these systems chose Unicode before UTF-8 was invented and never took the 
backwards-incompatible change because they were so popular), but if we 
want to pragmatically weigh the needs of our users above our desire for 
purity, then we should try and support both equally wherever possible.


Cheers,
Steve
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/GYUWANE7IMPU45A257UYQD4ZGUDE6QUX/


[Python-Dev] Re: Understanding "is not safe" in typeobject.c

2021-02-01 Thread Guido van Rossum
That code is quite old. This comment tries to explain it:
```
/* Check that the use doesn't do something silly and unsafe like
   object.__new__(dict). To do this, we check that the
   most derived base that's not a heap type is this type. */
```
I think you may have to special-case this and arrange for B.__new__() to be
called, like it or not.

(If you want us to change the code, please file a bpo bug report. I know
that's no fun, but it's the way to get the right people involved.)

On Mon, Feb 1, 2021 at 3:27 AM Phil Thompson via Python-Dev <
python-dev@python.org> wrote:

> Hi,
>
> I'm trying to understand the purpose of the check in tp_new_wrapper() of
> typeobject.c that results in the "is not safe" exception.
>
> I have the following class hierarchy...
>
> B -> A -> object
>
> ...where B and A are implemented in C. Class A has an implementation of
> tp_new which does a few context-specific checks before calling
> PyBaseObject_Type.tp_new() directly to actually create the object. This
> works fine.
>
> However I want to allow class B to be used with a Python mixin. A's
> tp_new() then has to do something similar to super().__new__(). I have
> tried to implement this by locating the type object after A in B's MRO,
> getting its '__new__' attribute and calling it (using PyObject_Call())
> with B passed as the only argument. However I then get the "is not safe"
> exception, specifically...
>
> TypeError: object.__new__(B) is not safe, use B.__new__()
>
> I take the same approach for __init__() and that works fine.
>
> If I comment out the check in tp_new_wrapper() then everything works
> fine.
>
> So, am I doing something unsafe? If so, what?
>
> Or, is the check at fault in not allowing the case of a C extension type
> with its own tp_new?
>
> Thanks,
> Phil
> ___
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/HRGDEMURCJ5DSNEPMQPQR3R7VVDFA4ZX/
>


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/X5EDFSASK7RKYISS7MVMHHYWMRRUSNAM/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Paul Moore
On Mon, 1 Feb 2021 at 17:19, Christian Heimes  wrote:
> How much software actually uses wchar_t these days and interfaces with
> Python? Do you have examples for software that uses wchar_t and would
> benefit from wchar_t support in Python?

This is very much a drive-by comment (I haven't been following this
thread) so ignore me if this is already covered, but Windows APIs use
wchar_t extensively. I routinely work with wchar_t when interfacing
Windows API code and Python. But I have no idea what this PEP is
proposing to drop, so as long as someone has ensured that the PEP
won't adversely affect working with Windows APIs, I'm happy.

Paul
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/STBXVKV7SB7M55AIL7D34IYKXGTMFWCM/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Christian Heimes
On 01/02/2021 17.39, M.-A. Lemburg wrote:
>> Can you explain where wchar_t* type is appropriate and how two
>> conversions is a performance bottleneck?
> 
> If an extension has a wchar_t* string, it should be easy
> to convert this into a Python bytes object for use in Python.

How much software actually uses wchar_t these days and interfaces with
Python? Do you have examples for software that uses wchar_t and would
benefit from wchar_t support in Python?

I did a quick search for wcslen in all shared libraries and binaries on
my system. It's a good indicator of how many programs actually use wchar_t.
126 out of more than 9,000 shared libraries and binaries contain the
string "wcslen". The only hit for PyUnicode_AsWideCharString was
libpypy3-c.so...

(Fedora has unified /usr and /lib64, e.g. /bin -> /usr/bin)

$ ls /usr/bin/ /usr/sbin/ | grep -v python | wc -l
4264
$ grep -R wcslen /usr/bin/ /usr/sbin/ | grep -v python | wc -l
92

$ find /usr/lib64/ -name '*.so' -not -name '*python*' | wc -l
5478
$ find /usr/lib64/ -name '*.so' -not -name '*python*' | xargs grep wcslen | wc -l
34
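The same rough census can be taken with a short Python script that counts each matching file once. `count_files_containing` is a hypothetical helper; like the shell version, it only detects the byte string, not whether the symbol is actually called, and it reads each file fully into memory for simplicity.

```python
import os


def count_files_containing(roots, needle=b"wcslen", skip="python"):
    """Count regular files under roots whose bytes contain needle.

    Returns (matches, total); file names containing `skip` are ignored,
    roughly like the `grep -v python` filters in the shell version.
    """
    total = matches = 0
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if skip in name:
                    continue
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):
                    continue  # skip broken symlinks, sockets, etc.
                total += 1
                try:
                    with open(path, "rb") as f:
                        if needle in f.read():
                            matches += 1
                except OSError:
                    pass  # unreadable file: counted in total only
    return matches, total
```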

Christian
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/M6JC5XCXL4ENTMTFR7SUKM7PDQO5KZPT/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Victor Stinner
On Mon, Feb 1, 2021 at 5:58 PM M.-A. Lemburg  wrote:
> The fix is pretty simple, doesn't add a lot more code and gets
> us the symmetry back that I had put into the Unicode C API when
> I created this back in 2000.

This sounds like a completely different PEP from PEP 624 (which aims
to remove code, not add code). I suggest that you propose your own PEP.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VC6E7JMITO27PTYEUFAAD2KOH7BNAWNA/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread M.-A. Lemburg
On 01.02.2021 17:51, Victor Stinner wrote:
> On Mon, Feb 1, 2021 at 5:39 PM M.-A. Lemburg  wrote:
>> The C code is already there, but it got hidden away in the
>> Python 3.3 change to new internals.
> 
> Well, we are not in agreement and it's ok. Your objection is written
> in the PEP. IMO it's now up to the Steering Council to decide if the
> overall PEP is ok or not. The PEP itself is now complete and lists
> advantages and drawbacks.

Please read my reply to Inada-san. If the PEP were complete and ok, I
would not have written the email.

The fix is pretty simple, doesn't add a lot more code and gets
us the symmetry back that I had put into the Unicode C API when
I created this back in 2000.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Feb 01 2021)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/P5I3S4KKM3FMIMGQAGO67PPEX5VIEL6X/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Antoine Pitrou
On Mon, 1 Feb 2021 17:39:16 +0100
"M.-A. Lemburg"  wrote:
> 
> They should not use Py_UNICODE.
> 
> wchar_t is standard C and is in wide spread use in C code for
> storing Unicode data.

Do you have any data points about "wide spread use"?

I work in C++ daily and don't see any "wide spread use" of wchar_t (or
its C++ cousin std::wstring). Modern APIs assume bytestrings and UTF-8
encoding.

Regards

Antoine.

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QGSPEEYFOYZR6PVPH5NOQWF4HMHVNTP6/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Victor Stinner
On Mon, Feb 1, 2021 at 5:39 PM M.-A. Lemburg  wrote:
> The C code is already there, but it got hidden away in the
> Python 3.3 change to new internals.

Well, we are not in agreement and it's ok. Your objection is written
in the PEP. IMO it's now up to the Steering Council to decide if the
overall PEP is ok or not. The PEP itself is now complete and lists
advantages and drawbacks.

Victor
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VUT2T2VJUFXE57YN4VFHSTHTDWR6MRHP/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread M.-A. Lemburg
On 01.02.2021 17:10, Victor Stinner wrote:
> On Mon, Feb 1, 2021 at 4:47 PM M.-A. Lemburg  wrote:
>> At the very least, we should have such APIs for going from wchar_t*
>> to a Python object.
>>
>> The alternatives you provide all require creating an intermediate
>> Python object for this purpose.
> 
> We cannot optimize all use cases. IMO we should only optimize
> conversions between char* and Python object.
> 
> I don't see the need for two conversions (char* => Python and then
> Python => wchar_t*) as an issue if you need wchar_t*.

The C code is already there, but it got hidden away in the
Python 3.3 change to new internals.

All that needs to be done is remove the intermediate Python
Unicode object creation and have those encoder APIs again
interface to the native C code.

> Objects/unicodeobject.c is already very complex with specialization
> for ASCII, Py_UCS1 (latin1), Py_UCS2 and Py_UCS4 kinds: 16k lines of C
> code. I would prefer to make it simpler than more complex.
> 
> Internally, functions like PyUnicode_EncodeLatin1() already do the two
> conversions. So it's not like the PEP has any impact on performance.

Before Python 3.3, all those APIs interfaced directly with the
C codec functions. The introduction of an intermediate Python
Unicode object was just done as a quick work-around, even
though it was not really needed, since Python 3.3 did not
remove the C code of the encoders.

>> That would keep extensions working after a recompile, since
>> Py_UNICODE is already a typedef to wchar_t.
> 
> Extensions should not use Py_UNICODE*/wchar_t*.

They should not use Py_UNICODE.

wchar_t is standard C and is in wide spread use in C code for
storing Unicode data. This was one of the main reasons for
introducing UCS4 Python versions for Linux in the mid 2000s,
since Linux apps used 4-byte wchar_t as their native storage format.
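The width of `wchar_t` is platform-dependent and can be inspected from Python via `ctypes`: typically 4 bytes (UTF-32) on Linux and 2 bytes (UTF-16) on Windows. The helper `to_wchar_bytes` is a hypothetical sketch that picks the codec matching the native `wchar_t` layout, assuming the usual UTF-32/UTF-16 storage for those widths.

```python
import ctypes
import sys

WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)  # typically 4 on Linux, 2 on Windows


def to_wchar_bytes(text):
    """Encode text the way a native wchar_t array would store it (no NUL)."""
    codec = "utf-32" if WCHAR_SIZE == 4 else "utf-16"
    codec += "-le" if sys.byteorder == "little" else "-be"
    return text.encode(codec)
```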

My point is that extensions would just need a recompile
with the change from Py_UNICODE to wchar_t, since Py_UNICODE
and wchar_t are already the same thing in Python 3.3+.

> Can you explain where wchar_t* type is appropriate and how two
> conversions is a performance bottleneck?

If an extension has a wchar_t* string, it should be easy
to convert this into a Python bytes object for use in Python.
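At the Python level, the wide-string-to-bytes path can be illustrated with `ctypes`, whose `c_wchar` arrays are C `wchar_t` buffers. This is only an analogy for the C-API discussion: `wide_to_utf8` is a hypothetical helper, and the intermediate `str` it creates corresponds to the temporary Unicode object debated in this thread.

```python
import ctypes


def wide_to_utf8(buf):
    """Convert a ctypes wchar_t buffer to UTF-8 bytes in two steps."""
    text = buf.value              # step 1: wchar_t* -> temporary str object
    return text.encode("utf-8")   # step 2: str -> bytes via the UTF-8 codec


wide = ctypes.create_unicode_buffer("h\u00e9llo")  # NUL-terminated wchar_t[]
```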

Just like it should be easy to go from a char* string to
a Python str object.

The PEP breaks this symmetry by removing access to the
encoder implementations.

-- 
Marc-Andre Lemburg
eGenix.com

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/FSUPT6B26VJT7S6UCW4RYWRQ3LYLUINU/


[Python-Dev] Re: Why aren't we allowing the use of C11?

2021-02-01 Thread Victor Stinner
On Sat, Jan 30, 2021 at 10:37 AM Antoine Pitrou  wrote:
> You can hide the access behind a function call. Slightly more costly,
> but shouldn't be that expensive on modern machines.

Oh sure, on the API side, it can be an "opaque" function call (on the
ABI side) or a static inline function, depending on whether the call
can use TLS or not.

If possible, I would prefer to use a faster static inline function in
most cases ;-)

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/I3T5JHMYJXDWBLAZWLPKG2N6HFS3A3HU/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread Victor Stinner
On Mon, Feb 1, 2021 at 4:47 PM M.-A. Lemburg  wrote:
> At the very least, we should have such APIs for going from wchar_t*
> to a Python object.
>
> The alternatives you provide all require creating an intermediate
> Python object for this purpose.

We cannot optimize all use cases. IMO we should only optimize
conversions between char* and Python objects.

I don't see the need for two conversions (char* => Python and then
Python => wchar_t*) as an issue if you need wchar_t*.

Objects/unicodeobject.c is already very complex, with specializations
for the ASCII, Py_UCS1 (latin1), Py_UCS2 and Py_UCS4 kinds: 16k lines
of C code. I would prefer to make it simpler rather than more complex.

Internally, functions like PyUnicode_EncodeLatin1() already do the two
conversions, so the PEP has no impact on performance.
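
At the Python level, the two-step path discussed here (wchar_t* to a
temporary str, then str to bytes) looks roughly like the sketch below.
It is only an illustration: ctypes stands in for a C extension's
wchar_t* buffer, and wchar_buffer_to_bytes() is a hypothetical helper,
not a CPython API.

```python
import ctypes


def wchar_buffer_to_bytes(buf, encoding):
    """Hypothetical helper: encode a ctypes wchar_t buffer via a temporary str."""
    # Step 1: wchar_t* -> str (the intermediate Python object under discussion)
    text = ctypes.wstring_at(ctypes.addressof(buf))
    # Step 2: str -> bytes through the regular codec machinery
    return text.encode(encoding)


# create_unicode_buffer() allocates a NUL-terminated wchar_t array
buf = ctypes.create_unicode_buffer("caf\u00e9")
latin1 = wchar_buffer_to_bytes(buf, "latin-1")
utf8 = wchar_buffer_to_bytes(buf, "utf-8")
print(latin1, utf8)  # b'caf\xe9' b'caf\xc3\xa9'
```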


> That would keep extensions working after a recompile, since
> Py_UNICODE is already a typedef to wchar_t.

Extensions should not use Py_UNICODE*/wchar_t*.

Can you explain where the wchar_t* type is appropriate and how two
conversions are a performance bottleneck?

Victor
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/U6V6XWMLPTSNMLDQWRWBVPNTVG6SF5F6/


[Python-Dev] Re: PEP 624: Remove Py_UNICODE encoder APIs

2021-02-01 Thread M.-A. Lemburg
Hi Inada-san,

thank you for adding some comments, but they are not really capturing
what I think is missing:

"""
Removing these APIs removes the ability to use codecs without a temporary
Unicode object.

Codecs cannot encode a Unicode buffer directly without a temporary Unicode
object since Python 3.3. All these APIs create a temporary Unicode object
for now. So removing them doesn't reduce any abilities.
"""

The point is that while the decoders allow going from a C object
to a Python object directly, we are missing a way to do the same
for the encoders, since the Python 3.3 change in the Unicode internals.

At the very least, we should have such APIs for going from wchar_t*
to a Python object.

The alternatives you provide all require creating an intermediate
Python object for this purpose. The APIs you want to remove do that
as well, but that's not the point. The point is to expose the codecs'
encode mechanism, which is available in the C code but currently
not exposed via C APIs, e.g. ucs4lib_utf8_encode().

It would be a breaking change, but those APIs in your list could
simply be changed from using Py_UNICODE to using wchar_t instead
and then interface directly with the internal functions we have for
the encoders.

That would keep extensions working after a recompile, since
Py_UNICODE is already a typedef to wchar_t.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com




On 22.01.2021 07:47, Inada Naoki wrote:
> Hi, Lemburg.
> 
> I want to send the PEP to the SC.
> I think I have covered all your points in the PEP. Could you review it?
> 
> Regards,
> 
> On Tue, Aug 4, 2020 at 5:04 PM Inada Naoki  wrote:
>>
>> On Tue, Aug 4, 2020 at 3:31 PM M.-A. Lemburg  wrote:
>>>
>>> Hi Inada-san,
>>>
>>> thanks for attending EuroPython. I won't be back online until
>>> next Wednesday. Would it be possible to wait until then to continue
>>> the discussion?
>>>
>>
>> Of course. The PEP is for Python 3.11. We have a lot of time.
>>
>> Bests,
> 
> --
> Inada Naoki  
> 
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VT2J6GC7ED4PTUCU5QO6SLL4PAQ6XEKL/


[Python-Dev] Re: PEP 597: Add optional EncodingWarning

2021-02-01 Thread Victor Stinner
Hi Inada-san,

I followed the discussions on your different PEPs and overall I like
your latest PEP :-) I have some minor remarks.

On Mon, Feb 1, 2021 at 6:55 AM Inada Naoki  wrote:
> The warning is disabled by default. New ``-X warn_encoding``
> command-line option and ``PYTHONWARNENCODING`` environment variable
> are used to enable the warnings.

Maybe "warn implicit encoding" or "warn omit encoding" (not sure if
it makes sense written like that in English ;-)) would be more
explicit.


> Options to enable the warning
> --
>
> ``-X warn_encoding`` option and the ``PYTHONWARNENCODING``
> environment variable are added. They are used to enable the
> ``EncodingWarning``.
>
> ``sys.flags.encoding_warning`` is also added. The flag represents
> ``EncodingWarning`` is enabled.

Nitpick: I would prefer using the same name for the -X option and the
sys.flags attribute (ex: sys.flags.warn_encoding).


> ``encoding="locale"`` option
> 
>
> ``io.TextIOWrapper`` accepts ``encoding="locale"`` option. It means
> same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
> emit ``EncodingWarning`` when ``encoding="locale"`` is specified.

Can you please specify whether os.device_encoding(fd) is called when
encoding="locale" is used? It seems so, but it's not obvious from the
PEP.


In Python 3.10, I added the _locale._get_locale_encoding() function,
which returns exactly the encoding used by open() when no encoding is
specified (encoding=None) and os.device_encoding(fd) returns
None. See _Py_GetLocaleEncoding() for the C implementation
(Python/fileutils.c).

Maybe we should add a public locale.get_locale_encoding() function? On
Unix, this function uses nl_langinfo(CODESET) *without* setting the
LC_CTYPE locale to the user's preferred locale.

I understand that encoding=locale.get_locale_encoding() would be
different from encoding="locale":
encoding=locale.get_locale_encoding() doesn't call
os.device_encoding(), right?
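
A rough sketch of the difference being asked about, assuming the lookup
order described above (os.device_encoding(fd) first, then the locale
encoding). Both function names are hypothetical, and the public
locale.getpreferredencoding(False) is used as an approximate stand-in
for the private _locale._get_locale_encoding().

```python
import locale
import os
import tempfile


def encoding_for_locale_option(fd):
    """Hypothetical: roughly what encoding="locale" (or None) resolves to for fd."""
    # Terminals may report their own encoding; regular files normally give None.
    encoding = os.device_encoding(fd)
    if encoding is None:
        # Approximate stand-in for _locale._get_locale_encoding().
        encoding = locale.getpreferredencoding(False)
    return encoding


def explicit_locale_encoding():
    """Hypothetical: encoding=locale.get_locale_encoding() never looks at the fd."""
    return locale.getpreferredencoding(False)


with tempfile.TemporaryFile() as tmp:
    device = os.device_encoding(tmp.fileno())
    via_option = encoding_for_locale_option(tmp.fileno())
via_function = explicit_locale_encoding()
print(device, via_option, via_function)
```

For a regular file the two usually agree; they can differ when fd is a
terminal whose device encoding is not the locale encoding (a Windows
console, for example).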


Maybe the PEP should also explain (in a "How to teach this" section?)
when encoding="locale" is better than a specific encoding, like
encoding="utf-8" or encoding="cp1252". In my experience, it's mostly
for interoperability with other applications which also use the
current locale encoding.


By the way, I recently rewrote the documentation about the encodings
used by Python:

* https://docs.python.org/dev/glossary.html#term-locale-encoding
* https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.filesystem_encoding
* https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.stdio_encoding
* https://docs.python.org/dev/library/os.html#utf8-mode


> Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can
> be used to avoid confusing ``LookupError: unknown encoding: locale``
> error when the code is run in old Python accidentally.

I'm not sure that it is useful. I like a simple "locale" literal
string. If there is a constant in io, people may start to think that
it's something special and will add "import io" just to get the string
"locale".

I don't think that we should care too much about the error message
raised by old Python versions.
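
For context on the LookupError in the quoted PEP text: "locale" is not a
codec name, so an interpreter that hands the string straight to the
codec registry fails the lookup. A minimal check, using a hypothetical
is_codec_name() helper:

```python
import codecs


def is_codec_name(name):
    """Hypothetical helper: True if name is a registered codec or alias."""
    try:
        codecs.lookup(name)
    except LookupError:  # the same exception type as "unknown encoding: locale"
        return False
    return True


print(is_codec_name("utf-8"))   # True
print(is_codec_name("latin1"))  # True
print(is_codec_name("locale"))  # False
```

So with the PEP, io.TextIOWrapper has to recognize the literal itself
before any codec lookup happens; an io.LOCALE_ENCODING constant would
only change which error an old version raises (an AttributeError on the
io module instead of a LookupError).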



> Opt-in warning
> ---
>
> Although ``DeprecationWarning`` is suppressed by default, emitting
> ``DeprecationWarning`` always when ``encoding`` option is omitted
> would be too noisy.

The PEP is not very clear. Does "-X warn_encoding" only emit the
warning, or does it also display it by default? Does it add a warning
filter for EncodingWarning?
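
The question hinges on the difference between emitting a warning and
displaying it: the warnings module keeps the two apart, and the active
filters decide the second part. A minimal sketch with a hypothetical
stand-in category (StandInEncodingWarning is not a real class):

```python
import warnings


class StandInEncodingWarning(Warning):
    """Hypothetical stand-in for the PEP's EncodingWarning."""


def emit():
    # Emitting side: what the interpreter would do when encoding is omitted.
    warnings.warn("'encoding' argument not specified",
                  StandInEncodingWarning, stacklevel=2)


def count_displayed(action):
    # Displaying side: the filter action decides whether anything shows up.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, StandInEncodingWarning)
        emit()
    return sum(issubclass(w.category, StandInEncodingWarning) for w in caught)


ignored = count_displayed("ignore")
shown = count_displayed("always")
print(ignored, shown)  # 0 1
```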


The PEP has no "Backward compatibility" section. Is it possible to
monkey-patch Python to implement this PEP (maybe only partially) on
old Python versions? I'm asking in order to prepare existing projects
for the future EncodingWarning.

The main question is whether it's possible to use encoding="locale" on
Python 3.6-3.9 (maybe using some ugly hacks). By the way, your PEP has
no target Python version ;-) Do you want to get it in Python 3.10 or
3.11?
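
For the encoding="locale" part, a small shim is one option, assuming the
feature lands in Python 3.10 (the message leaves the target version
open). On older versions it falls back to encoding=None, which the
quoted PEP text describes as meaning the same thing. compat_encoding()
is a hypothetical helper.

```python
import os
import sys
import tempfile


def compat_encoding(encoding):
    """Hypothetical shim: pass "locale" through only where it is understood.

    Assumption: encoding="locale" is accepted from Python 3.10 on.
    """
    if encoding == "locale" and sys.version_info < (3, 10):
        return None  # same meaning on older versions, per the quoted PEP text
    return encoding


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "shared.txt")
    with open(path, "w", encoding=compat_encoding("locale")) as f:
        f.write("hello\n")
    with open(path, encoding=compat_encoding("locale")) as f:
        roundtrip = f.read()

print(repr(roundtrip))  # 'hello\n'
```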

Victor
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YEYN2ZE4AYWFJJDTQZPJHTCXOPA5MGO5/


[Python-Dev] Re: Constructing expected_opinfo_* lists in test_dis.py

2021-02-01 Thread Skip Montanaro
> Maybe these lines in test_dis.py?
> ```
> #print('expected_opinfo_jumpy = [\n  ',
>   #',\n  '.join(map(str, _instructions)), ',\n]', sep='')
> ```

Thanks, I'll take a look. I was expecting there'd be a standalone
script somewhere. Hadn't considered that comments would be hiding
code.
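
A standalone version of that commented-out print, as a rough sketch:
format_expected_opinfo() and sample() are hypothetical names, and repr()
is used where the original comment calls str() (for dis.Instruction the
two give the same namedtuple text on the Python versions of this
thread).

```python
import dis


def sample(x):
    # A tiny function with a conditional jump, like the "jumpy" test target.
    if x:
        return 1
    return 2


def format_expected_opinfo(name, func):
    """Build the same text as the commented-out print in test_dis.py."""
    instructions = list(dis.get_instructions(func))
    return "".join([
        f"{name} = [\n  ",
        ",\n  ".join(map(repr, instructions)),
        ",\n]",
    ])


listing = format_expected_opinfo("expected_opinfo_sample", sample)
print(listing)
```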

Skip
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/W7YPHZSIZHZIV7YBVFEJNT6IHCB6L4VW/


[Python-Dev] Understanding "is not safe" in typeobject.c

2021-02-01 Thread Phil Thompson via Python-Dev

Hi,

I'm trying to understand the purpose of the check in tp_new_wrapper() of 
typeobject.c that results in the "is not safe" exception.


I have the following class hierarchy...

B -> A -> object

...where B and A are implemented in C. Class A has an implementation of 
tp_new which does a few context-specific checks before calling 
PyBaseObject_Type.tp_new() directly to actually create the object. This 
works fine.


However I want to allow class B to be used with a Python mixin. A's 
tp_new() then has to do something similar to super().__new__(). I have 
tried to implement this by locating the type object after A in B's MRO, 
getting its '__new__' attribute and calling it (using PyObject_Call()) 
with B passed as the only argument. However I then get the "is not safe" 
exception, specifically...


TypeError: object.__new__(B) is not safe, use B.__new__()

I take the same approach for __init__() and that works fine.

If I comment out the check in tp_new_wrapper() then everything works 
fine.


So, am I doing something unsafe? If so, what?

Or, is the check at fault in not allowing the case of a C extension type 
with its own tp_new?
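
The same check can be reproduced from pure Python with a built-in C
type standing in for A, which may help narrow the question down. This
is a sketch only; Plain, Sub and try_object_new() are hypothetical
names. The check walks up the subtype's tp_base chain past any types
whose __new__ is defined in Python, and refuses the call unless the
tp_new it finds is the one being invoked.

```python
def try_object_new(cls):
    """Hypothetical helper: call object.__new__(cls) and report what happens."""
    try:
        object.__new__(cls)
    except TypeError as exc:
        return str(exc)
    return "ok"


class Plain:
    """Only object below it, so object.__new__ is the right tp_new."""


class Sub(dict):
    """C base (dict) with its own tp_new, playing the role of A."""


plain_result = try_object_new(Plain)
dict_result = try_object_new(dict)
sub_result = try_object_new(Sub)
safe = dict.__new__(Sub)  # calling the C base's own __new__ is accepted

print(plain_result)  # ok
print(dict_result)   # object.__new__(dict) is not safe, use dict.__new__()
print(sub_result)
print(type(safe).__name__)  # Sub
```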


Thanks,
Phil
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HRGDEMURCJ5DSNEPMQPQR3R7VVDFA4ZX/