[Python-Dev] Re: Making code object APIs unstable

2021-09-02 Thread Petr Viktorin



On 01. 09. 21 22:28, Guido van Rossum wrote:

I apologize, I keep making the same mistake.

The PyCode_New[WithPosArgs] functions are *not* in the stable ABI or in 
the limited API, so there's no need to petition the SC, nor do I need 
Petr's approval.


We may be bound by backwards compatibility for the *cpython* API, but I 
think that if Cython is okay with us just breaking this, we should be fine. 
Users of the CPython API are expected to recompile for each new version, 
and if someone were to be using these functions with the old set of 
parameters the compiler would give them an error.


The CPython API is still covered by the backwards compatibility policy 
(PEP 387). You do need to ask the SC to skip the two-year deprecation 
period.


I don't see an issue with the exception being granted, but I do think it 
should be rubber-stamped as a project-wide decision.



So let's just choose (E) and d*mn backwards compatibility for these two 
functions.


That means:
- Get rid of PyCode_NewWithPosArgs altogether
- PyCode_New becomes unstable (and gets a new posonlyargcount argument)


... but still remains available and documented, just with a note that it 
may change in minor versions. Right?
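
For Python-level callers the analogous, version-tolerant route is
types.CodeType.replace(), which Guido mentions in the quoted message below.
A minimal sketch (the helper name is hypothetical):

    def retarget(func, filename, lineno):
        # CodeType.replace() (3.8+) copies a code object, overriding only the
        # named co_* fields, so callers don't have to track every new field
        # the constructor grows in 3.11.
        code = func.__code__
        func.__code__ = code.replace(co_filename=filename,
                                     co_firstlineno=lineno)
        return func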



On Wed, Sep 1, 2021 at 11:52 AM Guido van Rossum wrote:


(context)

Guido van Rossum wrote on 13.08.21 at 19:24:
> In 3.11 we're changing a lot of details about code objects. Part of this
> is the "Faster CPython" work, part of it is other things (e.g. PEP 657 --
> Fine Grained Error Locations in Tracebacks).
>
> As a result, the set of fields of the code object is changing. This is
> fine, the structure is part of the internal API anyway.
>
> But there's a problem with two public API functions, PyCode_New() and
> PyCode_NewWithPosArgs(). As we have them in the main (3.11) branch, their
> signatures are incompatible with previous versions, and they have to be
> since the set of values needed to create a code object is different. (The
> types.CodeType constructor signature is also changed, and so is its
> replace() method, but these aren't part of any stable API).
>
> Unfortunately, PyCode_New() and PyCode_NewWithPosArgs() are part of the
> PEP 387 stable ABI. What should we do?
>
> A. We could deprecate them, keep (restore) their old signatures, and
> create crippled code objects (no exception table, no endline/column
> tables, qualname defaults to name).
>
> B. We could deprecate them, restore the old signatures, and always raise
> an error when they are called.
>
> C. We could just delete them.
>
> D. We could keep them, with modified signatures, and to heck with ABI
> compatibility for these two.
>
> E. We could get rid of PyCode_NewWithPosArgs(), update PyCode_New() to
> add the posonlyargcount (which is the only difference between the two),
> and d*mn the torpedoes.
>
> F. Like (E), but keep PyCode_NewWithPosArgs() as an alias for
> PyCode_New() (and deprecate it).
>
> If these weren't part of the stable ABI, I'd choose (E). [...]


On Tue, Aug 31, 2021 at 7:07 PM Stefan Behnel wrote:

I also vote for (E). The creation of a code object is tied to interpreter
internals and thus shouldn't be (or have been) declared stable.


I think you're one of the few people who call those functions, and
if even you think it's okay to break backward compatibility here, I
think we should just talk to the SC to be absolved of having these
two in the stable ABI. (Petr, do you agree? Without your backing I
don't feel comfortable even asking for this.)

I think the only problem with that argument is that code objects are
required for frames. You could argue the same way about frames, but then
it becomes really tricky to, you know, create frames for non-Python code.


Note there's nothing in the stable ABI to create frames. There are
only functions to *get* an existing frame, to inspect a frame, and
to eval it. In any case even if there was a stable ABI function to
create a frame from a code object, one could argue that it's
sufficient to be able to get an existing code object from e.g. a
function object.

Since we're discussing this in the context of PEP 657, I wonder if there's
a better way to create tracebacks from C code, other than creating fake
frames with fake code objects.

Cython uses code objects and frames for the following use cases:

 

[Python-Dev] Re: Making code object APIs unstable

2021-09-02 Thread Antoine Pitrou
On Thu, 2 Sep 2021 13:31:32 +1200
Greg Ewing  wrote:
> On 2/09/21 4:46 am, Victor Stinner wrote:
> > If creating a fake frame is a common use case, we can maybe write a
> > public C API for that. For example, I saw parser injecting frames to
> > show the file name and line number of the parsed file in the
> > traceback.  
> 
> The way I would like to see this addressed is to make it possible
> to attach a filename and line number directly to a traceback object,
> without needing a frame or code object at all.

Tracebacks are linked in a single direction; to go the other direction
you need to walk the frames attached to the traceback. If there is no
frame on the traceback, you cannot go that way at all.

So a (fake or not) frame object is still desirable, IMHO.
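
For reference, the pure-Python version of the trick Victor describes --
injecting a frame that carries the parsed file's name and line number -- is
roughly the following sketch (hypothetical helper; Cython and friends do the
equivalent at the C level):

    import sys

    def fake_frame(filename, lineno):
        # Compile a throwaway code object whose co_filename points at the
        # parsed file, raise inside it, and pull the resulting frame out of
        # the traceback (assumes lineno >= 1).
        code = compile("\n" * (lineno - 1) + "raise _exc", filename, "exec")
        try:
            exec(code, {"_exc": RuntimeError()})
        except RuntimeError:
            return sys.exc_info()[2].tb_next.tb_frame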

Regards

Antoine.




[Python-Dev] Re: Making code object APIs unstable

2021-09-02 Thread Greg Ewing

On 2/09/21 7:46 pm, Antoine Pitrou wrote:
> Tracebacks are linked in a single direction, to go the other direction
> you need to walk the frames attached to the traceback.
>
> So a (fake or not) frame object is still desirable, IMHO.


Could we at least remove the necessity for a fake code object?

--
Greg



[Python-Dev] Re: PEP-535 (Rich comparison chaining) Discussion?

2021-09-02 Thread Angus Hollands
Thanks Nick,

I am strongly advocating for the sentiment of PEP 535, but I have not given 
much thought to the implementation details established in PEP 532. I'll read 
through 532 properly and come back with some thoughts.


[Python-Dev] Re: Making code object APIs unstable

2021-09-02 Thread Guido van Rossum
FWIW I've applied for an exception from the two-release deprecation policy
from the SC:
https://github.com/python/steering-council/issues/75

On Thu, Sep 2, 2021 at 1:12 AM Greg Ewing 
wrote:

> On 2/09/21 7:46 pm, Antoine Pitrou wrote:
> > Tracebacks are linked in a single direction, to go the other direction
> > you need to walk the frames attached to the traceback.
> >
> > So a (fake or not) frame object is still desirable, IMHO.
>
> Could we at least remove the necessity for a fake code object?
>
> --
> Greg
>


-- 
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)



[Python-Dev] Re: Discrepancy between what aiter() and `async for` requires on purpose?

2021-09-02 Thread Guido van Rossum
First of all, we should ping Yury, who implemented `async for` about 6
years ago (see PEP 492), and Joshua Bronson, who implemented aiter() and
anext() about 5 months ago (see https://bugs.python.org/issue31861). I've
CC'ed them here.

My own view:

A. iter() doesn't check that the thing returned implements __iter__,
because it's not needed -- iterators having an __iter__ method is a
convention, not a requirement. You shouldn't implement __iter__ returning
something that doesn't implement __iter__ itself, because then "for x in
iter(a)" would fail even though "for x in a" works. But you get an error,
and anyone who implements something like that (or uses it) deserves what
they get. People know about this convention and the ABC enforces it, so in
practice it will be very rare that someone gets bitten by this.
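
A quick sketch of that failure mode (hypothetical classes, nothing from the
stdlib):

    class _Iter:
        def __next__(self):
            raise StopIteration

    class Box:
        def __iter__(self):
            return _Iter()      # has __next__ but no __iter__

    b = Box()
    for x in b:                 # works: the loop only calls __next__ here
        pass
    for x in iter(b):           # TypeError: '_Iter' object is not iterable
        pass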

B. aiter() shouldn't need to check either, for exactly the same reason. I
*suspect* (but do not know) that the extra check for the presence of
__aiter__ is simply an attempt by the implementer to enforce the convention.
There is no *need* other than ensuring that "async for x in aiter(a)" works
when "async for x in a" works.

Note that PEP 525, which defines async generators, seems to imply that an
__aiter__ returning self is always necessary, but I don't think it gives a
reason.

I do notice there's some backwards compatibility issue related to
__aiter__, alluded to in both PEP 492 (
https://www.python.org/dev/peps/pep-0492/#api-design-and-implementation-revisions)
and PEP 525 (
https://www.python.org/dev/peps/pep-0525/#aiter-and-anext-builtins). So
it's *possible* that it has to do with this (maybe really old code
implementing the 3.5 version of __aiter__ would be caught out by the extra
check) but I don't think it is. Hopefully Yury and/or Joshua remembers?

FWIW I don't think there are any optimizations that avoid calling __iter__
or __aiter__ if __next__ or __anext__ is present. And certainly I wouldn't
endorse adding them (this would seem an ad-hoc optimization that could
break user expectations unexpectedly, quite apart from the issue discussed
here).

--Guido

On Wed, Sep 1, 2021 at 4:11 PM Nick Coghlan  wrote:

> On Tue, 31 Aug 2021, 2:52 am Brett Cannon,  wrote:
>
>>
>> On Sun, Aug 29, 2021 at 2:01 PM Serhiy Storchaka 
>> wrote:
>>
>>>
>>> > So my question is whether the discrepancy between what `async for`
>>> > expects and what `aiter()` expects is on purpose?
>>> > https://bugs.python.org/issue31861 was the issue for creating aiter()
>>> > and I didn't notice a discussion of this difference. The key reason I'm
>>> > asking is this does cause a deviation compared to the relationship
>>> > between `for` and `iter()` (which does not require `__iter__` to be
>>> > defined on the iterator, although collections.abc.Iterator does). It
>>> > also makes the glossary definition being linked from
>>> > https://docs.python.org/3.10/reference/compound_stmts.html#the-async-for-statement
>>> > incorrect.
>>>
>>> PyIter_Check() only checks existence of __next__, not __iter__ (perhaps
>>> for performance reasons).
>>>
>>
>> Or maybe no one thought to require __iter__ for iterators?
>>
>
> I don't think PyIter_Check is testing the formal definition of an
> iterator, I think it's just testing if calling __iter__ can be skipped (as
> you say, for performance reasons).
>
> I'm surprised iter() would skip calling __iter__ just because an object
> defines __next__, though. Even though "__iter__ is defined and returns
> self" is part of the iterator definition, it still feels like a leap from
> there to "if __next__ is defined, skip calling __iter__ in iter()".
>
> The optimisation that bypasses the __[a]iter__ method call feels more
> legitimate in the actual for loop syntax, it just feels odd to me if the
> builtin isn't forcing the call.
>
> Cheers,
> Nick.
>


-- 
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)



[Python-Dev] Re: Discrepancy between what aiter() and `async for` requires on purpose?

2021-09-02 Thread Yury Selivanov
Comments inlined:

On Thu, Sep 2, 2021 at 6:23 PM Guido van Rossum  wrote:

> First of all, we should ping Yury, who implemented `async for` about 6
> years ago (see PEP 492), and Joshua Bronson, who implemented aiter() and
> anext() about 5 months ago (see https://bugs.python.org/issue31861). I've
> CC'ed them here.
>

Looks like PyAiter_Check was added along with the aiter/anext builtins. I
agree it's unnecessary to check for __aiter__ in it, so let's just fix it.



>
> My own view:
>
> A. iter() doesn't check that the thing returned implements __iter__,
> because it's not needed -- iterators having an __iter__ method is a
> convention, not a requirement.
>

Yeah.


> You shouldn't implement __iter__ returning something that doesn't
> implement __iter__ itself, because then "for x in iter(a)" would fail even
> though "for x in a" works. But you get an error, and anyone who implements
> something like that (or uses it) deserves what they get. People know about
> this convention and the ABC enforces it, so in practice it will be very
> rare that someone gets bitten by this.
>
> B. aiter() shouldn't need to check either, for exactly the same reason. I
> *suspect* (but do not know) that the extra check for the presence of
> __aiter__ is simply an attempt by the implementer to enforce the convention.
> There is no *need* other than ensuring that "async for x in aiter(a)" works
> when "async for x in a" works.
>

I agree.

>
> Note that PEP 525, which defines async generators, seems to imply that an
> __aiter__ returning self is always necessary, but I don't think it gives a
> reason.
>

PEP 525 implies that specifically for asynchronous generators, not
iterators. That's due to the fact that synchronous generators return self
from their __iter__.

>
> I do notice there's some backwards compatibility issue related to
> __aiter__, alluded to in both PEP 492 (
> https://www.python.org/dev/peps/pep-0492/#api-design-and-implementation-revisions)
> and PEP 525 (
> https://www.python.org/dev/peps/pep-0525/#aiter-and-anext-builtins). So
> it's *possible* that it has to do with this (maybe really old code
> implementing the 3.5 version of __aiter__ would be caught out by the extra
> check) but I don't think it is. Hopefully Yury and/or Joshua remembers?
>

That wasn't related.

In the first iteration of PEP 492, __aiter__ was required to be a
coroutine. Some time after shipping 3.5.0 I realized that that would
complicate asynchronous generators for no reason (and I think there were
also some bigger problems than just complicating them). So I updated the
PEP to change __aiter__ return type from `Awaitable[AsyncIterator]` to
`AsyncIterator`. ceval code was changed to call __aiter__ and see if the
object that it returned had __anext__. If not, it tried to await on it.
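
With that revision, the protocol an object has to satisfy today boils down
to the following (minimal sketch):

    import asyncio

    class Countdown:
        def __init__(self, n):
            self.n = n

        def __aiter__(self):        # plain method, returns the async iterator
            return self

        async def __anext__(self):  # the only thing ceval requires on the result
            if self.n == 0:
                raise StopAsyncIteration
            self.n -= 1
            return self.n

    async def consume():
        async for i in Countdown(3):
            print(i)

    asyncio.run(consume())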

Bottom line: let's fix PyAiter_Check to only look for __anext__. It's a new
function so we can still fix it to reflect PyIter_Check and not worry about
anything.

Yury


[Python-Dev] A better way to freeze modules

2021-09-02 Thread Gregory Szorc
Over in https://bugs.python.org/issue45020 there is some exciting work
around expanding the use of the frozen importer to speed up Python
interpreter startup. I wholeheartedly support the effort and don't want to
discourage progress in this area.

Simultaneously, I've been down this path before with PyOxidizer and feel
like I have some insight to share.

I don't think I'll be offending anyone by saying the existing CPython
frozen importer is quite primitive in terms of functionality: it does the
minimum it needs to do to support importing module bytecode embedded in the
interpreter binary [for purposes of bootstrapping the Python-based
importlib modules]. The C struct representing frozen modules is literally
just the module name and a pointer to a sized buffer containing bytecode.

In issue45020 there is talk of enhancing the functionality of the frozen
importer to support its potential broader use. For example, setting
__file__ or exposing .__loader__.get_source(). I support the overall
initiative.

However, introducing enhanced functionality of the frozen importer will at
the C level require either:

a) backwards incompatible changes to the C API to support additional
metadata on frozen modules (or at the very least a supplementary API that
fragments what a "frozen" module is).
b) CPython only hacks to support additional functionality for "freezing"
the standard library for purposes of speeding up startup.

I'm not a CPython core developer, but neither "a" nor "b" seem ideal to me.
"a" is backwards incompatible. "b" seems like a stop-gap solution until a
more generic version is available outside the CPython standard library.

From my experience with PyOxidizer and software in general, here is what I
think is going to happen:

1. CPython enhances the frozen importer to be usable in more situations.
2. Python programmers realize this solution has performance and
ease-of-distribution wins and want to use it more.
3. Limitations in the frozen importer are found. Bugs are reported. Feature
requests are made.
4. The frozen importer keeps getting incrementally extended or Python
developers grow frustrated that its enhancements are only available to the
standard library. You end up slowly reimplementing the importing mechanism
in C (remember Python 2?) or disappoint users.

Rather than extending the frozen importer, I would suggest considering an
alternative solution that is far more useful to the long-term success of
Python: I would consider building a fully-featured, generic importer that
is capable of importing modules and resource data from a well-defined and
portable serialization format / data structure that isn't defined by C
structs and APIs.

Instead of defining module bytecode (and possible additional minimal
metadata) in C structs in a frozen modules array (or an equivalent C API),
what if we instead defined a serialization format for representing the
contents of loadable Python data (module source, module bytecode, resource
files, extension module library data, etc)? We could then point the Python
interpreter at instances of this data structure (in memory or in files) so
it could import/load the resources within using a meta path importer.
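
To make the idea concrete, here is a toy meta path importer that serves
module source out of an in-memory mapping (all names hypothetical; the real
format would hold bytecode and resources in an efficient binary layout,
which this emphatically is not):

    import importlib.abc
    import importlib.util
    import sys

    # Hypothetical stand-in for the packed resources data structure.
    PACKED = {"hello": "GREETING = 'hi from packed resources'\n"}

    class PackedImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
        def find_spec(self, fullname, path=None, target=None):
            if fullname in PACKED:
                return importlib.util.spec_from_loader(fullname, self)
            return None

        def create_module(self, spec):
            return None                     # use default module creation

        def exec_module(self, module):
            source = PACKED[module.__name__]
            exec(compile(source, "<packed:%s>" % module.__name__, "exec"),
                 module.__dict__)

    sys.meta_path.insert(0, PackedImporter())
    import hello                            # served from PACKED, not the filesystem
    print(hello.GREETING)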

What if this serialization format were designed so that it was extremely
efficient to parse and imports could be serviced with the same trivially
minimal overhead that the frozen importer currently has? We could embed
these data structures in produced binaries and achieve the same desirable
results we'll be getting in issue45020 all while delivering a more generic
solution.

What if this serialization format were portable across machines? The entire
Python ecosystem could leverage it as a container format for distributing
Python resources. Rather than splatting dozens or hundreds of files on the
filesystem, you could write a single file with all of a package's
resources. Bugs around filesystem implementation details such as case
(in)sensitivity and Unicode normalization go away. Package installs are
quicker. Run-time performance is better due to faster imports.

(OK, maybe that last point brings back bad memories of eggs and you
instinctively reject the idea. Or you have concerns about development
ergonomics when module source code isn't in standalone editable files.
These are fair points!)

What if the Python interpreter gains an "app mode" where it is capable of
being paired with a single "resources file" and running the application
within? Think running zip applications today, but a bit faster, more
tailored to Python, and more fully featured.

What if an efficient binary serialization format could be leveraged as a
cache to speed up subsequent interpreter startups?

These were all considerations on my mind in the early days of PyOxidizer
when I realized that the frozen importer and zip importers were lacking the
features I desired and I would need to find an alternative solution.

One thing led to another and I have incrementally developed the "Python
packed 

[Python-Dev] Re: A better way to freeze modules

2021-09-02 Thread Guido van Rossum
Quick reaction: This feels like a bait and switch to me. Also, there are
many advantages to using a standard format like zip (many formats are
really zip with some conventions). Finally, the bytecode format you are
using is “marshal”, and is fully portable — as is zip.
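
For context, the marshal round trip in question is just (a sketch):

    import marshal

    code = compile("print('hello from marshalled bytecode')", "<packed>", "exec")
    blob = marshal.dumps(code)      # what .pyc files store after their header
    exec(marshal.loads(blob))       # loadable again by the same Python version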

On Thu, Sep 2, 2021 at 21:44 Gregory Szorc  wrote:

> Over in https://bugs.python.org/issue45020 there is some exciting work
> around expanding the use of the frozen importer to speed up Python
> interpreter startup. I wholeheartedly support the effort and don't want to
> discourage progress in this area.
>
> Simultaneously, I've been down this path before with PyOxidizer and feel
> like I have some insight to share.
>
> I don't think I'll be offending anyone by saying the existing CPython
> frozen importer is quite primitive in terms of functionality: it does the
> minimum it needs to do to support importing module bytecode embedded in the
> interpreter binary [for purposes of bootstrapping the Python-based
> importlib modules]. The C struct representing frozen modules is literally
> just the module name and a pointer to a sized buffer containing bytecode.
>
> In issue45020 there is talk of enhancing the functionality of the frozen
> importer to support its potential broader use. For example, setting
> __file__ or exposing .__loader__.get_source(). I support the overall
> initiative.
>
> However, introducing enhanced functionality of the frozen importer will at
> the C level require either:
>
> a) backwards incompatible changes to the C API to support additional
> metadata on frozen modules (or at the very least a supplementary API that
> fragments what a "frozen" module is).
> b) CPython only hacks to support additional functionality for "freezing"
> the standard library for purposes of speeding up startup.
>
> I'm not a CPython core developer, but neither "a" nor "b" seem ideal to
> me. "a" is backwards incompatible. "b" seems like a stop-gap solution until
> a more generic version is available outside the CPython standard library.
>
> From my experience with PyOxidizer and software in general, here is what I
> think is going to happen:
>
> 1. CPython enhances the frozen importer to be usable in more situations.
> 2. Python programmers realize this solution has performance and
> ease-of-distribution wins and want to use it more.
> 3. Limitations in the frozen importer are found. Bugs are reported.
> Feature requests are made.
> 4. The frozen importer keeps getting incrementally extended or Python
> developers grow frustrated that its enhancements are only available to the
> standard library. You end up slowly reimplementing the importing mechanism
> in C (remember Python 2?) or disappoint users.
>
> Rather than extending the frozen importer, I would suggest considering an
> alternative solution that is far more useful to the long-term success of
> Python: I would consider building a fully-featured, generic importer that
> is capable of importing modules and resource data from a well-defined and
> portable serialization format / data structure that isn't defined by C
> structs and APIs.
>
> Instead of defining module bytecode (and possible additional minimal
> metadata) in C structs in a frozen modules array (or an equivalent C API),
> what if we instead defined a serialization format for representing the
> contents of loadable Python data (module source, module bytecode, resource
> files, extension module library data, etc)? We could then point the Python
> interpreter at instances of this data structure (in memory or in files) so
> it could import/load the resources within using a meta path importer.
>
> What if this serialization format were designed so that it was extremely
> efficient to parse and imports could be serviced with the same trivially
> minimal overhead that the frozen importer currently has? We could embed
> these data structures in produced binaries and achieve the same desirable
> results we'll be getting in issue45020 all while delivering a more generic
> solution.
>
> What if this serialization format were portable across machines? The
> entire Python ecosystem could leverage it as a container format for
> distributing Python resources. Rather than splatting dozens or hundreds of
> files on the filesystem, you could write a single file with all of a
> package's resources. Bugs around filesystem implementation details such as
> case (in)sensitivity and Unicode normalization go away. Package installs
> are quicker. Run-time performance is better due to faster imports.
>
> (OK, maybe that last point brings back bad memories of eggs and you
> instinctively reject the idea. Or you have concerns about development
> ergonomics when module source code isn't in standalone editable files.
> These are fair points!)
>
> What if the Python interpreter gains an "app mode" where it is capable of
> being paired with a single "resources file" and running the application
> within? Think running zip applications today, but a bit faster