Re: [Python-Dev] PEP 590 discussion

2019-04-27 Thread Mark Shannon

Hi Petr,

On 24/04/2019 11:24 pm, Petr Viktorin wrote:

On 4/10/19 7:05 PM, Jeroen Demeyer wrote:

On 2019-04-10 18:25, Petr Viktorin wrote:

Hello!
I've had time for a more thorough reading of PEP 590 and the reference
implementation. Thank you for the work!


And thank you for the review!


I'd now describe the fundamental
difference between PEP 580 and PEP 590 as:
- PEP 580 tries to optimize all existing calling conventions
- PEP 590 tries to optimize (and expose) the most general calling
convention (i.e. fastcall)


And PEP 580 has better performance overall, even for METH_FASTCALL. 
See this thread:

https://mail.python.org/pipermail/python-dev/2019-April/156954.html

Since these PEPs are all about performance, I consider this a very 
relevant argument in favor of PEP 580.


All about performance as well as simplicity, correctness, testability, 
teachability... And PEP 580 touches some introspection :)



PEP 580 also does a number of other things, as listed in PEP 579. But I
think PEP 590 does not block future PEPs for the other items.
On the other hand, PEP 580 has a much more mature implementation -- and
that's where it picked up real-world complexity.

About complexity, please read what I wrote in
https://mail.python.org/pipermail/python-dev/2019-March/156853.html

I claim that the complexity in the protocol of PEP 580 is a good 
thing, as it removes complexity from other places, in particular from 
the users of the protocol (better have a complex protocol that's 
simple to use, rather than a simple protocol that's complex to use).


I think we're talking past each other. I see it now as:

PEP 580 takes existing complexity and makes it available to all users, 
in a simpler way. It makes existing code faster.


PEP 590 defines a new simple/fast protocol for its users, and instead of 
making existing complexity faster and easier to use, it's left to be 
deprecated/phased out (or kept in existing classes for backwards 
compatibility). It makes it possible for future code to be faster/simpler.


I think things should be simple by default, but if people want some 
extra performance, they can opt in to some extra complexity.



As a more concrete example of the simplicity that PEP 580 could bring, 
CPython currently has 2 classes for bound methods implemented in C:

- "builtin_function_or_method" for normal C methods
- "method-descriptor" for slot wrappers like __eq__ or __add__

With PEP 590, these classes would need to stay separate to get maximal 
performance. With PEP 580, just one class for bound methods would be 
sufficient and there wouldn't be any performance loss. And this 
extends to custom third-party function/method classes, for example as 
implemented by Cython.


Yet, for backwards compatibility reasons, we can't merge the classes.
Also, I think CPython and Cython are exactly the users that can trade 
some extra complexity for better performance.



Jeroen's analysis from
https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems
to miss a step at the top:

a. CALL_FUNCTION* / CALL_METHOD opcode
   calls
b. _PyObject_FastCallKeywords()
   which calls
c. _PyCFunction_FastCallKeywords()
   which calls
d. _PyMethodDef_RawFastCallKeywords()
   which calls
e. the actual C function (*ml_meth)()

I think it's more useful to say that both PEPs bridge a->e (via
_Py_VectorCall or PyCCall_Call).


Not quite. For a builtin_function_or_method, we have with PEP 580:

a. call_function()
 calls
d. PyCCall_FastCall
 which calls
e. the actual C function

and with PEP 590 it's more like:

a. call_function()
 calls
c. _PyCFunction_FastCallKeywords
 which calls
d. _PyMethodDef_RawFastCallKeywords
 which calls
e. the actual C function

Level c. above is the vectorcall wrapper, which is a level that PEP 
580 doesn't have.


PEP 580 optimizes all the code paths, where PEP 590 optimizes the fast 
path, and makes sure most/all use cases can use (or switch to) the fast 
path.
Both fast paths are fast: bridging a->e using zero-copy arg passing with 
some C calls and flag checks.

The PEP 580 approach is faster; PEP 590's is simpler.


Why do you say that PEP 580's approach is faster? There is no evidence 
for this.
The only evidence so far is a couple of contrived benchmarks. Jeroen's 
showed a ~1% speedup for PEP 580 and mine showed a ~30% speed up for PEP 
590.
This clearly shows that I am better at coming up with contrived 
benchmarks :)


PEP 590 was chosen as the fastest protocol I could come up with that was 
fully general, and wasn't so complex as to be unusable.






Jeroen, is there something in PEPs 579/580 that PEP 590 blocks, or
should address?


Well, PEP 580 is an extensible protocol while PEP 590 is not. But, 
PyTypeObject is extensible, so even with PEP 590 one can always extend 
that (for example, PEP 590 uses a type flag 
Py_TPFLAGS_METHOD_DESCRIPTOR where PEP 580 instead uses the structs 
for the C call protocol). But I guess that extending PyTypeObject will 
be harder to justify (say, in a future PEP) than extending the C call 
protocol.

Re: [Python-Dev] PEP 590 discussion

2019-04-27 Thread Mark Shannon

Hi Jeroen,

On 15/04/2019 9:38 am, Jeroen Demeyer wrote:

On 2019-04-14 13:30, Mark Shannon wrote:

PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward
calls with an additional argument can do so efficiently. The obvious
example is bound-methods, but classes are at least as important.
cls(*args) -> cls.__new__(cls, *args) -> cls.__init__(self, *args)


But tp_new and tp_init take the "cls" and "self" as separate arguments, 
not as part of *args. So I don't see why you need 
PY_VECTORCALL_ARGUMENTS_OFFSET for this.


Here's some (untested) code for an implementation of vectorcall for 
object subtypes implemented in Python. It uses 
PY_VECTORCALL_ARGUMENTS_OFFSET to save memory allocation when calling 
the __init__ method.


https://github.com/python/cpython/commit/9ff46e3ba0747f386f9519933910d63d5caae6ee#diff-c3cf251f16d5a03a9e7d4639f2d6f998R3820
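
For readers who don't want to chase the commit link, the core of the 
trick looks roughly like the sketch below. This is not the linked code 
but a minimal illustration, assuming a PEP 590 draft-style callee 
signature; the SKETCH_-prefixed names stand in for 
PY_VECTORCALL_ARGUMENTS_OFFSET and the draft vectorcall typedef.

#include <Python.h>
#include <string.h>

/* Minimal sketch (not the linked implementation): a bound-method object
 * whose vectorcall prepends "self" without allocating a new argument
 * array whenever the caller sets the arguments-offset flag. */
typedef PyObject *(*vc_func)(PyObject *callable, PyObject *const *args,
                             size_t nargsf, PyObject *kwnames);

typedef struct {
    PyObject_HEAD
    PyObject *im_self;             /* bound "self" */
    PyObject *im_func;             /* underlying callable */
    vc_func   im_func_vectorcall;  /* its vectorcall entry point */
} BoundMethodSketch;

/* Stands in for PY_VECTORCALL_ARGUMENTS_OFFSET: high bit of nargsf. */
#define SKETCH_ARGUMENTS_OFFSET ((size_t)1 << (8 * sizeof(size_t) - 1))

static PyObject *
bound_method_vectorcall(PyObject *callable, PyObject *const *args,
                        size_t nargsf, PyObject *kwnames)
{
    BoundMethodSketch *m = (BoundMethodSketch *)callable;
    Py_ssize_t nargs = (Py_ssize_t)(nargsf & ~SKETCH_ARGUMENTS_OFFSET);
    Py_ssize_t nkw = (kwnames == NULL) ? 0 : PyTuple_GET_SIZE(kwnames);

    if (nargsf & SKETCH_ARGUMENTS_OFFSET) {
        /* The caller granted us the slot just before args[0]: write
         * "self" there, call, and restore the caller's word.
         * No allocation, no copying. */
        PyObject **newargs = (PyObject **)args - 1;
        PyObject *saved = newargs[0];
        newargs[0] = m->im_self;
        PyObject *result = m->im_func_vectorcall(m->im_func, newargs,
                                                 (size_t)(nargs + 1), kwnames);
        newargs[0] = saved;
        return result;
    }

    /* Fallback: the caller did not offer the slot, so copy the arguments
     * into a fresh array with room for "self" in front. */
    PyObject **newargs = PyMem_Malloc((size_t)(nargs + nkw + 1) * sizeof(PyObject *));
    if (newargs == NULL) {
        return PyErr_NoMemory();
    }
    newargs[0] = m->im_self;
    memcpy(newargs + 1, args, (size_t)(nargs + nkw) * sizeof(PyObject *));
    PyObject *result = m->im_func_vectorcall(m->im_func, newargs,
                                             (size_t)(nargs + 1), kwnames);
    PyMem_Free(newargs);
    return result;
}

The fallback branch is exactly the allocation that the flag lets 
well-behaved callers skip.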

Cheers,
Mark.


Re: [Python-Dev] PEP 590 discussion

2019-04-26 Thread Jeroen Demeyer

On 2019-04-25 23:11, Petr Viktorin wrote:

My thoughts are not the roadmap, of course :)


I asked about methods because we should be aware of the consequences when 
choosing between PEP 580 and PEP 590 (or some compromise). There are 
basically 3 different ways of dealing with bound methods:


(A) put methods inside the protocol. This is PEP 580 and my 580/590 
compromise proposal. The disadvantage here is complexity in the protocol.


(B) don't put methods inside the protocol and use a single generic 
method class types.MethodType. This is the status-quo for Python 
functions. It has the disadvantage of being slightly slower: there is an 
additional level of indirection when calling a bound method object.


(C) don't put methods inside the protocol but use multiple method 
classes, one for every function class. This is the status-quo for 
functions implemented in C. This has the disadvantage of code duplication.


I think that the choice between PEP 580 or 590 should be done together 
with a choice of one of the above options. For example, I really don't 
like the code duplication of (C), so I would prefer PEP 590 with (B) 
over PEP 590 with (C).



Re: [Python-Dev] PEP 590 discussion

2019-04-25 Thread Petr Viktorin

On 4/25/19 5:12 AM, Jeroen Demeyer wrote:

On 2019-04-25 00:24, Petr Viktorin wrote:

PEP 590 defines a new simple/fast protocol for its users, and instead of
making existing complexity faster and easier to use, it's left to be
deprecated/phased out (or kept in existing classes for backwards
compatibility). It makes it possible for future code to be 
faster/simpler.


Can you elaborate on what you mean with this deprecating/phasing out?


Kept for backwards compatibility, but not actively recommended or 
optimized. Perhaps made slower if that would help performance elsewhere.



What's your view on dealing with method classes (not necessarily right 
now, but in the future)? Do you think that having separate method 
classes like method-wrapper (for example [].__add__) is good or bad?


I fully agree with PEP 579's point on complexity:

There are a huge number of classes involved to implement all variations of 
methods. This is not a problem by itself, but a compounding issue.


The main problem is that, currently, you sometimes need to care about 
this (due to CPython special casing its own classes, without fallback to 
some public API). Ideally, what matters is the protocols the class 
implements rather than the class itself. If that is solved, having so 
many different classes becomes curious but unimportant -- merging them 
shouldn't be a priority.


I'd concentrate on two efforts instead:

- Calling should have a fast public API. (That's this PEP.)
- Introspection should have well-defined, consistently used public API 
(but not necessarily fast).


For introspection, I think the way is implementing the necessary API 
(e.g. dunder attributes) and changing things like inspect, traceback 
generation, etc. to use them. CPython's callable classes should stay as 
internal implementation details. (Specifically: I'm against making them 
subclassable: allowing subclasses basically makes everything about the 
superclass an API.)


Since the way PEP 580 and PEP 590 deal with bound method classes is 
very different, I would like to know the roadmap for this.


My thoughts are not the roadmap, of course :)


Speaking about roadmaps, I often use PEP 579 to check what I'm 
forgetting. Here are my thoughts on it:



## Naming (The word "built-in" is overused in Python)

This is a social/docs problem, and out of scope of the technical 
efforts. PEPs should always define the terms they use (even in the case 
where there is an official definition, but it doesn't match popular usage).



## Not extendable

As I mentioned above, I'm against opening the callables for subclassing. 
We should define and use protocols instead.



## cfunctions do not become methods

If we were designing Python from scratch, this should have been done 
differently.
Now this is a problem for Cython to solve. CPython should provide the 
tools to do so.



## Semantics of inspect.isfunction

I don't like inspect.isfunction, because "Is it a function?" is almost 
never what you actually want to ask. I'd like to deprecate it in favor 
of explicit functions like "Does it have source code?", "Is it 
callable?", or even "Is it exactly types.FunctionType?".
But I'm against changing its behavior -- people are expecting the 
current answer.



## C functions should have access to the function object

That's where my stake in all this is; I want to move on with PEP 573 
after 580/590 is sorted out.



## METH_FASTCALL is private and undocumented

This is the intersection of PEP 580 and 590.


## Allowing native C arguments

This would be a very experimental feature. Argument Clinic itself is not 
intended for public use, so locking its "impl" functions into the public 
API is off the table at this point.
Cython's cpdef allows this nicely, and CPython's API is full of C 
functions. That should be good enough for now.



## Complexity

We should simplify, but I think the number of callable classes is not the 
best metric to focus on.



## PyMethodDef is too limited

This is a valid point. But the PyMethodDef array is little more than a 
shortcut to creating methods directly in a loop. The immediate 
workaround could be to create a new constructor for methods. Then we can 
look into expressing the data declaratively again.
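
As a sketch of that point, creating the same functions directly in a 
loop with the existing API looks like this; the new constructor 
mentioned above would slot in where PyCFunction_NewEx is called (the 
function and names here are illustrative, not a proposed API):

#include <Python.h>

/* Sketch: the PyMethodDef array as "a shortcut to creating methods
 * directly in a loop". A richer, declarative-again API would replace
 * PyCFunction_NewEx below with a constructor carrying more metadata. */
static PyObject *
ping_impl(PyObject *self, PyObject *ignored)
{
    (void)self; (void)ignored;
    return PyUnicode_FromString("pong");
}

static PyMethodDef sketch_defs[] = {
    {"ping", ping_impl, METH_NOARGS, "Return 'pong'."},
    {NULL, NULL, 0, NULL}
};

static int
add_functions_sketch(PyObject *module)
{
    for (PyMethodDef *ml = sketch_defs; ml->ml_name != NULL; ml++) {
        /* self and module name omitted for brevity */
        PyObject *func = PyCFunction_NewEx(ml, NULL, NULL);
        if (func == NULL || PyModule_AddObject(module, ml->ml_name, func) < 0) {
            Py_XDECREF(func);
            return -1;
        }
        /* On success, PyModule_AddObject stole our reference to func. */
    }
    return 0;
}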



## Slot wrappers have no custom documentation

I think this can now be done with a new custom slot wrapper class. 
Perhaps that can be added to CPython when it matures.



## Static methods and class methods should be callable

This is a valid, though minor, point. I don't even think it would be a 
PEP-level change.




Re: [Python-Dev] PEP 590 discussion

2019-04-25 Thread Jeroen Demeyer

On 2019-04-25 00:24, Petr Viktorin wrote:

PEP 590 defines a new simple/fast protocol for its users, and instead of
making existing complexity faster and easier to use, it's left to be
deprecated/phased out (or kept in existing classes for backwards
compatibility). It makes it possible for future code to be faster/simpler.


Can you elaborate on what you mean with this deprecating/phasing out?

What's your view on dealing with method classes (not necessarily right 
now, but in the future)? Do you think that having separate method 
classes like method-wrapper (for example [].__add__) is good or bad?


Since the way PEP 580 and PEP 590 deal with bound method classes is 
very different, I would like to know the roadmap for this.



Jeroen.


Re: [Python-Dev] PEP 590 discussion

2019-04-24 Thread Petr Viktorin

On 4/10/19 7:05 PM, Jeroen Demeyer wrote:

On 2019-04-10 18:25, Petr Viktorin wrote:

Hello!
I've had time for a more thorough reading of PEP 590 and the reference
implementation. Thank you for the work!


And thank you for the review!


I'd now describe the fundamental
difference between PEP 580 and PEP 590 as:
- PEP 580 tries to optimize all existing calling conventions
- PEP 590 tries to optimize (and expose) the most general calling
convention (i.e. fastcall)


And PEP 580 has better performance overall, even for METH_FASTCALL. See 
this thread:

https://mail.python.org/pipermail/python-dev/2019-April/156954.html

Since these PEPs are all about performance, I consider this a very 
relevant argument in favor of PEP 580.


All about performance as well as simplicity, correctness, testability, 
teachability... And PEP 580 touches some introspection :)



PEP 580 also does a number of other things, as listed in PEP 579. But I
think PEP 590 does not block future PEPs for the other items.
On the other hand, PEP 580 has a much more mature implementation -- and
that's where it picked up real-world complexity.

About complexity, please read what I wrote in
https://mail.python.org/pipermail/python-dev/2019-March/156853.html

I claim that the complexity in the protocol of PEP 580 is a good thing, 
as it removes complexity from other places, in particular from the users 
of the protocol (better have a complex protocol that's simple to use, 
rather than a simple protocol that's complex to use).


I think we're talking past each other. I see it now as:

PEP 580 takes existing complexity and makes it available to all users, 
in a simpler way. It makes existing code faster.


PEP 590 defines a new simple/fast protocol for its users, and instead of 
making existing complexity faster and easier to use, it's left to be 
deprecated/phased out (or kept in existing classes for backwards 
compatibility). It makes it possible for future code to be faster/simpler.


I think things should be simple by default, but if people want some 
extra performance, they can opt in to some extra complexity.



As a more concrete example of the simplicity that PEP 580 could bring, 
CPython currently has 2 classes for bound methods implemented in C:

- "builtin_function_or_method" for normal C methods
- "method-descriptor" for slot wrappers like __eq__ or __add__

With PEP 590, these classes would need to stay separate to get maximal 
performance. With PEP 580, just one class for bound methods would be 
sufficient and there wouldn't be any performance loss. And this extends 
to custom third-party function/method classes, for example as 
implemented by Cython.


Yet, for backwards compatibility reasons, we can't merge the classes.
Also, I think CPython and Cython are exactly the users that can trade 
some extra complexity for better performance.



Jeroen's analysis from
https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems
to miss a step at the top:

a. CALL_FUNCTION* / CALL_METHOD opcode
   calls
b. _PyObject_FastCallKeywords()
   which calls
c. _PyCFunction_FastCallKeywords()
   which calls
d. _PyMethodDef_RawFastCallKeywords()
   which calls
e. the actual C function (*ml_meth)()

I think it's more useful to say that both PEPs bridge a->e (via
_Py_VectorCall or PyCCall_Call).


Not quite. For a builtin_function_or_method, we have with PEP 580:

a. call_function()
     calls
d. PyCCall_FastCall
     which calls
e. the actual C function

and with PEP 590 it's more like:

a. call_function()
     calls
c. _PyCFunction_FastCallKeywords
     which calls
d. _PyMethodDef_RawFastCallKeywords
     which calls
e. the actual C function

Level c. above is the vectorcall wrapper, which is a level that PEP 580 
doesn't have.


PEP 580 optimizes all the code paths, where PEP 590 optimizes the fast 
path, and makes sure most/all use cases can use (or switch to) the fast 
path.
Both fast paths are fast: bridging a->e using zero-copy arg passing with 
some C calls and flag checks.


The PEP 580 approach is faster; PEP 590's is simpler.



Jeroen, is there something in PEPs 579/580 that PEP 590 blocks, or
should address?


Well, PEP 580 is an extensible protocol while PEP 590 is not. But, 
PyTypeObject is extensible, so even with PEP 590 one can always extend 
that (for example, PEP 590 uses a type flag Py_TPFLAGS_METHOD_DESCRIPTOR 
where PEP 580 instead uses the structs for the C call protocol). But I 
guess that extending PyTypeObject will be harder to justify (say, in a 
future PEP) than extending the C call protocol.


That's a good point.


Also, it's explicitly allowed for users of the PEP 580 protocol to 
extend the PyCCallDef structure with custom fields. But I don't have a 
concrete idea of whether that will be useful.


Unless I'm missing something, that would be effectively the same as 
extending their own instance struct. To bring any benefits, the extended 
PyCCallDef would need to be standardized in a PEP.



Re: [Python-Dev] PEP 590 discussion

2019-04-24 Thread Petr Viktorin
Hi Mark! See my more general reply; here I'll just tie loose ends with a 
few +1s.


On 4/14/19 7:30 AM, Mark Shannon wrote:

On 10/04/2019 5:25 pm, Petr Viktorin wrote:

[...]
PEP 590 is built on a simple idea, formalizing fastcall. But it is 
complicated by PY_VECTORCALL_ARGUMENTS_OFFSET and 
Py_TPFLAGS_METHOD_DESCRIPTOR.
As far as I understand, both are there to avoid an intermediate 
bound-method object for LOAD_METHOD/CALL_METHOD. (They do try to be 
general, but I don't see any other use case.)

Is that right?


Not quite.
Py_TPFLAGS_METHOD_DESCRIPTOR is for LOAD_METHOD/CALL_METHOD, it allows 
any callable descriptor to benefit from the LOAD_METHOD/CALL_METHOD 
optimisation.


PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward 
calls with an additional argument can do so efficiently. The obvious 
example is bound-methods, but classes are at least as important.

cls(*args) -> cls.__new__(cls, *args) -> cls.__init__(self, *args)


I see. Thanks!

(I'm running out of time today, but I'll write more on why I'm asking, 
and on the case I called "impossible" (while avoiding creation of a 
"bound method" object), later.)


Let me drop this thread; I stand corrected.

Another point I'd like some discussion on is that the vectorcall function 
pointer is per-instance. It looks like this is only useful for type 
objects, but it will add a pointer to every new-style callable object 
(including functions). That seems wasteful.
Why not have a per-type pointer, and for types that need it (like 
PyTypeObject), make it dispatch to an instance-specific function?


Firstly, each callable has different behaviour, so it makes sense to be 
able to do the dispatch from caller to callee in one step. Having a 
per-object function pointer allows that.
Secondly, callables are either large or transient. If large, then the 
extra few bytes make little difference. If transient, then it matters 
even less.
The total increase in memory is likely to be only a few tens of 
kilobytes, even for a large program.


That makes sense.


Re: [Python-Dev] PEP 590 discussion

2019-04-16 Thread Jeroen Demeyer

On 2019-04-03 07:33, Jeroen Demeyer wrote:

Access to the class isn't possible currently and also not with PEP 590.
But it's easy enough to fix that: PEP 573 adds a new METH_METHOD flag to
change the signature of the C function (not the vectorcall wrapper). PEP
580 supports this "out of the box" because I'm reusing the class also to
do type checks. But this shouldn't be an argument for or against either
PEP.


Actually, in the answer above I only considered "is implementing PEP 573 
possible?" but I did not consider the complexity of doing that. And in 
line with what I claimed about complexity before, I think that PEP 580 
scores better in this regard.


Take PEP 580 and assume for the sake of argument that it didn't already 
have the cc_parent field. Then adding support for PEP 573 is easy: just 
add the cc_parent field to the C call protocol structure and set that 
field when initializing a method_descriptor. C functions can use the 
METH_DEFARG flag to get access to the PyCCallDef structure, which gives 
cc_parent. Implementing PEP 573 for a custom function class takes no 
extra effort: it doesn't require any changes to that class, except for 
correctly initializing the cc_parent field. Since PEP 580 has built-in 
support for methods, nothing special needs to be done to support methods 
too.
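
As a hedged illustration of that mechanism (the struct layout and the 
function signature below are assumptions reconstructed from the 
description above, not PEP 580's literal definitions):

#include <Python.h>
#include <stdint.h>

/* Illustrative stand-in for the PEP 580 draft's PyCCallDef: the point is
 * that "parent" lives in one shared structure, so every class using the
 * protocol gets PEP 573 support by filling in a single field. */
typedef struct {
    uint32_t  cc_flags;   /* calling convention, e.g. a METH_DEFARG-style bit */
    void     *cc_func;    /* the C function to call */
    PyObject *cc_parent;  /* defining class (or module) */
} SketchCCallDef;

/* A C function using the "pass the CCallDef first" convention can recover
 * the defining class with no per-class plumbing: */
static PyObject *
defining_class_sketch(SketchCCallDef *cc, PyObject *self,
                      PyObject *const *args, Py_ssize_t nargs)
{
    (void)self; (void)args; (void)nargs;
    Py_INCREF(cc->cc_parent);
    return cc->cc_parent;
}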


With PEP 590 on the other hand, every single class which is involved in 
PEP 573 must be changed and every single vectorcall wrapper supporting 
PEP 573 must be changed. This is not limited to the function class 
itself, also the corresponding method class (for example, 
builtin_function_or_method for method_descriptor) needs to be changed.



Jeroen


Re: [Python-Dev] PEP 590 discussion

2019-04-15 Thread Jeroen Demeyer

On 2019-04-14 13:30, Mark Shannon wrote:

PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward
calls with an additional argument can do so efficiently. The obvious
example is bound-methods, but classes are at least as important.
cls(*args) -> cls.__new__(cls, *args) -> cls.__init__(self, *args)


But tp_new and tp_init take the "cls" and "self" as separate arguments, 
not as part of *args. So I don't see why you need 
PY_VECTORCALL_ARGUMENTS_OFFSET for this.



The updated minimal implementation now uses `const` arguments.
Code that uses args[-1] must explicitly cast away the const.
https://github.com/markshannon/cpython/blob/vectorcall-minimal/Objects/classobject.c#L55


That's better indeed.


Jeroen.


Re: [Python-Dev] PEP 590 discussion

2019-04-14 Thread Mark Shannon

Hi, Petr

On 10/04/2019 5:25 pm, Petr Viktorin wrote:

Hello!
I've had time for a more thorough reading of PEP 590 and the reference 
implementation. Thank you for the work!
Overall, I like PEP 590's direction. I'd now describe the fundamental 
difference between PEP 580 and PEP 590 as:

- PEP 580 tries to optimize all existing calling conventions
- PEP 590 tries to optimize (and expose) the most general calling 
convention (i.e. fastcall)


PEP 580 also does a number of other things, as listed in PEP 579. But I 
think PEP 590 does not block future PEPs for the other items.
On the other hand, PEP 580 has a much more mature implementation -- and 
that's where it picked up real-world complexity.


PEP 590's METH_VECTORCALL is designed to handle all existing use cases, 
rather than mirroring the existing METH_* varieties.
But both PEPs require the callable's code to be modified, so requiring 
it to switch calling conventions shouldn't be a problem.


Jeroen's analysis from 
https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems 
to miss a step at the top:


a. CALL_FUNCTION* / CALL_METHOD opcode
   calls
b. _PyObject_FastCallKeywords()
   which calls
c. _PyCFunction_FastCallKeywords()
   which calls
d. _PyMethodDef_RawFastCallKeywords()
   which calls
e. the actual C function (*ml_meth)()

I think it's more useful to say that both PEPs bridge a->e (via 
_Py_VectorCall or PyCCall_Call).



PEP 590 is built on a simple idea, formalizing fastcall. But it is 
complicated by PY_VECTORCALL_ARGUMENTS_OFFSET and 
Py_TPFLAGS_METHOD_DESCRIPTOR.
As far as I understand, both are there to avoid an intermediate 
bound-method object for LOAD_METHOD/CALL_METHOD. (They do try to be 
general, but I don't see any other use case.)

Is that right?


Not quite.
Py_TPFLAGS_METHOD_DESCRIPTOR is for LOAD_METHOD/CALL_METHOD, it allows 
any callable descriptor to benefit from the LOAD_METHOD/CALL_METHOD 
optimisation.
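
A rough sketch of what that looks like from the interpreter's side; the 
helper and the flag value below are hypothetical stand-ins rather than 
CPython's actual LOAD_METHOD code, and the MRO walk is reduced to a 
single dict lookup:

#include <Python.h>
#include <stdbool.h>

/* Hypothetical stand-in for Py_TPFLAGS_METHOD_DESCRIPTOR. */
#define SKETCH_TPFLAGS_METHOD_DESCRIPTOR (1UL << 17)

/* Sketch of the LOAD_METHOD idea: if the attribute found on the type is a
 * descriptor whose type opts in via the flag, skip __get__ entirely and
 * let CALL_METHOD call descr(obj, *args), with no bound-method object
 * created. (Only the type's own dict is consulted; a real lookup walks
 * the MRO.) */
static int
load_method_sketch(PyObject *obj, PyObject *name,
                   PyObject **method, bool *defer_binding)
{
    PyObject *dict = Py_TYPE(obj)->tp_dict;
    PyObject *descr = (dict == NULL) ? NULL
                                     : PyDict_GetItemWithError(dict, name);
    if (descr == NULL && PyErr_Occurred()) {
        return -1;
    }
    if (descr != NULL &&
        PyType_HasFeature(Py_TYPE(descr), SKETCH_TPFLAGS_METHOD_DESCRIPTOR)) {
        Py_INCREF(descr);
        *method = descr;          /* unbound callable descriptor */
        *defer_binding = true;    /* CALL_METHOD passes obj as first argument */
        return 0;
    }
    /* Fallback: ordinary attribute lookup, which may allocate a bound method. */
    *method = PyObject_GetAttr(obj, name);
    *defer_binding = false;
    return (*method == NULL) ? -1 : 0;
}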


PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward 
calls with an additional argument can do so efficiently. The obvious 
example is bound-methods, but classes are at least as important.

cls(*args) -> cls.__new__(cls, *args) -> cls.__init__(self, *args)

(I'm running out of time today, but I'll write more on why I'm asking, 
and on the case I called "impossible" (while avoiding creation of a 
"bound method" object), later.)



The way `const` is handled in the function signatures strikes me as too 
fragile for public API.
I'd like if, as much as possible, PY_VECTORCALL_ARGUMENTS_OFFSET was 
treated as a special optimization that extension authors can either opt 
in to, or blissfully ignore.

That might mean:
- vectorcall, PyObject_VectorCallWithCallable, PyObject_VectorCall, 
PyCall_MakeTpCall all formally take "PyObject *const *args"
- a naïve callee must do "nargs &= ~PY_VECTORCALL_ARGUMENTS_OFFSET" 
(maybe spelled as "nargs &= PY_VECTORCALL_NARGS_MASK"), but otherwise 
writes compiler-enforced const-correct code.
- if PY_VECTORCALL_ARGUMENTS_OFFSET is set, the callee may modify 
"args[-1]" (and only that, and after the author has read the docs).


The updated minimal implementation now uses `const` arguments.
Code that uses args[-1] must explicitly cast away the const.
https://github.com/markshannon/cpython/blob/vectorcall-minimal/Objects/classobject.c#L55




Another point I'd like some discussion on is that the vectorcall function 
pointer is per-instance. It looks like this is only useful for type objects, 
but it will add a pointer to every new-style callable object (including 
functions). That seems wasteful.
Why not have a per-type pointer, and for types that need it (like 
PyTypeObject), make it dispatch to an instance-specific function?


Firstly, each callable has different behaviour, so it makes sense to be 
able to do the dispatch from caller to callee in one step. Having a 
per-object function pointer allows that.
Secondly, callables are either large or transient. If large, then the 
extra few bytes make little difference. If transient, then it matters 
even less.
The total increase in memory is likely to be only a few tens of 
kilobytes, even for a large program.
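
To make the trade-off concrete, here is an illustrative sketch of the 
layout being discussed; the names are made up, but the shape (an offset 
stored on the type, one function pointer stored per instance) follows 
the PEP 590 draft:

#include <Python.h>

/* Illustrative only: the per-instance entry point Petr asks about, and
 * the one-step dispatch Mark describes. Names do not match the draft API. */
typedef PyObject *(*vc_func)(PyObject *callable, PyObject *const *args,
                             size_t nargsf, PyObject *kwnames);

typedef struct {
    PyObject_HEAD
    vc_func vectorcall;      /* the extra pointer paid per instance */
    /* ... callable-specific state ... */
} SketchCallable;

/* The type would advertise offsetof(SketchCallable, vectorcall) in a
 * tp_vectorcall_offset-style slot. A caller then reaches the callee's C
 * code with one pointer load and one indirect call: */
static PyObject *
call_via_offset_sketch(PyObject *callable, Py_ssize_t vectorcall_offset,
                       PyObject *const *args, size_t nargsf, PyObject *kwnames)
{
    vc_func func = *(vc_func *)(((char *)callable) + vectorcall_offset);
    return func(callable, args, nargsf, kwnames);
}

A per-type pointer would instead force a second, instance-level dispatch 
for types whose instances behave differently when called, which is the 
one-step argument above.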





Minor things:
- "Continued prohibition of callable classes as base classes" -- this 
section reads as a final. Would you be OK wording this as something 
other PEPs can tackle?
- "PyObject_VectorCall" -- this looks extraneous, and the reference 
implementation doesn't need it so far. Can it be removed, or justified?


Yes, removing it makes sense. I can then rename the clumsily named 
"PyObject_VectorCallWithCallable" as "PyObject_VectorCall".


- METH_VECTORCALL is *not* strictly "equivalent to the currently 
undocumented METH_FASTCALL | METH_KEYWORD flags" (it has the 
ARGUMENTS_OFFSET complication).


METH_VECTORCALL is just making METH_FASTCALL | METH_KEYWORDS documented 
and public.
Would you prefer that it has a different name, to prevent confusion with 
PY_VECTORCALL_ARGUMENTS_OFFSET?
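
For context, a callee using the fastcall-with-keywords convention that 
METH_VECTORCALL would document looks roughly like this sketch; the 
function body is illustrative, but the signature shape (argument array, 
positional count, keyword-names tuple) is the point:

#include <Python.h>

/* Sketch of a fastcall-with-keywords callee: positional arguments arrive
 * in a C array, and the names of any keyword arguments follow in a tuple. */
static PyObject *
add_sketch(PyObject *module, PyObject *const *args, Py_ssize_t nargs,
           PyObject *kwnames)
{
    (void)module;
    if (kwnames != NULL && PyTuple_GET_SIZE(kwnames) > 0) {
        PyErr_SetString(PyExc_TypeError,
                        "add_sketch() takes no keyword arguments");
        return NULL;
    }
    if (nargs != 2) {
        PyErr_SetString(PyExc_TypeError,
                        "add_sketch() takes exactly 2 positional arguments");
        return NULL;
    }
    /* No tuple was built: args[0] and args[1] point straight into the
     * caller's stack. */
    return PyNumber_Add(args[0], args[1]);
}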


I 

Re: [Python-Dev] PEP 590 discussion

2019-04-11 Thread Brett Cannon
On Thu, Apr 11, 2019 at 5:06 AM Jeroen Demeyer  wrote:

> Petr,
>
> I realize that you are in a difficult position. You'll end up
> disappointing either me or Mark...
>
> I don't know if the steering council or somebody else has a good idea to
> deal with this situation.
>

Our answer was "ask Petr to be BDFL Delegate". ;)

In all seriousness, none of us on the council are as well equipped as Petr
to handle this tough decision; otherwise it would take even longer for us to
learn enough to make an informed decision, and we would be even worse off.

-Brett


>
> > Jeroen has time
>
> Speaking of time, maybe I should clarify that I have time until the end
> of August: I am working for the OpenDreamKit grant, which allows me to
> work basically full-time on open source software development but that
> ends at the end of August.
>
> > Here again, I mostly want to know if the details are there for deeper
> > reasons, or just points to polish.
>
> I would say: mostly shallow details.
>
> The subclassing thing would be good to resolve, but I don't see any
> difference between PEP 580 and PEP 590 there. In PEP 580, I wrote a
> strategy for dealing with subclassing. I believe that it works and that
> exactly the same idea would work for PEP 590 too. Of course, I may be
> overlooking something...
>
> > I don't have good general experience with premature extensibility, so
> > I'd not count this as a plus.
>
> Fair enough. I also see it more as a "nice to have", not as a big plus.


Re: [Python-Dev] PEP 590 discussion

2019-04-11 Thread Jeroen Demeyer

Petr,

I realize that you are in a difficult position. You'll end up 
disappointing either me or Mark...


I don't know if the steering council or somebody else has a good idea to 
deal with this situation.



Jeroen has time


Speaking of time, maybe I should clarify that I have time until the end 
of August: I am working for the OpenDreamKit grant, which allows me to 
work basically full-time on open source software development but that 
ends at the end of August.



Here again, I mostly want to know if the details are there for deeper
reasons, or just points to polish.


I would say: mostly shallow details.

The subclassing thing would be good to resolve, but I don't see any 
difference between PEP 580 and PEP 590 there. In PEP 580, I wrote a 
strategy for dealing with subclassing. I believe that it works and that 
exactly the same idea would work for PEP 590 too. Of course, I may be 
overlooking something...



I don't have good general experience with premature extensibility, so
I'd not count this as a plus.


Fair enough. I also see it more as a "nice to have", not as a big plus.


Re: [Python-Dev] PEP 590 discussion

2019-04-11 Thread Petr Viktorin

On 4/11/19 1:05 AM, Jeroen Demeyer wrote:

On 2019-04-10 18:25, Petr Viktorin wrote:

Hello!
I've had time for a more thorough reading of PEP 590 and the reference
implementation. Thank you for the work!


And thank you for the review!


One general note: I am not (yet) choosing between PEP 580 and PEP 590.
I am not looking for arguments for/against whole PEPs, but individual 
ideas which, I believe, can still be mixed & matched.


I see the situation this way:
- I get about one day per week when I can properly concentrate on 
CPython. It's frustrating to be the bottleneck.
- Jeroen has time, but it would frustrating to work on something that 
will later be discarded, and it's frustrating to not be able to move the 
project forward.
- Mark has good ideas, but seems to lack the time to polish them, or 
even test out if they are good. It is probably frustrating to see 
unpolished ideas rejected.


I'm looking for ways to reduce the frustration, given where we are.


Jeroen, thank you for the comments. Apologies for not having the time to 
reply to all of them properly right now.


Mark, if you could find the time to answer (even just a few of the 
points), it would be great. I ask you to share/clarify your thoughts, 
not defend your PEP.




I'd now describe the fundamental
difference between PEP 580 and PEP 590 as:
- PEP 580 tries to optimize all existing calling conventions
- PEP 590 tries to optimize (and expose) the most general calling
convention (i.e. fastcall)


And PEP 580 has better performance overall, even for METH_FASTCALL. See 
this thread:

https://mail.python.org/pipermail/python-dev/2019-April/156954.html

Since these PEPs are all about performance, I consider this a very 
relevant argument in favor of PEP 580.



PEP 580 also does a number of other things, as listed in PEP 579. But I
think PEP 590 does not block future PEPs for the other items.
On the other hand, PEP 580 has a much more mature implementation -- and
that's where it picked up real-world complexity.

About complexity, please read what I wrote in
https://mail.python.org/pipermail/python-dev/2019-March/156853.html

I claim that the complexity in the protocol of PEP 580 is a good thing, 
as it removes complexity from other places, in particular from the users 
of the protocol (better have a complex protocol that's simple to use, 
rather than a simple protocol that's complex to use).


Sadly, I need more time on this than I have today; I'll get back to it 
next week.


As a more concrete example of the simplicity that PEP 580 could bring, 
CPython currently has 2 classes for bound methods implemented in C:

- "builtin_function_or_method" for normal C methods
- "method-descriptor" for slot wrappers like __eq__ or __add__

With PEP 590, these classes would need to stay separate to get maximal 
performance. With PEP 580, just one class for bound methods would be 
sufficient and there wouldn't be any performance loss. And this extends 
to custom third-party function/method classes, for example as 
implemented by Cython.



PEP 590's METH_VECTORCALL is designed to handle all existing use cases,
rather than mirroring the existing METH_* varieties.
But both PEPs require the callable's code to be modified, so requiring
it to switch calling conventions shouldn't be a problem.


Agreed.


Jeroen's analysis from
https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems
to miss a step at the top:

a. CALL_FUNCTION* / CALL_METHOD opcode
   calls
b. _PyObject_FastCallKeywords()
   which calls
c. _PyCFunction_FastCallKeywords()
   which calls
d. _PyMethodDef_RawFastCallKeywords()
   which calls
e. the actual C function (*ml_meth)()

I think it's more useful to say that both PEPs bridge a->e (via
_Py_VectorCall or PyCCall_Call).


Not quite. For a builtin_function_or_method, we have with PEP 580:

a. call_function()
     calls
d. PyCCall_FastCall
     which calls
e. the actual C function

and with PEP 590 it's more like:

a. call_function()
     calls
c. _PyCFunction_FastCallKeywords
     which calls
d. _PyMethodDef_RawFastCallKeywords
     which calls
e. the actual C function

Level c. above is the vectorcall wrapper, which is a level that PEP 580 
doesn't have.


Again, I'll get back to this next week.


The way `const` is handled in the function signatures strikes me as too
fragile for public API.


That's a detail which shouldn't influence the acceptance of either PEP.


True.
I guess what I want from the answer is to know how much thought went 
into const handling: is what's in the PEP an initial draft, or does it 
solve some hidden issue?



Why not have a per-type pointer, and for types that need it (like
PyTypeObject), make it dispatch to an instance-specific function?


That would be exactly https://bugs.python.org/issue29259

I'll let Mark comment on this.


Minor things:
- "Continued prohibition of callable classes as base classes" -- this
section reads as a final. Would you be OK wording this as something 
other PEPs can tackle?

Re: [Python-Dev] PEP 590 discussion

2019-04-10 Thread Jeroen Demeyer

On 2019-04-10 18:25, Petr Viktorin wrote:

Hello!
I've had time for a more thorough reading of PEP 590 and the reference
implementation. Thank you for the work!


And thank you for the review!


I'd now describe the fundamental
difference between PEP 580 and PEP 590 as:
- PEP 580 tries to optimize all existing calling conventions
- PEP 590 tries to optimize (and expose) the most general calling
convention (i.e. fastcall)


And PEP 580 has better performance overall, even for METH_FASTCALL. See 
this thread:

https://mail.python.org/pipermail/python-dev/2019-April/156954.html

Since these PEPs are all about performance, I consider this a very 
relevant argument in favor of PEP 580.



PEP 580 also does a number of other things, as listed in PEP 579. But I
think PEP 590 does not block future PEPs for the other items.
On the other hand, PEP 580 has a much more mature implementation -- and
that's where it picked up real-world complexity.

About complexity, please read what I wrote in
https://mail.python.org/pipermail/python-dev/2019-March/156853.html

I claim that the complexity in the protocol of PEP 580 is a good thing, 
as it removes complexity from other places, in particular from the users 
of the protocol (better have a complex protocol that's simple to use, 
rather than a simple protocol that's complex to use).


As a more concrete example of the simplicity that PEP 580 could bring, 
CPython currently has 2 classes for bound methods implemented in C:

- "builtin_function_or_method" for normal C methods
- "method-descriptor" for slot wrappers like __eq__ or __add__

With PEP 590, these classes would need to stay separate to get maximal 
performance. With PEP 580, just one class for bound methods would be 
sufficient and there wouldn't be any performance loss. And this extends 
to custom third-party function/method classes, for example as 
implemented by Cython.



PEP 590's METH_VECTORCALL is designed to handle all existing use cases,
rather than mirroring the existing METH_* varieties.
But both PEPs require the callable's code to be modified, so requiring
it to switch calling conventions shouldn't be a problem.


Agreed.


Jeroen's analysis from
https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems
to miss a step at the top:

a. CALL_FUNCTION* / CALL_METHOD opcode
   calls
b. _PyObject_FastCallKeywords()
   which calls
c. _PyCFunction_FastCallKeywords()
   which calls
d. _PyMethodDef_RawFastCallKeywords()
   which calls
e. the actual C function (*ml_meth)()

I think it's more useful to say that both PEPs bridge a->e (via
_Py_VectorCall or PyCCall_Call).


Not quite. For a builtin_function_or_method, we have with PEP 580:

a. call_function()
calls
d. PyCCall_FastCall
which calls
e. the actual C function

and with PEP 590 it's more like:

a. call_function()
calls
c. _PyCFunction_FastCallKeywords
which calls
d. _PyMethodDef_RawFastCallKeywords
which calls
e. the actual C function

Level c. above is the vectorcall wrapper, which is a level that PEP 580 
doesn't have.



The way `const` is handled in the function signatures strikes me as too
fragile for public API.


That's a detail which shouldn't influence the acceptance of either PEP.


Why not have a per-type pointer, and for types that need it (like
PyTypeObject), make it dispatch to an instance-specific function?


That would be exactly https://bugs.python.org/issue29259

I'll let Mark comment on this.


Minor things:
- "Continued prohibition of callable classes as base classes" -- this
section reads as a final. Would you be OK wording this as something
other PEPs can tackle?
- "PyObject_VectorCall" -- this looks extraneous, and the reference
implementation doesn't need it so far. Can it be removed, or justified?
- METH_VECTORCALL is *not* strictly "equivalent to the currently
undocumented METH_FASTCALL | METH_KEYWORD flags" (it has the
ARGUMENTS_OFFSET complication).
- I'd like to officially call this PEP "Vectorcall", see
https://github.com/python/peps/pull/984


Those are indeed details which shouldn't influence the acceptance of 
either PEP. If you go with PEP 590, then we should discuss this further.



Mark, what are your plans for next steps with PEP 590? If a volunteer
wanted to help you push this forward, what would be the best thing to
work on?


Personally, I think what we need now is a decision between PEP 580 and 
PEP 590 (there is still the possibility of rejecting both but I really 
hope that this won't happen). There is a lot of work that still needs to 
be done after either PEP is accepted, such as:

- finish and merge the reference implementation
- document everything
- use the protocol in more classes where it makes sense (for example, 
staticmethod, wrapper_descriptor)

- use this in Cython
- handle more issues from PEP 579

I volunteer to put my time into this, regardless of which PEP is 
accepted. Of course, I still think that PEP 580 is better, but I also 
want this 

Re: [Python-Dev] PEP 590 discussion

2019-04-10 Thread Petr Viktorin

Hello!
I've had time for a more thorough reading of PEP 590 and the reference 
implementation. Thank you for the work!
Overall, I like PEP 590's direction. I'd now describe the fundamental 
difference between PEP 580 and PEP 590 as:

- PEP 580 tries to optimize all existing calling conventions
- PEP 590 tries to optimize (and expose) the most general calling 
convention (i.e. fastcall)


PEP 580 also does a number of other things, as listed in PEP 579. But I 
think PEP 590 does not block future PEPs for the other items.
On the other hand, PEP 580 has a much more mature implementation -- and 
that's where it picked up real-world complexity.


PEP 590's METH_VECTORCALL is designed to handle all existing use cases, 
rather than mirroring the existing METH_* varieties.
But both PEPs require the callable's code to be modified, so requiring 
it to switch calling conventions shouldn't be a problem.


Jeroen's analysis from 
https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems 
to miss a step at the top:


a. CALL_FUNCTION* / CALL_METHOD opcode
  calls
b. _PyObject_FastCallKeywords()
  which calls
c. _PyCFunction_FastCallKeywords()
  which calls
d. _PyMethodDef_RawFastCallKeywords()
  which calls
e. the actual C function (*ml_meth)()

I think it's more useful to say that both PEPs bridge a->e (via 
_Py_VectorCall or PyCCall_Call).



PEP 590 is built on a simple idea, formalizing fastcall. But it is 
complicated by PY_VECTORCALL_ARGUMENTS_OFFSET and 
Py_TPFLAGS_METHOD_DESCRIPTOR.
As far as I understand, both are there to avoid an intermediate 
bound-method object for LOAD_METHOD/CALL_METHOD. (They do try to be 
general, but I don't see any other use case.)

Is that right?
(I'm running out of time today, but I'll write more on why I'm asking, 
and on the case I called "impossible" (while avoiding creation of a 
"bound method" object), later.)



The way `const` is handled in the function signatures strikes me as too 
fragile for public API.
I'd like if, as much as possible, PY_VECTORCALL_ARGUMENTS_OFFSET was 
treated as a special optimization that extension authors can either opt 
in to, or blissfully ignore.

That might mean:
- vectorcall, PyObject_VectorCallWithCallable, PyObject_VectorCall, 
PyCall_MakeTpCall all formally take "PyObject *const *args"
- a naïve callee must do "nargs &= ~PY_VECTORCALL_ARGUMENTS_OFFSET" 
(maybe spelled as "nargs &= PY_VECTORCALL_NARGS_MASK"), but otherwise 
writes compiler-enforced const-correct code.
- if PY_VECTORCALL_ARGUMENTS_OFFSET is set, the callee may modify 
"args[-1]" (and only that, and after the author has read the docs).



Another point I'd like some discussion on is that the vectorcall function 
pointer is per-instance. It looks like this is only useful for type objects, 
but it will add a pointer to every new-style callable object (including 
functions). That seems wasteful.
Why not have a per-type pointer, and for types that need it (like 
PyTypeObject), make it dispatch to an instance-specific function?



Minor things:
- "Continued prohibition of callable classes as base classes" -- this 
section reads as a final. Would you be OK wording this as something 
other PEPs can tackle?
- "PyObject_VectorCall" -- this looks extraneous, and the reference 
implementation doesn't need it so far. Can it be removed, or justified?
- METH_VECTORCALL is *not* strictly "equivalent to the currently 
undocumented METH_FASTCALL | METH_KEYWORD flags" (it has the 
ARGUMENTS_OFFSET complication).
- I'd like to officially call this PEP "Vectorcall", see 
https://github.com/python/peps/pull/984




Mark, what are your plans for next steps with PEP 590? If a volunteer 
wanted to help you push this forward, what would be the best thing to 
work on?


Jeroen, is there something in PEPs 579/580 that PEP 590 blocks, or 
should address?



Re: [Python-Dev] PEP 590 discussion

2019-04-02 Thread Jeroen Demeyer

In one of the ways to call C functions in PEP 580, the function gets
access to:
- the arguments,
- "self", the object
- the class that the method was found in (which is not necessarily
type(self))
I still have to read the details, but when combined with
LOAD_METHOD/CALL_METHOD optimization (avoiding creation of a "bound
method" object), it seems impossible to do this efficiently with just
the callable's code and callable's object.


It is possible, and relatively straightforward.


Access to the class isn't possible currently and also not with PEP 590. 
But it's easy enough to fix that: PEP 573 adds a new METH_METHOD flag to 
change the signature of the C function (not the vectorcall wrapper). PEP 
580 supports this "out of the box" because I'm reusing the class also to 
do type checks. But this shouldn't be an argument for or against either PEP.



Re: [Python-Dev] PEP 590 discussion

2019-04-02 Thread Mark Shannon

Hi,

On 02/04/2019 1:49 pm, Petr Viktorin wrote:

On 3/30/19 11:36 PM, Jeroen Demeyer wrote:

On 2019-03-30 17:30, Mark Shannon wrote:

2. The claim that PEP 580 allows "certain optimizations because other
code can make assumptions" is flawed. In general, the caller cannot make
assumptions about the callee or vice-versa. Python is a dynamic 
language.


PEP 580 is meant for extension classes, not Python classes. Extension 
classes are not dynamic. When you implement tp_call in a given way, 
the user cannot change it. So if a class implements the C call 
protocol or the vectorcall protocol, callers can make assumptions 
about what that means.



PEP 579 is mainly a list of supposed flaws with the
'builtin_function_or_method' class.
The general thrust of PEP 579 seems to be that builtin-functions and
builtin-methods should be more flexible and extensible than they are. I
don't agree. If you want different behaviour, then use a different
object. Don't try and cram all this extra behaviour into a pre-existing
object.


I think that there is a misunderstanding here. I fully agree with the 
"use a different object" solution. This isn't a new solution: it's 
already possible to implement those different objects (Cython does 
it). It's just that this solution comes at a performance cost and 
that's what we want to avoid.


It does seem like there is some misunderstanding.

PEP 580 defines a CCall structure, which includes the function pointer, 
flags, "self" and "parent". Like the current implementation, it has 
various METH_ flags for various C signatures. When called, the info from 
CCall is matched up (in relatively complex ways) to what the C function 
expects.


PEP 590 only adds the "vectorcall". It does away with flags and only has 
one C signature, which is designed to fit all the existing ones, and is 
well optimized. Storing the "self"/"parent", and making sure they're 
passed to the C function is the responsibility of the callable object.
There's an optimization for "self" (offsetting using 
PY_VECTORCALL_ARGUMENTS_OFFSET), and any supporting info can be provided 
as part of "self". >

I'll reiterate that PEP 590 is more general than PEP 580 and that once
the callable's code has access to the callable object (as both PEPs
allow) then anything is possible. You can't get more extensible than
that.


Anything is possible, but if one of the possibilities becomes common and 
useful, PEP 590 would make it hard to optimize for it.
Python has grown many "METH_*" signatures over the years as we found 
more things that need to be passed to callables. Why would 
"METH_VECTORCALL" be the last? If it won't (if you think about it as one 
more way to call functions), then dedicating a tp_* slot to it sounds 
quite expensive.


I doubt METH_VECTORCALL will be the last.
Let me give you an example: It is quite common for a function to take 
two arguments, so we might want to add a METH_OO flag for builtin-functions 
with 2 parameters.


To support this in PEP 590, you would make exactly the same change as 
you would now, which is to add another case to the switch statement in 
_PyCFunction_FastCallKeywords.

For PEP 580, you would add another case to the switch in PyCCall_FastCall.

No difference really.
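
For illustration only -- METH_OO does not exist, and the switch below is 
a simplified stand-in for the real dispatch in 
_PyCFunction_FastCallKeywords or PyCCall_FastCall -- but the change 
being described would be on this order:

#include <Python.h>

#define METH_OO_SKETCH 0x1000   /* hypothetical flag value */

/* A hypothetical C function taking exactly two positional arguments. */
typedef PyObject *(*oo_func)(PyObject *self, PyObject *a, PyObject *b);

static PyObject *
dispatch_sketch(int flags, PyCFunction meth, PyObject *self,
                PyObject *const *args, Py_ssize_t nargs)
{
    switch (flags) {
    /* ... existing METH_NOARGS, METH_O, METH_FASTCALL, ... cases ... */
    case METH_OO_SKETCH:
        if (nargs != 2) {
            PyErr_SetString(PyExc_TypeError, "expected exactly 2 arguments");
            return NULL;
        }
        /* double cast, as CPython does, to silence function-pointer warnings */
        return ((oo_func)(void (*)(void))meth)(self, args[0], args[1]);
    default:
        PyErr_SetString(PyExc_SystemError, "unsupported calling convention");
        return NULL;
    }
}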

PEP 580 uses a slot as well. It's only 8 bytes per class.




In one of the ways to call C functions in PEP 580, the function gets 
access to:

- the arguments,
- "self", the object
- the class that the method was found in (which is not necessarily 
type(self))
I still have to read the details, but when combined with 
LOAD_METHOD/CALL_METHOD optimization (avoiding creation of a "bound 
method" object), it seems impossible to do this efficiently with just 
the callable's code and callable's object.


It is possible, and relatively straightforward.
Why do you think it is impossible?




I would argue the opposite: PEP 590 defines a fixed protocol that is 
not easy to extend. PEP 580 on the other hand uses a new data 
structure PyCCallDef which could easily be extended in the future 
(this will intentionally never be part of the stable ABI, so we can do 
that).


I have also argued before that the generality of PEP 590 is a bad 
thing rather than a good thing: by defining a more rigid protocol as 
in PEP 580, more optimizations are possible.



PEP 580 has the same limitation for the same reasons. The limitation is
necessary for correctness if an object supports calls via `__call__` and
through another calling convention.


I don't think that this limitation is needed in either PEP. As I 
explained at the top of this email, it can easily be solved by not 
using the protocol for Python classes. What is wrong with my proposal 
in PEP 580: https://www.python.org/dev/peps/pep-0580/#inheritance



I'll add Jeroen's notes from the review of the proposed PEP 590
(https://github.com/python/peps/pull/960):

The statement "PEP 580 is specifically targetted at function-like 
objects, and doesn't support other callables like 

Re: [Python-Dev] PEP 590 discussion

2019-04-02 Thread Petr Viktorin

On 3/30/19 11:36 PM, Jeroen Demeyer wrote:

On 2019-03-30 17:30, Mark Shannon wrote:

2. The claim that PEP 580 allows "certain optimizations because other
code can make assumptions" is flawed. In general, the caller cannot make
assumptions about the callee or vice-versa. Python is a dynamic language.


PEP 580 is meant for extension classes, not Python classes. Extension 
classes are not dynamic. When you implement tp_call in a given way, the 
user cannot change it. So if a class implements the C call protocol or 
the vectorcall protocol, callers can make assumptions about what that 
means.



PEP 579 is mainly a list of supposed flaws with the
'builtin_function_or_method' class.
The general thrust of PEP 579 seems to be that builtin-functions and
builtin-methods should be more flexible and extensible than they are. I
don't agree. If you want different behaviour, then use a different
object. Don't try and cram all this extra behaviour into a pre-existing
object.


I think that there is a misunderstanding here. I fully agree with the 
"use a different object" solution. This isn't a new solution: it's 
already possible to implement those different objects (Cython does it). 
It's just that this solution comes at a performance cost and that's what 
we want to avoid.


It does seem like there is some misunderstanding.

PEP 580 defines a CCall structure, which includes the function pointer, 
flags, "self" and "parent". Like the current implementation, it has 
various METH_ flags for various C signatures. When called, the info from 
CCall is matched up (in relatively complex ways) to what the C function 
expects.


PEP 590 only adds the "vectorcall". It does away with flags and only has 
one C signature, which is designed to fit all the existing ones, and is 
well optimized. Storing the "self"/"parent", and making sure they're 
passed to the C function is the responsibility of the callable object.
There's an optimization for "self" (offsetting using 
PY_VECTORCALL_ARGUMENTS_OFFSET), and any supporting info can be provided 
as part of "self".



I'll reiterate that PEP 590 is more general than PEP 580 and that once
the callable's code has access to the callable object (as both PEPs
allow) then anything is possible. You can't get more extensible than
that.


Anything is possible, but if one of the possibilities becomes common and 
useful, PEP 590 would make it hard to optimize for it.
Python has grown many "METH_*" signatures over the years as we found 
more things that need to be passed to callables. Why would 
"METH_VECTORCALL" be the last? If it won't (if you think about it as one 
more way to call functions), then dedicating a tp_* slot to it sounds 
quite expensive.



In one of the ways to call C functions in PEP 580, the function gets 
access to:

- the arguments,
- "self", the object
- the class that the method was found in (which is not necessarily 
type(self))
I still have to read the details, but when combined with 
LOAD_METHOD/CALL_METHOD optimization (avoiding creation of a "bound 
method" object), it seems impossible to do this efficiently with just 
the callable's code and callable's object.



I would argue the opposite: PEP 590 defines a fixed protocol that is not 
easy to extend. PEP 580 on the other hand uses a new data structure 
PyCCallDef which could easily be extended in the future (this will 
intentionally never be part of the stable ABI, so we can do that).


I have also argued before that the generality of PEP 590 is a bad thing 
rather than a good thing: by defining a more rigid protocol as in PEP 
580, more optimizations are possible.



PEP 580 has the same limitation for the same reasons. The limitation is
necessary for correctness if an object supports calls via `__call__` and
through another calling convention.


I don't think that this limitation is needed in either PEP. As I 
explained at the top of this email, it can easily be solved by not using 
the protocol for Python classes. What is wrong with my proposal in PEP 
580: https://www.python.org/dev/peps/pep-0580/#inheritance



I'll add Jeroen's notes from the review of the proposed PEP 590
(https://github.com/python/peps/pull/960):

The statement "PEP 580 is specifically targetted at function-like 
objects, and doesn't support other callables like classes, partial 
functions, or proxies" is factually false. The motivation for PEP 580 is 
certainly function/method-like objects but it's a general protocol that 
every class can implement. For certain classes, it may not be easy or 
desirable to do that but it's always possible.


Given that `PY_METHOD_DESCRIPTOR` is a flag for tp_flags, shouldn't it 
be called `Py_TPFLAGS_METHOD_DESCRIPTOR` or something?


Py_TPFLAGS_HAVE_VECTOR_CALL should be Py_TPFLAGS_HAVE_VECTORCALL, to be 
consistent with tp_vectorcall_offset and other uses of "vectorcall" (not 
"vector call")



And mine, so far:

I'm not clear on the constness of the "args" array.
If it is mutable (PyObject **), you