Re: [Python-Dev] PEP 590 discussion
Hi Petr, On 24/04/2019 11:24 pm, Petr Viktorin wrote: On 4/10/19 7:05 PM, Jeroen Demeyer wrote: On 2019-04-10 18:25, Petr Viktorin wrote: Hello! I've had time for a more thorough reading of PEP 590 and the reference implementation. Thank you for the work! And thank you for the review! I'd now describe the fundamental difference between PEP 580 and PEP 590 as: - PEP 580 tries to optimize all existing calling conventions - PEP 590 tries to optimize (and expose) the most general calling convention (i.e. fastcall) And PEP 580 has better performance overall, even for METH_FASTCALL. See this thread: https://mail.python.org/pipermail/python-dev/2019-April/156954.html Since these PEPs are all about performance, I consider this a very relevant argument in favor of PEP 580. All about performance as well as simplicity, correctness, testability, teachability... And PEP 580 touches some introspection :) PEP 580 also does a number of other things, as listed in PEP 579. But I think PEP 590 does not block future PEPs for the other items. On the other hand, PEP 580 has a much more mature implementation -- and that's where it picked up real-world complexity. About complexity, please read what I wrote in https://mail.python.org/pipermail/python-dev/2019-March/156853.html I claim that the complexity in the protocol of PEP 580 is a good thing, as it removes complexity from other places, in particular from the users of the protocol (better to have a complex protocol that's simple to use, rather than a simple protocol that's complex to use). I think we're talking past each other. I now see it as: PEP 580 takes existing complexity and makes it available to all users, in a simpler way. It makes existing code faster. PEP 590 defines a new simple/fast protocol for its users, and instead of making existing complexity faster and easier to use, it's left to be deprecated/phased out (or kept in existing classes for backwards compatibility). It makes it possible for future code to be faster/simpler.
I think things should be simple by default, but if people want some extra performance, they can opt in to some extra complexity. As a more concrete example of the simplicity that PEP 580 could bring, CPython currently has 2 classes for bound methods implemented in C: - "builtin_function_or_method" for normal C methods - "method-wrapper" for slot wrappers like __eq__ or __add__ With PEP 590, these classes would need to stay separate to get maximal performance. With PEP 580, just one class for bound methods would be sufficient and there wouldn't be any performance loss. And this extends to custom third-party function/method classes, for example as implemented by Cython. Yet, for backwards compatibility reasons, we can't merge the classes. Also, I think CPython and Cython are exactly the users that can trade some extra complexity for better performance. Jeroen's analysis from https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems to miss a step at the top: a. CALL_FUNCTION* / CALL_METHOD opcode calls b. _PyObject_FastCallKeywords() which calls c. _PyCFunction_FastCallKeywords() which calls d. _PyMethodDef_RawFastCallKeywords() which calls e. the actual C function (*ml_meth)() I think it's more useful to say that both PEPs bridge a->e (via _Py_VectorCall or PyCCall_Call). Not quite. For a builtin_function_or_method, we have with PEP 580: a. call_function() calls d. PyCCall_FastCall which calls e. the actual C function and with PEP 590 it's more like: a. call_function() calls c. _PyCFunction_FastCallKeywords which calls d. _PyMethodDef_RawFastCallKeywords which calls e. the actual C function Level c. above is the vectorcall wrapper, which is a level that PEP 580 doesn't have. PEP 580 optimizes all the code paths, whereas PEP 590 optimizes the fast path, and makes sure most/all use cases can use (or switch to) the fast path. Both fast paths are fast: bridging a->e using zero-copy arg passing with some C calls and flag checks.
The PEP 580 approach is faster; PEP 590's is simpler. Why do you say that PEP 580's approach is faster? There is no evidence for this. The only evidence so far is a couple of contrived benchmarks. Jeroen's showed a ~1% speedup for PEP 580 and mine showed a ~30% speed up for PEP 590. This clearly shows that I am better at coming up with contrived benchmarks :) PEP 590 was chosen as the fastest protocol I could come up with that was fully general, and wasn't so complex as to be unusable. Jeroen, is there something in PEPs 579/580 that PEP 590 blocks, or should address? Well, PEP 580 is an extensible protocol while PEP 590 is not. But, PyTypeObject is extensible, so even with PEP 590 one can always extend that (for example, PEP 590 uses a type flag Py_TPFLAGS_METHOD_DESCRIPTOR where PEP 580 instead uses the structs for the C call protocol). But I guess that extending
Re: [Python-Dev] PEP 590 discussion
Hi Jeroen, On 15/04/2019 9:38 am, Jeroen Demeyer wrote: On 2019-04-14 13:30, Mark Shannon wrote: PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward calls with an additional argument can do so efficiently. The obvious example is bound-methods, but classes are at least as important. cls(*args) -> cls.__new__(cls, *args) -> cls.__init__(self, *args) But tp_new and tp_init take the "cls" and "self" as separate arguments, not as part of *args. So I don't see why you need PY_VECTORCALL_ARGUMENTS_OFFSET for this. Here's some (untested) code for an implementation of vectorcall for object subtypes implemented in Python. It uses PY_VECTORCALL_ARGUMENTS_OFFSET to avoid a memory allocation when calling the __init__ method. https://github.com/python/cpython/commit/9ff46e3ba0747f386f9519933910d63d5caae6ee#diff-c3cf251f16d5a03a9e7d4639f2d6f998R3820 Cheers, Mark. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
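To make the __init__ case concrete, here is a minimal C sketch of how a class call might forward its arguments to __init__ with self prepended, assuming offset semantics like those described in this thread: if the caller sets the flag, args[-1] is scratch space the callee may overwrite temporarily (and must restore); otherwise the callee has to copy the arguments. This is not the code behind Mark's link; Obj, ARGS_OFFSET, NARGS, sample_init and call_init are hypothetical stand-ins, not CPython API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyObject; not the real CPython type. */
typedef struct Obj { long value; } Obj;

/* Modeled on PY_VECTORCALL_ARGUMENTS_OFFSET: the top bit of nargs means
   "args[-1] may be overwritten temporarily by the callee". */
#define ARGS_OFFSET ((size_t)1 << (8 * sizeof(size_t) - 1))
#define NARGS(n) ((n) & ~ARGS_OFFSET)

typedef long (*init_func)(Obj *const *args, size_t nargs);

/* Hypothetical "__init__": self is args[0]; stores the sum of the other
   arguments in self and returns how many arguments it saw (incl. self). */
static long sample_init(Obj *const *args, size_t nargs) {
    size_t n = NARGS(nargs);
    Obj *self = args[0];
    self->value = 0;
    for (size_t i = 1; i < n; i++)
        self->value += args[i]->value;
    return (long)n;
}

/* Call init(self, *args), as a class call would after creating self. */
static long call_init(init_func init, Obj *self, Obj *const *args, size_t nargsf) {
    assert(init != NULL);
    size_t n = NARGS(nargsf);
    if (nargsf & ARGS_OFFSET) {
        /* Caller promised args[-1] is writable: borrow it, no allocation.
           The cast away from const is deliberate and local. */
        Obj **slot = (Obj **)(args - 1);
        Obj *saved = *slot;
        *slot = self;
        long r = init(args - 1, n + 1);
        *slot = saved;              /* restore the caller's scratch slot */
        return r;
    }
    /* No spare slot: copy into a fresh array with self prepended. */
    Obj **tmp = malloc((n + 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    tmp[0] = self;
    if (n > 0)
        memcpy(tmp + 1, args, n * sizeof *tmp);
    long r = init(tmp, n + 1);
    free(tmp);
    return r;
}
```

The fast branch is the whole point of the flag: when the caller leaves one spare slot in front of its array, prepending self costs two pointer stores instead of a malloc/memcpy/free round trip.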
Re: [Python-Dev] PEP 590 discussion
On 2019-04-25 23:11, Petr Viktorin wrote: My thoughts are not the roadmap, of course :) I asked about methods because we should be aware of the consequences when choosing between PEP 580 and PEP 590 (or some compromise). There are basically 3 different ways of dealing with bound methods: (A) put methods inside the protocol. This is PEP 580 and my 580/590 compromise proposal. The disadvantage here is complexity in the protocol. (B) don't put methods inside the protocol and use a single generic method class types.MethodType. This is the status-quo for Python functions. It has the disadvantage of being slightly slower: there is an additional level of indirection when calling a bound method object. (C) don't put methods inside the protocol but use multiple method classes, one for every function class. This is the status-quo for functions implemented in C. This has the disadvantage of code duplication. I think that the choice between PEP 580 or 590 should be done together with a choice of one of the above options. For example, I really don't like the code duplication of (C), so I would prefer PEP 590 with (B) over PEP 590 with (C).
Re: [Python-Dev] PEP 590 discussion
On 4/25/19 5:12 AM, Jeroen Demeyer wrote: On 2019-04-25 00:24, Petr Viktorin wrote: PEP 590 defines a new simple/fast protocol for its users, and instead of making existing complexity faster and easier to use, it's left to be deprecated/phased out (or kept in existing classes for backwards compatibility). It makes it possible for future code to be faster/simpler. Can you elaborate on what you mean by this deprecating/phasing out? Kept for backwards compatibility, but not actively recommended or optimized. Perhaps made slower if that would help performance elsewhere. What's your view on dealing with method classes (not necessarily right now, but in the future)? Do you think that having separate method classes like method-wrapper (for example [].__add__) is good or bad? I fully agree with PEP 579's point on complexity: There are a huge number of classes involved to implement all variations of methods. This is not a problem by itself, but a compounding issue. The main problem is that, currently, you sometimes need to care about this (due to CPython special casing its own classes, without fallback to some public API). Ideally, what matters is the protocols the class implements rather than the class itself. If that is solved, having so many different classes becomes curious but unimportant -- merging them shouldn't be a priority. I'd concentrate on two efforts instead: - Calling should have a fast public API. (That's this PEP.) - Introspection should have a well-defined, consistently used public API (but not necessarily fast). For introspection, I think the way is implementing the necessary API (e.g. dunder attributes) and changing things like inspect, traceback generation, etc. to use them. CPython's callable classes should stay as internal implementation details. (Specifically: I'm against making them subclassable: allowing subclasses basically makes everything about the superclass an API.)
Since PEP 580 and PEP 590 deal with bound method classes very differently, I would like to know the roadmap for this. My thoughts are not the roadmap, of course :) Speaking about roadmaps, I often use PEP 579 to check what I'm forgetting. Here are my thoughts on it: ## Naming (The word "built-in" is overused in Python) This is a social/docs problem, and out of scope of the technical efforts. PEPs should always define the terms they use (even in the case where there is an official definition, but it doesn't match popular usage). ## Not extendable As I mentioned above, I'm against opening the callables for subclassing. We should define and use protocols instead. ## cfunctions do not become methods If we were designing Python from scratch, this should have been done differently. Now this is a problem for Cython to solve. CPython should provide the tools to do so. ## Semantics of inspect.isfunction I don't like inspect.isfunction, because "Is it a function?" is almost never what you actually want to ask. I'd like to deprecate it in favor of explicit functions like "Does it have source code?", "Is it callable?", or even "Is it exactly types.FunctionType?". But I'm against changing its behavior -- people are expecting the current answer. ## C functions should have access to the function object That's where my stake in all this is; I want to move on with PEP 573 after 580/590 is sorted out. ## METH_FASTCALL is private and undocumented This is the intersection of PEP 580 and 590. ## Allowing native C arguments This would be a very experimental feature. Argument Clinic itself is not intended for public use; locking its "impl" functions as part of public API is off the table at this point. Cython's cpdef allows this nicely, and CPython's API is full of C functions. That should be good enough for now. ## Complexity We should simplify, but I think the number of callable classes is not the best metric to focus on.
## PyMethodDef is too limited This is a valid point. But the PyMethodDef array is little more than a shortcut to creating methods directly in a loop. The immediate workaround could be to create a new constructor for methods. Then we can look into expressing the data declaratively again. ## Slot wrappers have no custom documentation I think this can now be done with a new custom slot wrapper class. Perhaps that can be added to CPython when it matures. ## Static methods and class methods should be callable This is a valid, though minor, point. I don't even think it would be a PEP-level change.
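A rough illustration of the point that a method-definition array is little more than data fed to a loop. The sketch assumes a hypothetical MethodDef/Method pair and a make_method constructor; none of these are CPython's real types or functions. A richer constructor can accept fields (here, doc) that a fixed table layout has no room for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not CPython's PyMethodDef. */
typedef long (*cfunc)(long x);

/* The declarative table entry: a fixed layout. */
typedef struct {
    const char *name;
    cfunc meth;
} MethodDef;

/* The runtime method object the table is turned into. */
typedef struct {
    const char *name;
    cfunc meth;
    const char *doc;   /* an extra field the table layout cannot express */
} Method;

/* Hypothetical "new constructor for methods". */
static Method make_method(const char *name, cfunc meth, const char *doc) {
    Method m = {name, meth, doc};
    return m;
}

/* What the table amounts to: a loop calling the constructor for each
   entry until the NULL-name sentinel (or until `out` is full). */
static size_t add_methods(const MethodDef *defs, Method *out, size_t cap) {
    size_t n = 0;
    for (; n < cap && defs[n].name != NULL; n++)
        out[n] = make_method(defs[n].name, defs[n].meth, NULL);
    return n;
}

/* Sample C function used in a table. */
static long twice(long x) { return 2 * x; }
```

Code that needs more than the table can express could call make_method directly instead of going through add_methods.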
Re: [Python-Dev] PEP 590 discussion
On 2019-04-25 00:24, Petr Viktorin wrote: PEP 590 defines a new simple/fast protocol for its users, and instead of making existing complexity faster and easier to use, it's left to be deprecated/phased out (or kept in existing classes for backwards compatibility). It makes it possible for future code to be faster/simpler. Can you elaborate on what you mean by this deprecating/phasing out? What's your view on dealing with method classes (not necessarily right now, but in the future)? Do you think that having separate method classes like method-wrapper (for example [].__add__) is good or bad? Since PEP 580 and PEP 590 deal with bound method classes very differently, I would like to know the roadmap for this. Jeroen.
Re: [Python-Dev] PEP 590 discussion
On 4/10/19 7:05 PM, Jeroen Demeyer wrote: On 2019-04-10 18:25, Petr Viktorin wrote: Hello! I've had time for a more thorough reading of PEP 590 and the reference implementation. Thank you for the work! And thank you for the review! I'd now describe the fundamental difference between PEP 580 and PEP 590 as: - PEP 580 tries to optimize all existing calling conventions - PEP 590 tries to optimize (and expose) the most general calling convention (i.e. fastcall) And PEP 580 has better performance overall, even for METH_FASTCALL. See this thread: https://mail.python.org/pipermail/python-dev/2019-April/156954.html Since these PEPs are all about performance, I consider this a very relevant argument in favor of PEP 580. All about performance as well as simplicity, correctness, testability, teachability... And PEP 580 touches some introspection :) PEP 580 also does a number of other things, as listed in PEP 579. But I think PEP 590 does not block future PEPs for the other items. On the other hand, PEP 580 has a much more mature implementation -- and that's where it picked up real-world complexity. About complexity, please read what I wrote in https://mail.python.org/pipermail/python-dev/2019-March/156853.html I claim that the complexity in the protocol of PEP 580 is a good thing, as it removes complexity from other places, in particular from the users of the protocol (better to have a complex protocol that's simple to use, rather than a simple protocol that's complex to use). I think we're talking past each other. I now see it as: PEP 580 takes existing complexity and makes it available to all users, in a simpler way. It makes existing code faster. PEP 590 defines a new simple/fast protocol for its users, and instead of making existing complexity faster and easier to use, it's left to be deprecated/phased out (or kept in existing classes for backwards compatibility). It makes it possible for future code to be faster/simpler.
I think things should be simple by default, but if people want some extra performance, they can opt in to some extra complexity. As a more concrete example of the simplicity that PEP 580 could bring, CPython currently has 2 classes for bound methods implemented in C: - "builtin_function_or_method" for normal C methods - "method-wrapper" for slot wrappers like __eq__ or __add__ With PEP 590, these classes would need to stay separate to get maximal performance. With PEP 580, just one class for bound methods would be sufficient and there wouldn't be any performance loss. And this extends to custom third-party function/method classes, for example as implemented by Cython. Yet, for backwards compatibility reasons, we can't merge the classes. Also, I think CPython and Cython are exactly the users that can trade some extra complexity for better performance. Jeroen's analysis from https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems to miss a step at the top: a. CALL_FUNCTION* / CALL_METHOD opcode calls b. _PyObject_FastCallKeywords() which calls c. _PyCFunction_FastCallKeywords() which calls d. _PyMethodDef_RawFastCallKeywords() which calls e. the actual C function (*ml_meth)() I think it's more useful to say that both PEPs bridge a->e (via _Py_VectorCall or PyCCall_Call). Not quite. For a builtin_function_or_method, we have with PEP 580: a. call_function() calls d. PyCCall_FastCall which calls e. the actual C function and with PEP 590 it's more like: a. call_function() calls c. _PyCFunction_FastCallKeywords which calls d. _PyMethodDef_RawFastCallKeywords which calls e. the actual C function Level c. above is the vectorcall wrapper, which is a level that PEP 580 doesn't have. PEP 580 optimizes all the code paths, whereas PEP 590 optimizes the fast path, and makes sure most/all use cases can use (or switch to) the fast path. Both fast paths are fast: bridging a->e using zero-copy arg passing with some C calls and flag checks.
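A simplified C model of the two call shapes discussed above: a PEP 580-like generic caller that checks flags on a call-definition struct and calls the C function directly, versus a PEP 590-like per-object vectorcall pointer that points at a small wrapper (the extra "level c") which then calls the C function. All names here are hypothetical; the sketch shows structure only and says nothing about which path is faster in practice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; none of this is real CPython API. */
typedef struct Obj { long value; } Obj;

/* Level e: the "actual C function", fastcall-style signature. */
typedef long (*fast_cfunc)(Obj *self, Obj *const *args, size_t nargs);

/* PEP 580-like shape: a call-def struct read by one generic caller. */
#define CC_FASTCALL 0x1u
typedef struct {
    unsigned flags;
    fast_cfunc func;
    Obj *self;
} CallDef;

static long generic_ccall(const CallDef *cc, Obj *const *args, size_t nargs) {
    if (cc->flags & CC_FASTCALL)          /* flag check, then a -> e */
        return cc->func(cc->self, args, nargs);
    return -1;                            /* other conventions omitted */
}

/* PEP 590-like shape: each callable stores its own vectorcall pointer. */
typedef struct Callable Callable;
typedef long (*vectorcallfunc)(Callable *callable, Obj *const *args, size_t nargs);
struct Callable {
    vectorcallfunc vectorcall;
    fast_cfunc func;
    Obj *self;
};

/* Level c: the vectorcall wrapper for C-function objects. */
static long cfunction_vectorcall(Callable *f, Obj *const *args, size_t nargs) {
    return f->func(f->self, args, nargs);
}

/* What the interpreter side would do: one indirect call through the object. */
static long vectorcall_call(Callable *f, Obj *const *args, size_t nargs) {
    return f->vectorcall(f, args, nargs);
}

/* Sample level-e function: sums the argument values. */
static long sum_args(Obj *self, Obj *const *args, size_t nargs) {
    (void)self;
    long total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += args[i]->value;
    return total;
}
```

In both shapes the argument array is passed through untouched; the difference is whether the dispatch logic lives in one shared caller (PEP 580) or in a per-class wrapper reached through the object (PEP 590).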
The PEP 580 approach is faster; PEP 590's is simpler. Jeroen, is there something in PEPs 579/580 that PEP 590 blocks, or should address? Well, PEP 580 is an extensible protocol while PEP 590 is not. But, PyTypeObject is extensible, so even with PEP 590 one can always extend that (for example, PEP 590 uses a type flag Py_TPFLAGS_METHOD_DESCRIPTOR where PEP 580 instead uses the structs for the C call protocol). But I guess that extending PyTypeObject will be harder to justify (say, in a future PEP) than extending the C call protocol. That's a good point. Also, it's explicitly allowed for users of the PEP 580 protocol to extend the PyCCallDef structure with custom fields. But I don't have a concrete idea of whether that will be useful. Unless I'm missing something, that would be effectively the same as extending their own instance struct. To bring any benefits, the extended PyCCallDef would need to be standardized in a PEP.
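One reading of "effectively the same as extending their own instance struct": in C, a struct that embeds a standard header as its first member can be passed wherever the header is expected, but only code that knows the extended type can read the extra fields. A sketch with hypothetical CallDef/MyCallDef types (not the real PyCCallDef layout):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "standard" header, standing in for a PyCCallDef-like struct. */
typedef struct {
    unsigned flags;
    const char *name;
} CallDef;

/* Project-specific extension. The header must be the first member, so a
   pointer to the whole struct converts to a pointer to the header and back. */
typedef struct {
    CallDef base;
    int extra_hint;   /* custom field only this project's code knows about */
} MyCallDef;

/* Generic code sees only the standard header. */
static unsigned get_flags(const CallDef *cc) {
    return cc->flags;
}

/* Only code that knows the concrete type can recover the extension. */
static int get_extra_hint(const CallDef *cc) {
    return ((const MyCallDef *)cc)->extra_hint;
}
```

Because generic code can never reach extra_hint, the extra field buys nothing beyond what the project's own instance struct already offers, unless the extended layout is standardized.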
Re: [Python-Dev] PEP 590 discussion
Hi Mark! See my more general reply; here I'll just tie loose ends with a few +1s. On 4/14/19 7:30 AM, Mark Shannon wrote: On 10/04/2019 5:25 pm, Petr Viktorin wrote: [...] PEP 590 is built on a simple idea, formalizing fastcall. But it is complicated by PY_VECTORCALL_ARGUMENTS_OFFSET and Py_TPFLAGS_METHOD_DESCRIPTOR. As far as I understand, both are there to avoid an intermediate bound-method object for LOAD_METHOD/CALL_METHOD. (They do try to be general, but I don't see any other use case.) Is that right? Not quite. Py_TPFLAGS_METHOD_DESCRIPTOR is for LOAD_METHOD/CALL_METHOD, it allows any callable descriptor to benefit from the LOAD_METHOD/CALL_METHOD optimisation. PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward calls with an additional argument can do so efficiently. The obvious example is bound-methods, but classes are at least as important. cls(*args) -> cls.__new__(cls, *args) -> cls.__init__(self, *args) I see. Thanks! (I'm running out of time today, but I'll write more on why I'm asking, and on the case I called "impossible" (while avoiding creation of a "bound method" object), later.) Let me drop this thread; I stand corrected. Another point I'd like some discussion on is that the vectorcall function pointer is per-instance. It looks like this is only useful for type objects, but it will add a pointer to every new-style callable object (including functions). That seems wasteful. Why not have a per-type pointer, and for types that need it (like PyTypeObject), make it dispatch to an instance-specific function? Firstly, each callable has different behaviour, so it makes sense to be able to do the dispatch from caller to callee in one step. Having a per-object function pointer allows that. Secondly, callables are either large or transient. If large, then the extra few bytes make little difference. If transient then, it matters even less. The total increase in memory is likely to be only a few tens of kilobytes, even for a large program.
That makes sense.
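The per-instance versus per-type trade-off can be sketched as follows. Assumptions: a hypothetical Callable struct that carries both a per-type table and a per-instance pointer (in a real per-type design, only the types that need it would store the per-instance pointer; both are kept in one struct here for brevity). A per-instance pointer reaches the callee in one indirect call; with a per-type pointer, a type whose instances behave differently (like type objects) needs a second hop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; not CPython's real structs. */
typedef struct Callable Callable;
typedef long (*callfunc)(Callable *self, const long *args, size_t nargs);

/* Per-type table: shared by every instance of a type. */
typedef struct {
    callfunc tp_call;
} Type;

struct Callable {
    const Type *type;
    callfunc vectorcall;   /* per-instance pointer (PEP 590 style) */
    long payload;
};

static long add_payload(Callable *self, const long *args, size_t nargs) {
    long total = self->payload;
    for (size_t i = 0; i < nargs; i++)
        total += args[i];
    return total;
}

static long mul_payload(Callable *self, const long *args, size_t nargs) {
    long total = self->payload;
    for (size_t i = 0; i < nargs; i++)
        total *= args[i];
    return total;
}

/* Per-instance dispatch: caller jumps straight to the callee (one hop). */
static long call_direct(Callable *obj, const long *args, size_t nargs) {
    return obj->vectorcall(obj, args, nargs);
}

/* Per-type dispatch: one hop for ordinary types whose tp_call is the callee. */
static long call_via_type(Callable *obj, const long *args, size_t nargs) {
    return obj->type->tp_call(obj, args, nargs);
}

/* For types whose instances differ, tp_call must forward to an
   instance-specific function: the second hop. */
static long dispatch_to_instance(Callable *obj, const long *args, size_t nargs) {
    return obj->vectorcall(obj, args, nargs);
}
```

Which design wins depends on how common the "instances differ" case is versus the cost of one pointer per callable, which is exactly the question raised above.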
Re: [Python-Dev] PEP 590 discussion
On 2019-04-03 07:33, Jeroen Demeyer wrote: Access to the class isn't possible currently and also not with PEP 590. But it's easy enough to fix that: PEP 573 adds a new METH_METHOD flag to change the signature of the C function (not the vectorcall wrapper). PEP 580 supports this "out of the box" because I'm reusing the class also to do type checks. But this shouldn't be an argument for or against either PEP. Actually, in the answer above I only considered "is implementing PEP 573 possible?" but I did not consider the complexity of doing that. And in line with what I claimed about complexity before, I think that PEP 580 scores better in this regard. Take PEP 580 and assume for the sake of argument that it didn't already have the cc_parent field. Then adding support for PEP 573 is easy: just add the cc_parent field to the C call protocol structure and set that field when initializing a method_descriptor. C functions can use the METH_DEFARG flag to get access to the PyCCallDef structure, which gives cc_parent. Implementing PEP 573 for a custom function class takes no extra effort: it doesn't require any changes to that class, except for correctly initializing the cc_parent field. Since PEP 580 has built-in support for methods, nothing special needs to be done to support methods too. With PEP 590 on the other hand, every single class which is involved in PEP 573 must be changed and every single vectorcall wrapper supporting PEP 573 must be changed. This is not limited to the function class itself, also the corresponding method class (for example, builtin_function_or_method for method_descriptor) needs to be changed. Jeroen
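A sketch of the PEP 580 argument above, under stated assumptions: a hypothetical CallDef carries a cc_parent field, and the C function receives the CallDef itself (roughly what the METH_DEFARG flag is described as providing), so reaching the defining class needs no change to the generic caller. Class, CallDef, ccall and which_class are illustrative names, not real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the idea; not the real PyCCallDef. */
typedef struct Class { const char *name; } Class;

typedef struct CallDef CallDef;
typedef const char *(*defarg_cfunc)(const CallDef *cc, const long *args, size_t nargs);

struct CallDef {
    defarg_cfunc func;
    const Class *cc_parent;   /* defining class, set once when the method is created */
};

/* Generic caller: hands the CallDef to the C function, so any function
   class embedding a CallDef gets cc_parent access without extra code. */
static const char *ccall(const CallDef *cc, const long *args, size_t nargs) {
    return cc->func(cc, args, nargs);
}

/* A method that needs its defining class (e.g. to reach module state). */
static const char *which_class(const CallDef *cc, const long *args, size_t nargs) {
    (void)args;
    (void)nargs;
    return cc->cc_parent->name;
}
```

In this model only the code that initializes cc_parent has to change; the caller and the function classes stay as they are.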
Re: [Python-Dev] PEP 590 discussion
On 2019-04-14 13:30, Mark Shannon wrote: PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward calls with an additional argument can do so efficiently. The obvious example is bound-methods, but classes are at least as important. cls(*args) -> cls.__new__(cls, *args) -> cls.__init__(self, *args) But tp_new and tp_init take the "cls" and "self" as separate arguments, not as part of *args. So I don't see why you need PY_VECTORCALL_ARGUMENTS_OFFSET for this. The updated minimal implementation now uses `const` arguments. Code that uses args[-1] must explicitly cast away the const. https://github.com/markshannon/cpython/blob/vectorcall-minimal/Objects/classobject.c#L55 That's better indeed. Jeroen.
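On the callee side, a const-based signature lets a naive callee ignore the offset flag entirely except for masking it out of nargs. A minimal sketch; ARGS_OFFSET and NARGS_MASK are hypothetical stand-ins modeled on PY_VECTORCALL_ARGUMENTS_OFFSET and the PY_VECTORCALL_NARGS_MASK spelling floated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, modeled on the flag discussed in the thread. */
#define ARGS_OFFSET ((size_t)1 << (8 * sizeof(size_t) - 1))
#define NARGS_MASK (~ARGS_OFFSET)

typedef struct Obj { long value; } Obj;

/* A "naive" callee: masks off the flag and treats args as read-only.
   The const in the parameter type lets the compiler reject accidental
   writes such as `args[0] = NULL;`. */
static long naive_sum(Obj *const *args, size_t nargsf) {
    size_t nargs = nargsf & NARGS_MASK;
    long total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += args[i]->value;
    return total;
}
```

Only a callee that deliberately wants the args[-1] slot has to cast the const away, which keeps that optimization opt-in.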
Re: [Python-Dev] PEP 590 discussion
Hi, Petr On 10/04/2019 5:25 pm, Petr Viktorin wrote: Hello! I've had time for a more thorough reading of PEP 590 and the reference implementation. Thank you for the work! Overall, I like PEP 590's direction. I'd now describe the fundamental difference between PEP 580 and PEP 590 as: - PEP 580 tries to optimize all existing calling conventions - PEP 590 tries to optimize (and expose) the most general calling convention (i.e. fastcall) PEP 580 also does a number of other things, as listed in PEP 579. But I think PEP 590 does not block future PEPs for the other items. On the other hand, PEP 580 has a much more mature implementation -- and that's where it picked up real-world complexity. PEP 590's METH_VECTORCALL is designed to handle all existing use cases, rather than mirroring the existing METH_* varieties. But both PEPs require the callable's code to be modified, so requiring it to switch calling conventions shouldn't be a problem. Jeroen's analysis from https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems to miss a step at the top: a. CALL_FUNCTION* / CALL_METHOD opcode calls b. _PyObject_FastCallKeywords() which calls c. _PyCFunction_FastCallKeywords() which calls d. _PyMethodDef_RawFastCallKeywords() which calls e. the actual C function (*ml_meth)() I think it's more useful to say that both PEPs bridge a->e (via _Py_VectorCall or PyCCall_Call). PEP 590 is built on a simple idea, formalizing fastcall. But it is complicated by PY_VECTORCALL_ARGUMENTS_OFFSET and Py_TPFLAGS_METHOD_DESCRIPTOR. As far as I understand, both are there to avoid intermediate bound-method object for LOAD_METHOD/CALL_METHOD. (They do try to be general, but I don't see any other use case.) Is that right? Not quite. Py_TPFLAGS_METHOD_DESCRIPTOR is for LOAD_METHOD/CALL_METHOD, it allows any callable descriptor to benefit from the LOAD_METHOD/CALL_METHOD optimisation. 
PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward calls with an additional argument can do so efficiently. The obvious example is bound-methods, but classes are at least as important. cls(*args) -> cls.__new__(cls, *args) -> cls.__init__(self, *args) (I'm running out of time today, but I'll write more on why I'm asking, and on the case I called "impossible" (while avoiding creation of a "bound method" object), later.) The way `const` is handled in the function signatures strikes me as too fragile for public API. I'd like it if, as much as possible, PY_VECTORCALL_ARGUMENTS_OFFSET was treated as a special optimization that extension authors can either opt in to, or blissfully ignore. That might mean: - vectorcall, PyObject_VectorCallWithCallable, PyObject_VectorCall, PyCall_MakeTpCall all formally take "PyObject *const *args" - a naïve callee must do "nargs &= ~PY_VECTORCALL_ARGUMENTS_OFFSET" (maybe spelled as "nargs &= PY_VECTORCALL_NARGS_MASK"), but otherwise writes compiler-enforced const-correct code. - if PY_VECTORCALL_ARGUMENTS_OFFSET is set, the callee may modify "args[-1]" (and only that, and after the author has read the docs). The updated minimal implementation now uses `const` arguments. Code that uses args[-1] must explicitly cast away the const. https://github.com/markshannon/cpython/blob/vectorcall-minimal/Objects/classobject.c#L55 Another point I'd like some discussion on is that the vectorcall function pointer is per-instance. It looks like this is only useful for type objects, but it will add a pointer to every new-style callable object (including functions). That seems wasteful. Why not have a per-type pointer, and for types that need it (like PyTypeObject), make it dispatch to an instance-specific function? Firstly, each callable has different behaviour, so it makes sense to be able to do the dispatch from caller to callee in one step. Having a per-object function pointer allows that. Secondly, callables are either large or transient.
If large, then the extra few bytes make little difference. If transient then, it matters even less. The total increase in memory is likely to be only a few tens of kilobytes, even for a large program. Minor things: - "Continued prohibition of callable classes as base classes" -- this section reads as final. Would you be OK wording this as something other PEPs can tackle? - "PyObject_VectorCall" -- this looks extraneous, and the reference implementation doesn't need it so far. Can it be removed, or justified? Yes, removing it makes sense. I can then rename the clumsily named "PyObject_VectorCallWithCallable" as "PyObject_VectorCall". - METH_VECTORCALL is *not* strictly "equivalent to the currently undocumented METH_FASTCALL | METH_KEYWORDS flags" (it has the ARGUMENTS_OFFSET complication). METH_VECTORCALL is just making METH_FASTCALL | METH_KEYWORDS documented and public. Would you prefer that it has a different name to prevent confusion over PY_VECTORCALL_ARGUMENTS_OFFSET? I
Re: [Python-Dev] PEP 590 discussion
On Thu, Apr 11, 2019 at 5:06 AM Jeroen Demeyer wrote: > Petr, > > I realize that you are in a difficult position. You'll end up > disappointing either me or Mark... > > I don't know if the steering council or somebody else has a good idea to > deal with this situation. > Our answer was "ask Petr to be BDFL Delegate". ;) In all seriousness, none of us on the council are as well equipped as Petr to handle this tough decision, else it would take even longer for us to learn enough to make an informed decision and we would be even worse off. -Brett > > > Jeroen has time > > Speaking of time, maybe I should clarify that I have time until the end > of August: I am working for the OpenDreamKit grant, which allows me to > work basically full-time on open source software development but that > ends at the end of August. > > > Here again, I mostly want to know if the details are there for deeper > > reasons, or just points to polish. > > I would say: mostly shallow details. > > The subclassing thing would be good to resolve, but I don't see any > difference between PEP 580 and PEP 590 there. In PEP 580, I wrote a > strategy for dealing with subclassing. I believe that it works and that > exactly the same idea would work for PEP 590 too. Of course, I may be > overlooking something... > > > I don't have good general experience with premature extensibility, so > > I'd not count this as a plus. > > Fair enough. I also see it more as a "nice to have", not as a big plus.
Re: [Python-Dev] PEP 590 discussion
Petr, I realize that you are in a difficult position. You'll end up disappointing either me or Mark... I don't know if the steering council or somebody else has a good idea to deal with this situation. Jeroen has time Speaking of time, maybe I should clarify that I have time until the end of August: I am working for the OpenDreamKit grant, which allows me to work basically full-time on open source software development but that ends at the end of August. Here again, I mostly want to know if the details are there for deeper reasons, or just points to polish. I would say: mostly shallow details. The subclassing thing would be good to resolve, but I don't see any difference between PEP 580 and PEP 590 there. In PEP 580, I wrote a strategy for dealing with subclassing. I believe that it works and that exactly the same idea would work for PEP 590 too. Of course, I may be overlooking something... I don't have good general experience with premature extensibility, so I'd not count this as a plus. Fair enough. I also see it more as a "nice to have", not as a big plus.
Re: [Python-Dev] PEP 590 discussion
On 4/11/19 1:05 AM, Jeroen Demeyer wrote: On 2019-04-10 18:25, Petr Viktorin wrote: Hello! I've had time for a more thorough reading of PEP 590 and the reference implementation. Thank you for the work! And thank you for the review! One general note: I am not (yet) choosing between PEP 580 and PEP 590. I am not looking for arguments for/against whole PEPs, but individual ideas which, I believe, can still be mixed & matched. I see the situation this way: - I get about one day per week when I can properly concentrate on CPython. It's frustrating to be the bottleneck. - Jeroen has time, but it would be frustrating to work on something that will later be discarded, and it's frustrating to not be able to move the project forward. - Mark has good ideas, but seems to lack the time to polish them, or even test out if they are good. It is probably frustrating to see unpolished ideas rejected. I'm looking for ways to reduce the frustration, given where we are. Jeroen, thank you for the comments. Apologies for not having the time to reply to all of them properly right now. Mark, if you could find the time to answer (even just a few of the points), it would be great. I ask you to share/clarify your thoughts, not defend your PEP. I'd now describe the fundamental difference between PEP 580 and PEP 590 as: - PEP 580 tries to optimize all existing calling conventions - PEP 590 tries to optimize (and expose) the most general calling convention (i.e. fastcall) And PEP 580 has better performance overall, even for METH_FASTCALL. See this thread: https://mail.python.org/pipermail/python-dev/2019-April/156954.html Since these PEPs are all about performance, I consider this a very relevant argument in favor of PEP 580. PEP 580 also does a number of other things, as listed in PEP 579. But I think PEP 590 does not block future PEPs for the other items. On the other hand, PEP 580 has a much more mature implementation -- and that's where it picked up real-world complexity.
About complexity, please read what I wrote in https://mail.python.org/pipermail/python-dev/2019-March/156853.html I claim that the complexity in the protocol of PEP 580 is a good thing, as it removes complexity from other places, in particular from the users of the protocol (better have a complex protocol that's simple to use, rather than a simple protocol that's complex to use). Sadly, I need more time on this than I have today; I'll get back to it next week. As a more concrete example of the simplicity that PEP 580 could bring, CPython currently has 2 classes for bound methods implemented in C: - "builtin_function_or_method" for normal C methods - "method-wrapper" for slot wrappers like __eq__ or __add__ With PEP 590, these classes would need to stay separate to get maximal performance. With PEP 580, just one class for bound methods would be sufficient and there wouldn't be any performance loss. And this extends to custom third-party function/method classes, for example as implemented by Cython. PEP 590's METH_VECTORCALL is designed to handle all existing use cases, rather than mirroring the existing METH_* varieties. But both PEPs require the callable's code to be modified, so requiring it to switch calling conventions shouldn't be a problem. Agreed. Jeroen's analysis from https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems to miss a step at the top: a. CALL_FUNCTION* / CALL_METHOD opcode calls b. _PyObject_FastCallKeywords() which calls c. _PyCFunction_FastCallKeywords() which calls d. _PyMethodDef_RawFastCallKeywords() which calls e. the actual C function (*ml_meth)() I think it's more useful to say that both PEPs bridge a->e (via _Py_VectorCall or PyCCall_Call). Not quite. For a builtin_function_or_method, we have with PEP 580: a. call_function() calls d. PyCCall_FastCall which calls e. the actual C function and with PEP 590 it's more like: a. call_function() calls c. _PyCFunction_FastCallKeywords which calls d.
_PyMethodDef_RawFastCallKeywords which calls e. the actual C function Level c. above is the vectorcall wrapper, which is a level that PEP 580 doesn't have. Again, I'll get back to this next week. The way `const` is handled in the function signatures strikes me as too fragile for public API. That's a detail which shouldn't influence the acceptance of either PEP. True. I guess what I want from the answer is to know how much thought went into const handling: is what's in the PEP an initial draft, or does it solve some hidden issue? Why not have a per-type pointer, and for types that need it (like PyTypeObject), make it dispatch to an instance-specific function? That would be exactly https://bugs.python.org/issue29259 I'll let Mark comment on this. Minor things: - "Continued prohibition of callable classes as base classes" -- this section reads as final. Would you be OK wording this as
Re: [Python-Dev] PEP 590 discussion
On 2019-04-10 18:25, Petr Viktorin wrote: Hello! I've had time for a more thorough reading of PEP 590 and the reference implementation. Thank you for the work! And thank you for the review! I'd now describe the fundamental difference between PEP 580 and PEP 590 as: - PEP 580 tries to optimize all existing calling conventions - PEP 590 tries to optimize (and expose) the most general calling convention (i.e. fastcall) And PEP 580 has better performance overall, even for METH_FASTCALL. See this thread: https://mail.python.org/pipermail/python-dev/2019-April/156954.html Since these PEPs are all about performance, I consider this a very relevant argument in favor of PEP 580. PEP 580 also does a number of other things, as listed in PEP 579. But I think PEP 590 does not block future PEPs for the other items. On the other hand, PEP 580 has a much more mature implementation -- and that's where it picked up real-world complexity. About complexity, please read what I wrote in https://mail.python.org/pipermail/python-dev/2019-March/156853.html I claim that the complexity in the protocol of PEP 580 is a good thing, as it removes complexity from other places, in particular from the users of the protocol (better have a complex protocol that's simple to use, rather than a simple protocol that's complex to use). As a more concrete example of the simplicity that PEP 580 could bring, CPython currently has 2 classes for bound methods implemented in C: - "builtin_function_or_method" for normal C methods - "method-wrapper" for slot wrappers like __eq__ or __add__ With PEP 590, these classes would need to stay separate to get maximal performance. With PEP 580, just one class for bound methods would be sufficient and there wouldn't be any performance loss. And this extends to custom third-party function/method classes, for example as implemented by Cython. PEP 590's METH_VECTORCALL is designed to handle all existing use cases, rather than mirroring the existing METH_* varieties.
But both PEPs require the callable's code to be modified, so requiring it to switch calling conventions shouldn't be a problem. Agreed. Jeroen's analysis from https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems to miss a step at the top: a. CALL_FUNCTION* / CALL_METHOD opcode calls b. _PyObject_FastCallKeywords() which calls c. _PyCFunction_FastCallKeywords() which calls d. _PyMethodDef_RawFastCallKeywords() which calls e. the actual C function (*ml_meth)() I think it's more useful to say that both PEPs bridge a->e (via _Py_VectorCall or PyCCall_Call). Not quite. For a builtin_function_or_method, we have with PEP 580: a. call_function() calls d. PyCCall_FastCall which calls e. the actual C function and with PEP 590 it's more like: a. call_function() calls c. _PyCFunction_FastCallKeywords which calls d. _PyMethodDef_RawFastCallKeywords which calls e. the actual C function Level c. above is the vectorcall wrapper, which is a level that PEP 580 doesn't have. The way `const` is handled in the function signatures strikes me as too fragile for public API. That's a detail which shouldn't influence the acceptance of either PEP. Why not have a per-type pointer, and for types that need it (like PyTypeObject), make it dispatch to an instance-specific function? That would be exactly https://bugs.python.org/issue29259 I'll let Mark comment on this. Minor things: - "Continued prohibition of callable classes as base classes" -- this section reads as final. Would you be OK wording this as something other PEPs can tackle? - "PyObject_VectorCall" -- this looks extraneous, and the reference implementation doesn't need it so far. Can it be removed, or justified? - METH_VECTORCALL is *not* strictly "equivalent to the currently undocumented METH_FASTCALL | METH_KEYWORDS flags" (it has the ARGUMENTS_OFFSET complication).
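Level c above is a generic wrapper that re-reads the callee's METH_* flags on every call to find the real C signature (level e). A minimal mock of that dispatch follows. `MockBuiltin`, the `MOCK_METH_*` values and `mock_builtin_vectorcall` are hypothetical and heavily simplified (no keywords, no error objects). Treat it as an illustration of the extra indirection, not as CPython's `_PyCFunction_FastCallKeywords`.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long value;
} MockObject;

/* Hypothetical flag values; CPython's METH_* constants differ. */
enum { MOCK_METH_NOARGS = 1, MOCK_METH_O = 2, MOCK_METH_FASTCALL = 4 };

typedef long (*noargs_fn)(const MockObject *self);
typedef long (*o_fn)(const MockObject *self, const MockObject *arg);
typedef long (*fastcall_fn)(const MockObject *self,
                            const MockObject *const *args, size_t nargs);

/* A builtin function object: flags, the C function, and bound self. */
typedef struct {
    int flags;
    union {
        noargs_fn noargs;
        o_fn o;
        fastcall_fn fastcall;
    } meth;
    const MockObject *self;
} MockBuiltin;

/* Sample level-e C functions. */
static long self_value(const MockObject *self) { return self->value; }

static long add_arg(const MockObject *self, const MockObject *arg)
{
    return self->value + arg->value;
}

static long sum_args(const MockObject *self, const MockObject *const *args,
                     size_t nargs)
{
    long total = self->value;
    for (size_t i = 0; i < nargs; i++)
        total += args[i]->value;
    return total;
}

/* Level c: one shared wrapper that switches on the flags on every call.
   Returns 0 and stores the result, or -1 on an argument-count mismatch. */
static int mock_builtin_vectorcall(const MockBuiltin *func,
                                   const MockObject *const *args,
                                   size_t nargs, long *result)
{
    assert(func != NULL && result != NULL);
    switch (func->flags) {
    case MOCK_METH_NOARGS:
        if (nargs != 0)
            return -1;
        *result = func->meth.noargs(func->self);
        return 0;
    case MOCK_METH_O:
        if (nargs != 1)
            return -1;
        *result = func->meth.o(func->self, args[0]);
        return 0;
    case MOCK_METH_FASTCALL:
        *result = func->meth.fastcall(func->self, args, nargs);
        return 0;
    /* A new convention (e.g. a two-argument "METH_OO") would add a case. */
    default:
        return -1;
    }
}
```

Under PEP 580 the caller reaches the flag dispatch directly (level d). Under PEP 590, one option for avoiding the shared switch would be to give each builtin a vectorcall function specialized for its flags.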
- I'd like to officially call this PEP "Vectorcall", see https://github.com/python/peps/pull/984 Those are indeed details which shouldn't influence the acceptance of either PEP. If you go with PEP 590, then we should discuss this further. Mark, what are your plans for next steps with PEP 590? If a volunteer wanted to help you push this forward, what would be the best thing to work on? Personally, I think what we need now is a decision between PEP 580 and PEP 590 (there is still the possibility of rejecting both but I really hope that this won't happen). There is a lot of work that still needs to be done after either PEP is accepted, such as: - finish and merge the reference implementation - document everything - use the protocol in more classes where it makes sense (for example, staticmethod, wrapper_descriptor) - use this in Cython - handle more issues from PEP 579 I volunteer to put my time into this, regardless of which PEP is accepted. Of course, I still think that PEP 580 is better, but I also want this
Re: [Python-Dev] PEP 590 discussion
Hello! I've had time for a more thorough reading of PEP 590 and the reference implementation. Thank you for the work! Overall, I like PEP 590's direction. I'd now describe the fundamental difference between PEP 580 and PEP 590 as: - PEP 580 tries to optimize all existing calling conventions - PEP 590 tries to optimize (and expose) the most general calling convention (i.e. fastcall) PEP 580 also does a number of other things, as listed in PEP 579. But I think PEP 590 does not block future PEPs for the other items. On the other hand, PEP 580 has a much more mature implementation -- and that's where it picked up real-world complexity. PEP 590's METH_VECTORCALL is designed to handle all existing use cases, rather than mirroring the existing METH_* varieties. But both PEPs require the callable's code to be modified, so requiring it to switch calling conventions shouldn't be a problem. Jeroen's analysis from https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems to miss a step at the top: a. CALL_FUNCTION* / CALL_METHOD opcode calls b. _PyObject_FastCallKeywords() which calls c. _PyCFunction_FastCallKeywords() which calls d. _PyMethodDef_RawFastCallKeywords() which calls e. the actual C function (*ml_meth)() I think it's more useful to say that both PEPs bridge a->e (via _Py_VectorCall or PyCCall_Call). PEP 590 is built on a simple idea, formalizing fastcall. But it is complicated by PY_VECTORCALL_ARGUMENTS_OFFSET and Py_TPFLAGS_METHOD_DESCRIPTOR. As far as I understand, both are there to avoid an intermediate bound-method object for LOAD_METHOD/CALL_METHOD. (They do try to be general, but I don't see any other use case.) Is that right? (I'm running out of time today, but I'll write more on why I'm asking, and on the case I called "impossible" (while avoiding creation of a "bound method" object), later.) The way `const` is handled in the function signatures strikes me as too fragile for public API.
I'd like it if, as much as possible, PY_VECTORCALL_ARGUMENTS_OFFSET were treated as a special optimization that extension authors can either opt in to, or blissfully ignore. That might mean: - vectorcall, PyObject_VectorCallWithCallable, PyObject_VectorCall, PyCall_MakeTpCall all formally take "PyObject *const *args" - a naïve callee must do "nargs &= ~PY_VECTORCALL_ARGUMENTS_OFFSET" (maybe spelled as "nargs &= PY_VECTORCALL_NARGS_MASK"), but otherwise writes compiler-enforced const-correct code. - if PY_VECTORCALL_ARGUMENTS_OFFSET is set, the callee may modify "args[-1]" (and only that, and after the author has read the docs). Another point I'd like some discussion on is that the vectorcall function pointer is per-instance. It looks like this is only useful for type objects, but it will add a pointer to every new-style callable object (including functions). That seems wasteful. Why not have a per-type pointer, and for types that need it (like PyTypeObject), make it dispatch to an instance-specific function? Minor things: - "Continued prohibition of callable classes as base classes" -- this section reads as final. Would you be OK wording this as something other PEPs can tackle? - "PyObject_VectorCall" -- this looks extraneous, and the reference implementation doesn't need it so far. Can it be removed, or justified? - METH_VECTORCALL is *not* strictly "equivalent to the currently undocumented METH_FASTCALL | METH_KEYWORDS flags" (it has the ARGUMENTS_OFFSET complication). - I'd like to officially call this PEP "Vectorcall", see https://github.com/python/peps/pull/984 Mark, what are your plans for next steps with PEP 590? If a volunteer wanted to help you push this forward, what would be the best thing to work on? Jeroen, is there something in PEPs 579/580 that PEP 590 blocks, or should address?
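Petr's proposal above can be sketched with mock types. The argument array is formally `const`, and a naïve callee just masks the flag off `nargs`. Only an offset-aware callee borrows `args[-1]` to prepend `self` without allocating. Here that callee is a bound-method-like forwarder, which is the LOAD_METHOD/CALL_METHOD case. Everything below is hypothetical: `MOCK_ARGUMENTS_OFFSET` stands in for PY_VECTORCALL_ARGUMENTS_OFFSET, its top-bit encoding is an assumption of this sketch, and `MOCK_NARGS` plays the role of the suggested mask.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long value;
} MockObject;

/* Assumed encoding: the flag lives in the top bit of nargsf. */
#define MOCK_ARGUMENTS_OFFSET ((size_t)1 << (sizeof(size_t) * CHAR_BIT - 1))
#define MOCK_NARGS(nargsf) ((nargsf) & ~MOCK_ARGUMENTS_OFFSET)

typedef long (*mock_vectorcall)(const MockObject *const *args, size_t nargsf);

/* Naive callee: masks the flag, reads args[0..nargs), never writes. */
static long sum_naive(const MockObject *const *args, size_t nargsf)
{
    size_t nargs = MOCK_NARGS(nargsf);
    long total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += args[i]->value;
    return total;
}

/* Bound-method-like object: calls func(self, *args). */
typedef struct {
    const MockObject *self;
    mock_vectorcall func;
} MockBoundMethod;

/* Returns 0 and stores the result, or -1 if allocation fails. */
static int bound_call(const MockBoundMethod *m, const MockObject *const *args,
                      size_t nargsf, long *result)
{
    size_t nargs = MOCK_NARGS(nargsf);
    if (nargsf & MOCK_ARGUMENTS_OFFSET) {
        /* The caller promised args[-1] is writable scratch space, so the
           const is cast away for that one slot only, then restored. */
        const MockObject **slot = (const MockObject **)args - 1;
        const MockObject *saved = *slot;
        *slot = m->self;
        *result = m->func(slot, nargs + 1);
        *slot = saved;
        return 0;
    }
    /* No spare slot: fall back to copying into a new array. */
    const MockObject **tmp = malloc((nargs + 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    tmp[0] = m->self;
    if (nargs > 0)
        memcpy(tmp + 1, args, nargs * sizeof *tmp);
    *result = m->func(tmp, nargs + 1);
    free(tmp);
    return 0;
}
```

Only the caller knows whether `args[-1]` is writable, so the flag has to travel with `nargs`. A callee that masks `nargs` and otherwise ignores the flag remains correct. It simply gives up the allocation-free path.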
___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 590 discussion
In one of the ways to call C functions in PEP 580, the function gets access to: - the arguments, - "self", the object - the class that the method was found in (which is not necessarily type(self)) I still have to read the details, but when combined with LOAD_METHOD/CALL_METHOD optimization (avoiding creation of a "bound method" object), it seems impossible to do this efficiently with just the callable's code and callable's object. It is possible, and relatively straightforward. Access to the class isn't possible currently and also not with PEP 590. But it's easy enough to fix that: PEP 573 adds a new METH_METHOD flag to change the signature of the C function (not the vectorcall wrapper). PEP 580 supports this "out of the box" because I'm reusing the class also to do type checks. But this shouldn't be an argument for or against either PEP.
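The "class the method was found in" can differ from `type(self)` when a subclass inherits a C method. The mock below sketches the idea behind PEP 573's METH_METHOD: the method object carries its defining class, and the wrapper passes that class to the C function. All names are hypothetical, and the signature is not PEP 573's actual one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class object with a base and some per-class state. */
typedef struct MockType {
    const char *name;
    const struct MockType *base;
    long per_class_state;
} MockType;

typedef struct {
    const MockType *type;
} MockInstance;

/* Mock signature: the C function gets the defining class explicitly. */
typedef long (*mock_method_fn)(const MockInstance *self,
                               const MockType *defining_class);

/* Method defined on Base: reads Base's state even for subclass instances. */
static long read_defining_state(const MockInstance *self,
                                const MockType *defining_class)
{
    (void)self; /* unused in this example */
    return defining_class->per_class_state;
}

/* Method object: remembers where it was defined, so the wrapper does not
   have to derive the class from type(self). */
typedef struct {
    mock_method_fn func;
    const MockType *defining_class;
} MockMethod;

static long call_method(const MockMethod *meth, const MockInstance *self)
{
    assert(meth != NULL && self != NULL);
    return meth->func(self, meth->defining_class);
}
```

For a `Sub` instance calling a method defined on `Base`, `type(self)` is `Sub` while the C function receives `Base`. The extra parameter appears only in the C function's signature. The way the method object itself is called does not change.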
Re: [Python-Dev] PEP 590 discussion
Hi, On 02/04/2019 1:49 pm, Petr Viktorin wrote: On 3/30/19 11:36 PM, Jeroen Demeyer wrote: On 2019-03-30 17:30, Mark Shannon wrote: 2. The claim that PEP 580 allows "certain optimizations because other code can make assumptions" is flawed. In general, the caller cannot make assumptions about the callee or vice-versa. Python is a dynamic language. PEP 580 is meant for extension classes, not Python classes. Extension classes are not dynamic. When you implement tp_call in a given way, the user cannot change it. So if a class implements the C call protocol or the vectorcall protocol, callers can make assumptions about what that means. PEP 579 is mainly a list of supposed flaws with the 'builtin_function_or_method' class. The general thrust of PEP 579 seems to be that builtin-functions and builtin-methods should be more flexible and extensible than they are. I don't agree. If you want different behaviour, then use a different object. Don't try to cram all this extra behaviour into a pre-existing object. I think that there is a misunderstanding here. I fully agree with the "use a different object" solution. This isn't a new solution: it's already possible to implement those different objects (Cython does it). It's just that this solution comes at a performance cost and that's what we want to avoid. It does seem like there is some misunderstanding. PEP 580 defines a CCall structure, which includes the function pointer, flags, "self" and "parent". Like the current implementation, it has various METH_ flags for various C signatures. When called, the info from CCall is matched up (in relatively complex ways) to what the C function expects. PEP 590 only adds the "vectorcall". It does away with flags and only has one C signature, which is designed to fit all the existing ones, and is well optimized. Storing the "self"/"parent" and making sure they're passed to the C function is the responsibility of the callable object.
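Jeroen's argument is that an extension type's call behaviour is fixed at C level. A caller may therefore check a type flag and jump straight to a stored function pointer. The mock below sketches that caller-side fast path in the PEP 590 style, where the type records an offset to a per-instance pointer (the layout Petr questions elsewhere in the thread). `MockType`, `MOCK_TPFLAGS_HAVE_VECTORCALL`, `mock_call` and the adder type are hypothetical. They are loosely modelled on the `tp_vectorcall_offset` slot and the vectorcall type flag discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct MockObject MockObject;
typedef long (*mock_vectorcall)(MockObject *callable, long arg);
typedef long (*mock_tp_call)(MockObject *callable, long arg);

/* Hypothetical type flag meaning "instances store a vectorcall pointer". */
#define MOCK_TPFLAGS_HAVE_VECTORCALL (1UL << 0)

typedef struct {
    unsigned long flags;
    size_t vectorcall_offset; /* byte offset of the per-instance pointer */
    mock_tp_call tp_call;     /* generic slow path */
} MockType;

struct MockObject {
    const MockType *type;
};

/* Caller side: trust the (immutable) type flag, then use the stored
   pointer if it is set; otherwise fall back to tp_call. */
static long mock_call(MockObject *callable, long arg)
{
    const MockType *tp = callable->type;
    if (tp->flags & MOCK_TPFLAGS_HAVE_VECTORCALL) {
        mock_vectorcall fn;
        memcpy(&fn, (const char *)callable + tp->vectorcall_offset, sizeof fn);
        if (fn != NULL)
            return fn(callable, arg);
    }
    return tp->tp_call(callable, arg);
}

/* Example callable type: adds a fixed amount to its argument. */
typedef struct {
    MockObject base;            /* must be the first member */
    mock_vectorcall vectorcall; /* per-instance pointer, may be NULL */
    long addend;
} MockAdder;

static int slow_path_calls = 0; /* counts tp_call fallbacks, for the demo */

static long adder_vectorcall(MockObject *callable, long arg)
{
    return ((MockAdder *)callable)->addend + arg;
}

static long adder_tp_call(MockObject *callable, long arg)
{
    slow_path_calls++;
    return ((MockAdder *)callable)->addend + arg;
}

static const MockType adder_type = {
    MOCK_TPFLAGS_HAVE_VECTORCALL,
    offsetof(MockAdder, vectorcall),
    adder_tp_call,
};
```

Petr's alternative would move the pointer from `MockAdder` onto `MockType`: a per-type pointer that dispatches to an instance-specific function only for types that need it, such as PyTypeObject. That trades one pointer per instance for an extra indirection in those few types.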
There's an optimization for "self" (offsetting using PY_VECTORCALL_ARGUMENTS_OFFSET), and any supporting info can be provided as part of "self". I'll reiterate that PEP 590 is more general than PEP 580 and that once the callable's code has access to the callable object (as both PEPs allow) then anything is possible. You can't get more extensible than that. Anything is possible, but if one of the possibilities becomes common and useful, PEP 590 would make it hard to optimize for it. Python has grown many "METH_*" signatures over the years as we found more things that need to be passed to callables. Why would "METH_VECTORCALL" be the last? If it won't (if you think about it as one more way to call functions), then dedicating a tp_* slot to it sounds quite expensive. I doubt METH_VECTORCALL will be the last. Let me give you an example: It is quite common for a function to take two arguments, so we might want to add a METH_OO flag for builtin-functions with 2 parameters. To support this in PEP 590, you would make exactly the same change as you would now, which is to add another case to the switch statement in _PyCFunction_FastCallKeywords. For PEP 580, you would add another case to the switch in PyCCall_FastCall. No difference really. PEP 580 uses a slot as well. It's only 8 bytes per class. In one of the ways to call C functions in PEP 580, the function gets access to: - the arguments, - "self", the object - the class that the method was found in (which is not necessarily type(self)) I still have to read the details, but when combined with LOAD_METHOD/CALL_METHOD optimization (avoiding creation of a "bound method" object), it seems impossible to do this efficiently with just the callable's code and callable's object. It is possible, and relatively straightforward. Why do you think it is impossible? I would argue the opposite: PEP 590 defines a fixed protocol that is not easy to extend.
PEP 580 on the other hand uses a new data structure PyCCallDef which could easily be extended in the future (this will intentionally never be part of the stable ABI, so we can do that). I have also argued before that the generality of PEP 590 is a bad thing rather than a good thing: by defining a more rigid protocol as in PEP 580, more optimizations are possible. PEP 580 has the same limitation for the same reasons. The limitation is necessary for correctness if an object supports calls via `__call__` and through another calling convention. I don't think that this limitation is needed in either PEP. As I explained at the top of this email, it can easily be solved by not using the protocol for Python classes. What is wrong with my proposal in PEP 580: https://www.python.org/dev/peps/pep-0580/#inheritance I'll add Jeroen's notes from the review of the proposed PEP 590 (https://github.com/python/peps/pull/960): The statement "PEP 580 is specifically targeted at function-like objects, and doesn't support other callables like
Re: [Python-Dev] PEP 590 discussion
On 3/30/19 11:36 PM, Jeroen Demeyer wrote: On 2019-03-30 17:30, Mark Shannon wrote: 2. The claim that PEP 580 allows "certain optimizations because other code can make assumptions" is flawed. In general, the caller cannot make assumptions about the callee or vice-versa. Python is a dynamic language. PEP 580 is meant for extension classes, not Python classes. Extension classes are not dynamic. When you implement tp_call in a given way, the user cannot change it. So if a class implements the C call protocol or the vectorcall protocol, callers can make assumptions about what that means. PEP 579 is mainly a list of supposed flaws with the 'builtin_function_or_method' class. The general thrust of PEP 579 seems to be that builtin-functions and builtin-methods should be more flexible and extensible than they are. I don't agree. If you want different behaviour, then use a different object. Don't try to cram all this extra behaviour into a pre-existing object. I think that there is a misunderstanding here. I fully agree with the "use a different object" solution. This isn't a new solution: it's already possible to implement those different objects (Cython does it). It's just that this solution comes at a performance cost and that's what we want to avoid. It does seem like there is some misunderstanding. PEP 580 defines a CCall structure, which includes the function pointer, flags, "self" and "parent". Like the current implementation, it has various METH_ flags for various C signatures. When called, the info from CCall is matched up (in relatively complex ways) to what the C function expects. PEP 590 only adds the "vectorcall". It does away with flags and only has one C signature, which is designed to fit all the existing ones, and is well optimized. Storing the "self"/"parent" and making sure they're passed to the C function is the responsibility of the callable object.
There's an optimization for "self" (offsetting using PY_VECTORCALL_ARGUMENTS_OFFSET), and any supporting info can be provided as part of "self". I'll reiterate that PEP 590 is more general than PEP 580 and that once the callable's code has access to the callable object (as both PEPs allow) then anything is possible. You can't get more extensible than that. Anything is possible, but if one of the possibilities becomes common and useful, PEP 590 would make it hard to optimize for it. Python has grown many "METH_*" signatures over the years as we found more things that need to be passed to callables. Why would "METH_VECTORCALL" be the last? If it won't (if you think about it as one more way to call functions), then dedicating a tp_* slot to it sounds quite expensive. In one of the ways to call C functions in PEP 580, the function gets access to: - the arguments, - "self", the object - the class that the method was found in (which is not necessarily type(self)) I still have to read the details, but when combined with LOAD_METHOD/CALL_METHOD optimization (avoiding creation of a "bound method" object), it seems impossible to do this efficiently with just the callable's code and callable's object. I would argue the opposite: PEP 590 defines a fixed protocol that is not easy to extend. PEP 580 on the other hand uses a new data structure PyCCallDef which could easily be extended in the future (this will intentionally never be part of the stable ABI, so we can do that). I have also argued before that the generality of PEP 590 is a bad thing rather than a good thing: by defining a more rigid protocol as in PEP 580, more optimizations are possible. PEP 580 has the same limitation for the same reasons. The limitation is necessary for correctness if an object supports calls via `__call__` and through another calling convention. I don't think that this limitation is needed in either PEP.
As I explained at the top of this email, it can easily be solved by not using the protocol for Python classes. What is wrong with my proposal in PEP 580: https://www.python.org/dev/peps/pep-0580/#inheritance I'll add Jeroen's notes from the review of the proposed PEP 590 (https://github.com/python/peps/pull/960): The statement "PEP 580 is specifically targeted at function-like objects, and doesn't support other callables like classes, partial functions, or proxies" is factually false. The motivation for PEP 580 is certainly function/method-like objects but it's a general protocol that every class can implement. For certain classes, it may not be easy or desirable to do that but it's always possible. Given that `PY_METHOD_DESCRIPTOR` is a flag for tp_flags, shouldn't it be called `Py_TPFLAGS_METHOD_DESCRIPTOR` or something? Py_TPFLAGS_HAVE_VECTOR_CALL should be Py_TPFLAGS_HAVE_VECTORCALL, to be consistent with tp_vectorcall_offset and other uses of "vectorcall" (not "vector call"). And mine, so far: I'm not clear on the constness of the "args" array. If it is mutable (PyObject **), you