Re: [Python-Dev] PEP 580/590 discussion
On 2019-05-10 00:07, Petr Viktorin wrote:
> METH_FASTCALL is currently not documented, and it should be renamed before
> it's documented. Names with "fast" or "new" generally don't age well.

Just to make sure that we understand you correctly, is your proposal to do the following:

- remove the name METH_FASTCALL
- remove the calling convention METH_FASTCALL without METH_KEYWORDS
- rename METH_FASTCALL|METH_KEYWORDS -> METH_VECTORCALL

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 580/590 discussion
Petr Viktorin wrote on 10.05.19 at 00:07:
> On 5/9/19 5:33 PM, Jeroen Demeyer wrote:
>> Maybe you misunderstood my proposal. I want to allow both for extra
>> flexibility:
>>
>> - METH_FASTCALL (possibly combined with METH_KEYWORDS) continues to work
>> as before. If you don't want to care about the implementation details of
>> vectorcall, this is the right thing to use.
>>
>> - METH_VECTORCALL (using exactly the vectorcallfunc signature) is a new
>> calling convention for applications that want the lowest possible
>> overhead at the cost of being slightly harder to use.
>
> Then we can, in the spirit of minimalism, not add METH_VECTORCALL at all.
> [...]
> METH_FASTCALL is currently not documented, and it should be renamed before
> it's documented. Names with "fast" or "new" generally don't age well.

I personally don't see an advantage in having both, apart from helping code that wants to be fast also on Py3.7, for example. It unnecessarily complicates the CPython implementation and C-API. I'd be ok with removing FASTCALL in favour of VECTORCALL. That's more code to generate for Cython in order to adapt to Py<3.6, Py3.6, Py3.7 and then Py>=3.[89], but well, seeing the heap of code that we *already* generate, it's not going to hurt our users much.

It would, however, be (selfishly) helpful if FASTCALL could still go through a deprecation period, because we'd like to keep the current Cython 0.29.x release series compatible with Python 3.8, and I'd like to avoid adding support for VECTORCALL and compiling out FASTCALL in a point release. Removing it in Py3.9 seems ok to me.

Stefan
Re: [Python-Dev] PEP 580/590 discussion
On 5/9/19 5:33 PM, Jeroen Demeyer wrote:
> On 2019-05-09 20:30, Petr Viktorin wrote:
>> The underlying C function should not need to know how to extract
>> "self" from the function object, or how to handle the argument
>> offsetting. Those should be implementation details.
>
> Maybe you misunderstood my proposal. I want to allow both for extra
> flexibility:
>
> - METH_FASTCALL (possibly combined with METH_KEYWORDS) continues to work
> as before. If you don't want to care about the implementation details of
> vectorcall, this is the right thing to use.
>
> - METH_VECTORCALL (using exactly the vectorcallfunc signature) is a new
> calling convention for applications that want the lowest possible
> overhead at the cost of being slightly harder to use.

Then we can, in the spirit of minimalism, not add METH_VECTORCALL at all.

> Personally, I consider the discussion about who is supposed to check
> that a function returns NULL if and only if an error occurred a tiny
> detail which shouldn't dictate the design. There are two solutions for
> this: either we move that check one level up and do it for all
> vectorcall functions. Or, we keep the existing checks in place but we
> don't do that check for METH_VECTORCALL (this is already more
> specialized anyway, so dropping that check doesn't hurt much). We could
> also decide to enable this check only for debug builds, especially if
> debug builds are going to be easier to use thanks to Victor Stinner's
> work.
>
>> I see the value in having METH_VECTORCALL equivalent to the existing
>> METH_FASTCALL|METH_KEYWORDS.
>
> But why invent a new name for that? METH_FASTCALL|METH_KEYWORDS already
> works. The alias METH_VECTORCALL could only make things more confusing
> (having two ways to specify exactly the same thing). Or am I missing
> something?

METH_FASTCALL is currently not documented, and it should be renamed before it's documented. Names with "fast" or "new" generally don't age well.
Re: [Python-Dev] PEP 580/590 discussion
On 2019-05-09 20:30, Petr Viktorin wrote:
> But, if you apply the robustness principle to vectorcallfunc, it
> should accept empty tuples.

Sure, if the callee wants to accept empty tuples anyway, it can do that. That's the robustness principle. But us *forcing* the callee to accept empty tuples is certainly not.

Basically my point is: with a little bit of effort in CPython we can make things simpler for all users of vectorcall. Why not do that? Seriously, what's the argument for *not* applying this change?

Jeroen.
Re: [Python-Dev] PEP 580/590 discussion
On 2019-05-09 20:30, Petr Viktorin wrote:
> The underlying C function should not need to know how to extract
> "self" from the function object, or how to handle the argument
> offsetting. Those should be implementation details.

Maybe you misunderstood my proposal. I want to allow both for extra flexibility:

- METH_FASTCALL (possibly combined with METH_KEYWORDS) continues to work as before. If you don't want to care about the implementation details of vectorcall, this is the right thing to use.

- METH_VECTORCALL (using exactly the vectorcallfunc signature) is a new calling convention for applications that want the lowest possible overhead at the cost of being slightly harder to use.

Personally, I consider the discussion about who is supposed to check that a function returns NULL if and only if an error occurred a tiny detail which shouldn't dictate the design. There are two solutions for this: either we move that check one level up and do it for all vectorcall functions. Or, we keep the existing checks in place but we don't do that check for METH_VECTORCALL (this is already more specialized anyway, so dropping that check doesn't hurt much). We could also decide to enable this check only for debug builds, especially if debug builds are going to be easier to use thanks to Victor Stinner's work.

> I see the value in having METH_VECTORCALL equivalent to the existing
> METH_FASTCALL|METH_KEYWORDS.

But why invent a new name for that? METH_FASTCALL|METH_KEYWORDS already works. The alias METH_VECTORCALL could only make things more confusing (having two ways to specify exactly the same thing). Or am I missing something?

Jeroen.
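The "NULL if and only if an error occurred" check debated in this message can be pictured with a small Python model. This is only a sketch and not CPython code: `NULL`, `error_indicator`, `fastcall_adapter` and the three sample functions are hypothetical names made up for illustration, and a list stands in for the interpreter's error indicator. It models the first variant discussed, where an adapter extracts "self" for the underlying function and then checks the result.

```python
from types import SimpleNamespace

NULL = object()        # stand-in for a C NULL return value (assumption)
error_indicator = []   # non-empty list models "an exception is set" (assumption)


def fastcall_adapter(func):
    """Wrap a (self, args, kwnames) function behind a (callable, args, kwnames) entry point."""
    def vectorcall(callable_obj, args, kwnames):
        error_indicator.clear()
        # The adapter, not the underlying function, extracts "self".
        result = func(callable_obj.self_obj, args, kwnames)
        # Sanity check: a NULL result must come with an error set, and vice versa.
        if (result is NULL) != bool(error_indicator):
            raise SystemError("result and error indicator disagree")
        return result
    return vectorcall


def length(self_obj, args, kwnames):
    return len(args)                      # well-behaved: result, no error


def failing(self_obj, args, kwnames):
    error_indicator.append("ValueError")  # well-behaved: error set, NULL returned
    return NULL


def buggy(self_obj, args, kwnames):
    return NULL                           # bug: NULL without setting an error


method = SimpleNamespace(self_obj="receiver")
```

In this model, `fastcall_adapter(length)(method, (1, 2), None)` passes the check, while `fastcall_adapter(buggy)` raises `SystemError`. Dropping the check for a raw vectorcall-style entry point would correspond to calling `buggy` directly with no adapter in between.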
Re: [Python-Dev] PEP 580/590 discussion
On 2019-05-09 23:09, Brett Cannon wrote:
> Any reason the above are all "Vectorcall" and not "VectorCall"? You seem
> to potentially have that capitalization for "PyCall_MakeVectorCall" as
> mentioned below which seems to be asking for typos if there's going to
> be two ways to do it. :)

"PyCall_MakeVectorCall" is a typo for "PyVectorcall_Call" (https://github.com/python/peps/pull/1037)

Everything else uses "Vectorcall" or "VECTORCALL". In text, we use "vectorcall" without a space.
Re: [Python-Dev] PEP 580/590 discussion
On 2019-05-09 20:30, Petr Viktorin wrote:
> ### Making things private
>
> For Python 3.8, the public API should be private, so the API can get
> some contact with the real world. I'd especially like to be able to
> learn from Cython's experience using it.
> That would mean:
>
> * _PyObject_Vectorcall
> * _PyCall_MakeVectorCall
> * _PyVectorcall_NARGS
> * _METH_VECTORCALL
> * _Py_TPFLAGS_HAVE_VECTORCALL
> * _Py_TPFLAGS_METHOD_DESCRIPTOR

Do we really have to underscore the names? Would there be a way to mark this API as provisional and subject to change without changing the names? If it turns out that PEP 590 was perfect after all, then we're just breaking stuff in Python 3.9 (when removing the underscores) for no reason.

Alternatively, could we keep the underscored names as official API in Python 3.9?
Re: [Python-Dev] PEP 580/590 discussion
On Thu, May 9, 2019 at 11:31 AM Petr Viktorin wrote:
> PEP 590 is on its way to be accepted, with some details still to be
> discussed. I've rejected PEP 580 so we can focus on one place.
>
> Here are things we discussed on GitHub but now seem to agree on:
>
> * The vectorcall function's kwname argument can be NULL.
> * Let's use `vectorcallfunc`, not `vectorcall`, and stop the bikeshedding.
> * `tp_vectorcall_offset` can be `Py_ssize_t` (The discussions around
> signedness and C standards and consistency are interesting, but
> ultimately irrelevant here.)
> * `PyCall_MakeTpCall` can be removed.
> * `PyVectorcall_Function` (for getting the `vectorcallfunc` of an
> object) can be an internal helper. External code should go through
> `PyCall_Vectorcall` (whatever we name it).
> * `PY_VECTORCALL_ARGUMENTS_OFFSET` is OK, bikeshedding over variants
> like `PY_VECTORCALL_PREPEND` won't bring much benefit.
>
> Anyone against, make your point now :)

Any reason the above are all "Vectorcall" and not "VectorCall"? You seem to potentially have that capitalization for "PyCall_MakeVectorCall" as mentioned below which seems to be asking for typos if there's going to be two ways to do it. :)

-Brett

> The following have discussion PRs open:
>
> * `PyCall_MakeVectorCall` name: https://github.com/python/peps/pull/1037
> * Passing a dict to `PyObject_Vectorcall`:
> https://github.com/python/peps/pull/1038
> * Type of the kwnames argument (PyObject/PyTupleObject):
> https://github.com/python/peps/pull/1039
>
> The remaining points are:
>
> ### Making things private
>
> For Python 3.8, the public API should be private, so the API can get
> some contact with the real world. I'd especially like to be able to
> learn from Cython's experience using it.
> That would mean:
>
> * _PyObject_Vectorcall
> * _PyCall_MakeVectorCall
> * _PyVectorcall_NARGS
> * _METH_VECTORCALL
> * _Py_TPFLAGS_HAVE_VECTORCALL
> * _Py_TPFLAGS_METHOD_DESCRIPTOR
>
> ### Can the kwnames tuple be empty?
>
> Disallowing empty tuples means it's easier for the *callee* to detect
> the case of no keyword arguments. Instead of:
>
>     if (kwnames != NULL && PyTuple_GET_SIZE(kwnames))
>
> you have:
>
>     if (kwnames != NULL)
>
> On the other hand, the *caller* would now be responsible for handling
> the no-kwarg case specially.
>
> Jeroen points out:
>> The side of the caller (which should ensure not to send an empty tuple)
>> is CPython and there the issue of people implementing the protocol
>> wrongly doesn't arise.
>> External C code is not expected to manually use tp_vectorcall_offset to
>> make vectorcalls: it is expected to use an API like PyCall_Vectorcall()
>> and that API will ensure to replace an empty tuple with NULL.
>>
>> I see it as an application of
>> https://en.wikipedia.org/wiki/Robustness_principle
>> (Be conservative in what you send, be liberal in what you accept):
>> PyCall_Vectorcall should accept an empty tuple but it should not send an
>> empty tuple to the vectorcall function.
>
> But, if you apply the robustness principle to vectorcallfunc, it
> should accept empty tuples.
>
> ### `METH_VECTORCALL` function type
>
> Jeroen suggested changing this from:
>
>     `PyObject *(*call) (PyObject *self, PyObject *const *args,
>     Py_ssize_t nargs, PyObject *kwname)`
>
> to `vectorcallfunc`, i.e.:
>
>     `PyObject *(*call) (PyObject *callable, Py_ssize_t n, PyObject
>     *const *args, PyObject *kwnames)`
>
> Mark argues that this is a major change and prevents the interpreter
> from sanity checking the return value of PyMethodDef defined
> functions.
> (Since the functions are defined by extension code, they need to be
> sanity-checked, and this will be done by PyCFunction's vectorcall
> adapter. Tools like Cython can bypass the check if needed.)
>
> The underlying C function should not need to know how to extract
> "self" from the function object, or how to handle the argument
> offsetting.
> Those should be implementation details.
>
> I see the value in having METH_VECTORCALL equivalent to the existing
> METH_FASTCALL|METH_KEYWORDS.
> (Even though PEP 573 will need to add to the calling convention.)
Re: [Python-Dev] PEP 580/590 discussion
PEP 590 is on its way to be accepted, with some details still to be discussed. I've rejected PEP 580 so we can focus on one place.

Here are things we discussed on GitHub but now seem to agree on:

* The vectorcall function's kwname argument can be NULL.
* Let's use `vectorcallfunc`, not `vectorcall`, and stop the bikeshedding.
* `tp_vectorcall_offset` can be `Py_ssize_t` (The discussions around signedness and C standards and consistency are interesting, but ultimately irrelevant here.)
* `PyCall_MakeTpCall` can be removed.
* `PyVectorcall_Function` (for getting the `vectorcallfunc` of an object) can be an internal helper. External code should go through `PyCall_Vectorcall` (whatever we name it).
* `PY_VECTORCALL_ARGUMENTS_OFFSET` is OK, bikeshedding over variants like `PY_VECTORCALL_PREPEND` won't bring much benefit.

Anyone against, make your point now :)

The following have discussion PRs open:

* `PyCall_MakeVectorCall` name: https://github.com/python/peps/pull/1037
* Passing a dict to `PyObject_Vectorcall`: https://github.com/python/peps/pull/1038
* Type of the kwnames argument (PyObject/PyTupleObject): https://github.com/python/peps/pull/1039

The remaining points are:

### Making things private

For Python 3.8, the public API should be private, so the API can get some contact with the real world. I'd especially like to be able to learn from Cython's experience using it. That would mean:

* _PyObject_Vectorcall
* _PyCall_MakeVectorCall
* _PyVectorcall_NARGS
* _METH_VECTORCALL
* _Py_TPFLAGS_HAVE_VECTORCALL
* _Py_TPFLAGS_METHOD_DESCRIPTOR

### Can the kwnames tuple be empty?

Disallowing empty tuples means it's easier for the *callee* to detect the case of no keyword arguments. Instead of:

    if (kwnames != NULL && PyTuple_GET_SIZE(kwnames))

you have:

    if (kwnames != NULL)

On the other hand, the *caller* would now be responsible for handling the no-kwarg case specially.

Jeroen points out:
> The side of the caller (which should ensure not to send an empty tuple)
> is CPython and there the issue of people implementing the protocol wrongly
> doesn't arise.
> External C code is not expected to manually use tp_vectorcall_offset to make
> vectorcalls: it is expected to use an API like PyCall_Vectorcall() and that
> API will ensure to replace an empty tuple with NULL.
>
> I see it as an application of
> https://en.wikipedia.org/wiki/Robustness_principle
> (Be conservative in what you send, be liberal in what you accept):
> PyCall_Vectorcall should accept an empty tuple but it should not send an
> empty tuple to the vectorcall function.

But, if you apply the robustness principle to vectorcallfunc, it should accept empty tuples.

### `METH_VECTORCALL` function type

Jeroen suggested changing this from:

    `PyObject *(*call) (PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwname)`

to `vectorcallfunc`, i.e.:

    `PyObject *(*call) (PyObject *callable, Py_ssize_t n, PyObject *const *args, PyObject *kwnames)`

Mark argues that this is a major change and prevents the interpreter from sanity checking the return value of PyMethodDef defined functions. (Since the functions are defined by extension code, they need to be sanity-checked, and this will be done by PyCFunction's vectorcall adapter. Tools like Cython can bypass the check if needed.)

The underlying C function should not need to know how to extract "self" from the function object, or how to handle the argument offsetting. Those should be implementation details.

I see the value in having METH_VECTORCALL equivalent to the existing METH_FASTCALL|METH_KEYWORDS. (Even though PEP 573 will need to add to the calling convention.)
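The caller/callee split in the empty-kwnames question can be sketched with a small Python model. This is an illustration only, not CPython code: `call_vectorcall` and `callee` are hypothetical names, `None` stands in for a C NULL, and keyword values are assumed to follow the positional values in one flat sequence.

```python
def callee(args, kwnames):
    """Hypothetical callee: keyword values sit at the end of args, named by kwnames."""
    if kwnames is not None:   # the simplified check, no length test needed
        nkw = len(kwnames)
        positional = args[:len(args) - nkw]
        keywords = dict(zip(kwnames, args[len(args) - nkw:]))
    else:
        positional, keywords = args, {}
    # kwnames is returned as well so the normalization is observable.
    return positional, keywords, kwnames


def call_vectorcall(func, args, kwnames=None):
    """Hypothetical calling helper: accepts an empty tuple but never forwards one."""
    if kwnames is not None and len(kwnames) == 0:
        kwnames = None        # normalize "no keywords" to the NULL stand-in
    return func(tuple(args), kwnames)
```

With this split, `call_vectorcall(callee, (1, 2), ())` reaches the callee with `kwnames` set to `None`, so only the single helper has to know about the empty-tuple case.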
Re: [Python-Dev] PEP 580/590 discussion
On 5/6/19 3:43 AM, Jeroen Demeyer wrote:
> On 2019-05-06 00:04, Petr Viktorin wrote:
>> - Single bound method class for all kinds of function classes: This
>> would be a cleaner design, yes, but I don't see a pressing need. As
>> PEP 579 says, "this is a compounding issue", not a goal. As I recall,
>> that is the only major reason for CCALL_DEFARG.
>
> Just a minor correction here: I guess that you mean CCALL_SELFARG. The
> flag CCALL_DEFARG is for passing the PyCCallDef* in PEP 580, which is
> mostly equivalent to passing the callable object in PEP 590.
>
> The signature of PEP 580 is
>
>     func(const PyCCallDef *def, PyObject *self, PyObject *const *args,
>          Py_ssize_t nargs, PyObject *kwnames)
>
> And with PEP 590 it is
>
>     func(PyObject *callable, PyObject *const *args, Py_ssize_t nargs,
>          PyObject *kwnames)
>
> with the additional special role for the PY_VECTORCALL_ARGUMENTS_OFFSET
> bit (which is meant to solve the problem of "self" in a different way).

I worded that badly, sorry.

From PEP 590's `callable`, the called function can get any of these if it needs to (and if they're stored somewhere). But you can't write generic code that would get them from any callable.

If we're not going for the "single bound method class" idea, that is OK; `def` & `self` can be implementation details of the callables that need them.
Re: [Python-Dev] PEP 580/590 discussion
On 5/6/19 4:24 AM, Jeroen Demeyer wrote:
> Hello Petr,
>
> Thanks for your time. I suggest that you (or somebody else) officially
> reject PEP 580.

I'll do that shortly. I hope that you are not taking this personally. PEP 580 is a good design. PEP 590 even says that it's built on your ideas.

> I'll start working on reformulating PEP 590, adding some elements from
> PEP 580. At the same time, I'll work on the implementation of PEP 590. I
> want to implement Mark's idea of having a separate wrapper for each
> old-style calling convention.
>
> In the meantime, we can continue the discussion about the details, for
> example whether to store the flags inside the instance (I don't have an
> answer for that right now, I'll need to think about it).

I'm abandoning the per-instance flag proposal. It's an unnecessary complication; per-type flags are fine.

> Petr, did you discuss with the Steering Council? It would be good to
> have some kind of pre-approval that PEP 590 and its implementation will
> be accepted. I want to work on PEP 590, but I'm not the right person to
> "defend" it (I know that it's worse in some ways than PEP 580).

As BDFL-delegate, I'm "pre-approving" PEP 590. I mentioned some details of PEP 590 that still need attention. If there are any more, now's the time to bring them up.

And yes, I know that in some ways it's worse than PEP 580. That's what makes it a hard decision.
Re: [Python-Dev] PEP 580/590 discussion
Hello Petr,

Thanks for your time. I suggest that you (or somebody else) officially reject PEP 580.

I'll start working on reformulating PEP 590, adding some elements from PEP 580. At the same time, I'll work on the implementation of PEP 590. I want to implement Mark's idea of having a separate wrapper for each old-style calling convention.

In the meantime, we can continue the discussion about the details, for example whether to store the flags inside the instance (I don't have an answer for that right now, I'll need to think about it).

Petr, did you discuss with the Steering Council? It would be good to have some kind of pre-approval that PEP 590 and its implementation will be accepted. I want to work on PEP 590, but I'm not the right person to "defend" it (I know that it's worse in some ways than PEP 580).

Jeroen.
Re: [Python-Dev] PEP 580/590 discussion
On 2019-05-06 00:04, Petr Viktorin wrote:
> - Single bound method class for all kinds of function classes: This
> would be a cleaner design, yes, but I don't see a pressing need. As
> PEP 579 says, "this is a compounding issue", not a goal. As I recall,
> that is the only major reason for CCALL_DEFARG.

Just a minor correction here: I guess that you mean CCALL_SELFARG. The flag CCALL_DEFARG is for passing the PyCCallDef* in PEP 580, which is mostly equivalent to passing the callable object in PEP 590.

The signature of PEP 580 is

    func(const PyCCallDef *def, PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)

And with PEP 590 it is

    func(PyObject *callable, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)

with the additional special role for the PY_VECTORCALL_ARGUMENTS_OFFSET bit (which is meant to solve the problem of "self" in a different way).
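How the PY_VECTORCALL_ARGUMENTS_OFFSET bit handles "self" can be pictured with a short Python model. It is a sketch under stated assumptions, not CPython code: a list plus a start index stands in for the C argument pointer, `size_t` is assumed to be 64 bits wide, the flag is taken to be the top bit of that word, and `nargs_of` and `call_bound` are hypothetical names.

```python
SIZE_T_BITS = 64                            # assumption: 64-bit size_t
ARGUMENTS_OFFSET = 1 << (SIZE_T_BITS - 1)   # model of PY_VECTORCALL_ARGUMENTS_OFFSET


def nargs_of(nargsf):
    """Strip the flag bit to recover the real number of positional arguments."""
    return nargsf & ~ARGUMENTS_OFFSET


def call_bound(func, self_obj, buf, start, nargsf):
    """Model of a bound method prepending self; buf[start:start+nargs] are the arguments."""
    nargs = nargs_of(nargsf)
    if nargsf & ARGUMENTS_OFFSET:
        # The caller allows temporary use of the slot just before the arguments:
        # write self there, call, then restore the slot. No copy is needed.
        saved = buf[start - 1]
        buf[start - 1] = self_obj
        try:
            return func(buf, start - 1, nargs + 1)
        finally:
            buf[start - 1] = saved
    # Without the flag, a new argument array has to be built.
    return func([self_obj] + buf[start:start + nargs], 0, nargs + 1)


def show(buf, start, nargs):
    """Hypothetical target function: return the arguments it can see."""
    return tuple(buf[start:start + nargs])
```

For example, with `buf = ["scratch", "a", "b"]`, the call `call_bound(show, "self", buf, 1, 2 | ARGUMENTS_OFFSET)` sees `("self", "a", "b")` and leaves `buf` unchanged afterwards.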
Re: [Python-Dev] PEP 580/590 discussion
Hello!

Sorry for the delay; PyCon is keeping me busy. On the other hand, I did get to talk to a lot of smart people here!

I'm leaning toward accepting PEP 590 (with some changes still). Let's start focusing on it. As for the changes, I have these 4 points:

I feel that the API needs some contact with real users before it's set in stone. That was the motivation behind my proposal for PEP 590 with additional flags. At PyCon, Nick Coghlan suggested another option to make the API "provisional": making it formally private. Py_TPFLAGS_HAVE_VECTORCALL would be underscore-prefixed, and the docs would say that it can change. In Python 3.9, the semantics will be finalized and the underscore removed. This would allow high-maintenance projects (like Cython) to start using it and give their feedback, and we'd have a chance to respond to the feedback.

tp_vectorcall_offset should be what's replacing tp_print in the struct. The current implementation has tp_vectorcall there. This way, Cython can create vectorcall callables for older Pythons. (See PEP 580: https://www.python.org/dev/peps/pep-0580/#replacing-tp-print).

Subclassing should not be forbidden. Jeroen, do you want to write a section for how subclassing should work?

Given Jeroen's research and ideas that went into the PEP (and hopefully, we'll incorporate some PEP 580 text as well), it seems fair to list him as co-author of the accepted PEP, instead of just listing PEP 580 in the acknowledgement section.

On some other points:

- Single bound method class for all kinds of function classes: This would be a cleaner design, yes, but I don't see a pressing need. As PEP 579 says, "this is a compounding issue", not a goal. As I recall, that is the only major reason for CCALL_DEFARG. PEP 590 says that x64 Windows passes 4 arguments in registers. Admittedly, I haven't checked this, nor the performance implications (so this would be a good point to argue!), but it seems like a good reason to keep the argument count down. So, no CCALL_DEFARG.
- In reply to this note from Mark:

> PEP 590 is fully universal, it supports callables that can do anything with
> anything. There is no need for it to be extended because it already supports
> any possible behaviour.

I don't buy this point. The current tp_call also supports any possible behavior. Here we want to support any behavior *efficiently*. As a specific example: for calling a PEP 590 callable with a kwarg dict, there'll need to be an extra allocation. That's inefficient relative to PEP 580 (or PEP 590 plus allowing a dict in "kwnames"). But I'm willing to believe the inefficiency is acceptable.
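The extra allocation for a kwarg dict can be pictured with a minimal Python model of the conversion. It is only a sketch: `dict_call_to_vectorcall` is a hypothetical helper, tuples stand in for the C argument array, and `None` stands in for a NULL `kwnames`.

```python
def dict_call_to_vectorcall(args, kwargs):
    """Convert (args, kwargs dict) into a flat argument sequence plus a kwnames tuple."""
    if not kwargs:
        return tuple(args), None          # no keywords: nothing extra to build
    kwnames = tuple(kwargs)               # new tuple of names: an extra allocation
    # Keyword values are copied after the positional values: another new sequence.
    flat = tuple(args) + tuple(kwargs[name] for name in kwnames)
    return flat, kwnames
```

A convention that accepted the dict directly would skip both copies, which is the inefficiency being weighed here.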
Re: [Python-Dev] PEP 580/590 discussion
Hi Jeroen,

On 25/04/2019 3:42 pm, Jeroen Demeyer wrote:
> On 2019-04-25 00:24, Petr Viktorin wrote:
>> I believe we can achieve that by having PEP 590's (o+offset) point not
>> just to function pointer, but to a {function pointer; flags} struct
>> with flags defined for two optimizations:
>
> What's the rationale for putting the flags in the instance? Do you
> expect flags to be different between one instance and another instance
> of the same class?
>
>> Both type flags and nargs bits are very limited resources.
>
> Type flags are only a limited resource if you think that all flags ever
> added to a type must be put into tp_flags. There is nothing wrong with
> adding new fields tp_extraflags or tp_vectorcall_flags to a type.
>
>> What I don't like about it is that it has the extensions built-in;
>> mandatory for all callers/callees.
>
> I don't agree with the above sentence about PEP 580:
> - callers should use APIs like PyCCall_FastCall() and shouldn't need to
> worry about the implementation details at all.
> - callees can opt out of all the extensions by not setting any special
> flags and setting cr_self to a non-NULL value. When using the flags
> CCALL_FASTCALL | CCALL_KEYWORDS, then implementing the callee is exactly
> the same as PEP 590.
>
>> As in PEP 590, any class that uses this mechanism shall not be usable
>> as a base class.
>
> Can we please lift this restriction? There is really no reason for it.
> I'm not aware of any similar restriction anywhere in CPython. Note that
> allowing subclassing is not the same as inheriting the protocol. As a
> compromise, we could simply never inherit the protocol.

AFAICT, any limitations on subclassing exist solely to prevent tp_call and the PEP 580/590 function pointer being in conflict. This limitation is inherent and the same for both PEPs. Do you agree?

Let us consider a class C that sets the Py_TPFLAGS_HAVE_CCALL/Py_TPFLAGS_HAVE_VECTORCALL flag. It will set the function pointer in a new instance, C(), when the object is created. If we create a new class D:

    class D(C):
        def __call__(self, *args, **kwargs):
            ...

and then create an instance `d = D()` then calling d will have two contradictory behaviours; the one installed by C in the function pointer and the one specified by D.__call__

We can ensure correct behaviour by setting the function pointer to NULL or a forwarding function (depending on the implementation) if __call__ has been overridden. This would be enforced at class creation/readying time.

Cheers,
Mark.
Re: [Python-Dev] PEP 580/590 discussion
On 2019-04-27 14:07, Mark Shannon wrote:
> class D(C):
>     def __call__(self, *args, **kwargs):
>         ...
>
> and then create an instance `d = D()` then calling d will have two
> contradictory behaviours; the one installed by C in the function pointer
> and the one specified by D.__call__

It's true that the function pointer in D will be wrong but it's also irrelevant since the function pointer won't be used: class D won't have the flag Py_TPFLAGS_HAVE_CCALL/Py_TPFLAGS_HAVE_VECTORCALL set.
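The "never inherit the flag" argument can be modelled in a few lines of Python. This is a toy model, not how CPython lays out types: `TypeModel`, `InstanceModel` and `call` are hypothetical names, `have_vectorcall` stands in for the type flag, and `vectorcall` stands in for the per-instance function pointer.

```python
class TypeModel:
    """Toy type object: tp_call is inherited, the fast-call flag is not."""
    def __init__(self, name, base=None, have_vectorcall=False, tp_call=None):
        self.name = name
        self.have_vectorcall = have_vectorcall   # deliberately never copied from base
        if tp_call is None and base is not None:
            tp_call = base.tp_call               # ordinary slot inheritance
        self.tp_call = tp_call


class InstanceModel:
    """Toy instance carrying the function pointer installed at creation time."""
    def __init__(self, type_, vectorcall=None):
        self.type = type_
        self.vectorcall = vectorcall


def call(obj, *args):
    # The pointer is consulted only when the *type* carries the flag.
    if obj.type.have_vectorcall and obj.vectorcall is not None:
        return obj.vectorcall(obj, *args)
    return obj.type.tp_call(obj, *args)


def fast_path(obj, *args):
    return "C fast path"


C = TypeModel("C", have_vectorcall=True, tp_call=lambda obj, *a: "C.tp_call")
D = TypeModel("D", base=C, tp_call=lambda obj, *a: "D.__call__")

c = InstanceModel(C, vectorcall=fast_path)
d = InstanceModel(D, vectorcall=fast_path)   # stale pointer left behind by C
```

Here `call(c)` takes the fast path, while `call(d)` ignores the stale pointer and dispatches to `D.__call__`, because `D` does not carry the flag.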
Re: [Python-Dev] PEP 580/590 discussion
Hi Petr,

On 24/04/2019 11:24 pm, Petr Viktorin wrote:
> So, I spent another day pondering the PEPs.
>
> I love PEP 590's simplicity and PEP 580's extensibility. As I hinted
> before, I hope they can be combined, and I believe we can achieve that
> by having PEP 590's (o+offset) point not just to function pointer, but
> to a {function pointer; flags} struct with flags defined for two
> optimizations:
> - "Method-like", i.e. compatible with LOAD_METHOD/CALL_METHOD.
> - "Argument offsetting request", allowing PEP 590's
> PY_VECTORCALL_ARGUMENTS_OFFSET optimization.

A big problem with adding another field to the structure is that it prevents classes from implementing vectorcall. A 30% reduction in the time to create ranges, small lists and sets and to call type(x) is easily worth a single tp_flag, IMO.

As an aside, there are currently over 10 spare flags. As long as we don't consume more than one a year, we have over a decade to make tp_flags a uint64_t. It already consumes 64 bits on any 64 bit machine, due to the struct layout.

As I've said before, PEP 590 is universal and capable of supporting an implementation of PEP 580 on top of it. Therefore, adding any flags or fields from PEP 580 to PEP 590 will not increase its capability. Since any extra fields will require at least as many memory accesses as before, it will not improve performance and by restricting layout may decrease it.

> This would mean one basic call signature (today's METH_FASTCALL |
> METH_KEYWORDS), with individual optimizations available if both the
> caller and callee support them.

That would prevent the code having access to the callable object. That access is a fundamental part of both PEP 580 and PEP 590 and the key motivating factor for both.

> In case you want to know my thoughts or details, let me indulge in some
> detailed comparisons and commentary that led to this.
> I also give a more detailed proposal below.
> Keep in mind I wrote this before I distilled it to the paragraph above,
> and though the distillation is written as a diff to PEP 590, I still
> think of this as merging both PEPs.
>
> PEP 580 tries hard to work with existing call conventions (like METH_O,
> METH_VARARGS), making them fast.
> PEP 590 just defines a new convention. Basically, any callable that
> wants performance improvements must switch to METH_VECTORCALL (fastcall).
> I believe PEP 590's approach is OK. To stay as performant as possible, C
> extension authors will need to adapt their code regularly. If they
> don't, no harm -- the code will still work as before, and will still be
> about as fast as it was before.

As I see it, authors of C extensions have five options with PEP 590. Option 4, do nothing, is the recommended option :)

1. Use the PyMethodDef protocol, it will work exactly the same as before. It's already fairly quick in most cases.
2. Use Cython and let Cython take care of handling the vectorcall interface.
3. Use Argument Clinic, and let Argument Clinic take care of handling the vectorcall interface.
4. Do nothing. This is the same as 1-3 above depending on what you were already doing.
5. Implement the vectorcall call directly. This might be a bit quicker than the above, but probably not enough to be worth it, unless you are implementing numpy or something like that.

> In exchange for this, Python (and Cython, etc.) can focus on optimizing
> one calling convention, rather than a variety, each with its own
> advantages and drawbacks.
>
> Extending PEP 580 to support a new calling convention will involve
> defining a new CCALL_* constant, and adding to existing dispatch code.
> Extending PEP 590 to support a new calling convention will most likely
> require a new type flag, and either changing the vectorcall semantics or
> adding a new pointer.
> To be a bit more concrete, I think of possible extensions to PEP 590 as things like:
>
> - Accepting a kwarg dict directly, without copying the items to tuple/array (as in PEP 580's CCALL_VARARGS|CCALL_KEYWORDS)
> - Prepending more than one positional argument, or appending positional arguments
> - When an optimization like LOAD_METHOD/CALL_METHOD turns out to no longer be relevant, removing it to simplify/speed up code.
>
> I expect we'll later find out that something along these lines might improve performance. PEP 590 would make it hard to experiment.
>
> I mentally split PEP 590 into two pieces: formalizing fastcall, plus one major "extension" -- making bound methods fast.

Not just bound methods, any callable that adds an extra argument before dispatching to another callable. This includes builtin-methods, classes and a few others.

Setting the Py_TPFLAGS_METHOD_DESCRIPTOR flag states the behaviour of the object when used as a descriptor. It is up to the implementation to use that information how it likes. If LOAD_METHOD/CALL_METHOD gets replaced, then the new implementation can still use this information.

> When seen this way, this "extension" is quite heavy: it adds an additional type flag, Py_TPFL
Re: [Python-Dev] PEP 580/590 discussion
Hello,

after reading the various comments and thinking about it more, let me propose a real compromise between PEP 580 and PEP 590.

My proposal is: take the general framework of PEP 580 but support only a single calling convention like PEP 590. The single calling convention supported would be what is currently specified by the flag combination CCALL_DEFARG|CCALL_FASTCALL|CCALL_KEYWORDS. This way, the flags CCALL_VARARGS, CCALL_FASTCALL, CCALL_O, CCALL_NOARGS, CCALL_KEYWORDS, CCALL_DEFARG can be dropped.

This calling convention is very similar to the calling convention of PEP 590, except that:

- the callable is replaced by a pointer to a PyCCallDef (the structure from PEP 580, but possibly without cc_parent)
- there is a self argument like PEP 580. This implies support for the CCALL_SELFARG flag from PEP 580 and no support for the PY_VECTORCALL_ARGUMENTS_OFFSET trick of PEP 590.

Background: I added support for all those calling conventions in PEP 580 because I didn't want to make any compromise regarding performance. When writing PEP 580, I assumed that any kind of performance regression would be a reason to reject PEP 580. However, it seems now that you're willing to accept PEP 590 instead which does introduce performance regressions in certain code paths. So that suggests that we could keep the good parts of PEP 580 but reduce its complexity by having a single calling convention like PEP 590.

If you compare this compromise to PEP 590, the main difference is dealing with bound methods. Personally, I really like the idea of having a *single* bound method class which would be used by all kinds of function classes without any loss of performance (not only in CPython itself, but also by Cython and other C extensions). To support that, we need something like the PyCCallRoot structure from PEP 580, together with the special handling for self.
About cc_parent and CCALL_OBJCLASS: I prefer to keep that because it allows merging the classes for bare functions (not inside a class) and unbound methods (functions inside a class). Concretely, that could reduce code duplication between builtin_function_or_method and method_descriptor. But I'm also fine with removing cc_parent and CCALL_OBJCLASS. In any case, we can decide that later.

What do you think?

Jeroen.
Re: [Python-Dev] PEP 580/590 discussion
On 4/25/19 10:42 AM, Jeroen Demeyer wrote:
> On 2019-04-25 00:24, Petr Viktorin wrote:
>> I believe we can achieve that by having PEP 590's (o+offset) point not just to function pointer, but to a {function pointer; flags} struct with flags defined for two optimizations:
>
> What's the rationale for putting the flags in the instance? Do you expect flags to be different between one instance and another instance of the same class?

I'm not tied to that idea. If there's a more reasonable place to put the flags, let's go for it, but it's not a big enough issue so it shouldn't complicate the protocol much. Quoting Mark from the other subthread:

> Callables are either large or transient. If large, then the extra few bytes make little difference. If transient, then it matters even less.

>> Both type flags and nargs bits are very limited resources.
>
> Type flags are only a limited resource if you think that all flags ever added to a type must be put into tp_flags. There is nothing wrong with adding new fields tp_extraflags or tp_vectorcall_flags to a type.

Indeed. Extra flags are just what I think PEP 590 is missing.

>> What I don't like about it is that it has the extensions built-in; mandatory for all callers/callees.
>
> I don't agree with the above sentence about PEP 580:
> - callers should use APIs like PyCCall_FastCall() and shouldn't need to worry about the implementation details at all.
> - callees can opt out of all the extensions by not setting any special flags and setting cr_self to a non-NULL value. When using the flags CCALL_FASTCALL | CCALL_KEYWORDS, then implementing the callee is exactly the same as PEP 590.

Imagine an extension author sitting down to read the docs and implement a callable:

- PEP 580 introduces 6 CCALL_* combinations: you need to select the best one for your use case. Also, add two structs to the instance & link them via pointers, make sure you support descriptor behavior and the __name__ attribute.
  (Plus there are features for special purposes: CCALL_DEFARG, CCALL_OBJCLASS, self-slicing, but you can skip that initially.)
- My proposal: to the instance, add a function pointer with known signature and flags which you set to zero. Add an offset to the type, and set a type flag. (There are additional possible optimizations, but you can skip them initially.)

PEP 580 makes a lot of sense if you read it all, but I fear there'll be very few people who read and understand it. And it's not important just for extension authors (admittedly, implementing a callable directly using the C API is often a bad idea). The more people understand the mechanism, the more people can help with further improvements.

I don't see the benefit of supporting METH_VARARGS, METH_NOARGS, and METH_O calling conventions (beyond backwards compatibility and compatibility with Python's *args syntax).

For keywords, I see a benefit in supporting *only one* of kwarg dict or kwarg tuple: if the caller and callee don't agree on which one to use, you need an expensive conversion. If we say tuple is the way, some of them will need to adapt, but within the set of those that do, any caller/callee combination will be fast. (And if tuple-only turns out to be the wrong choice, adding dict support in the future shouldn't be hard.)

That leaves fastcall (with tuple only) as the focus of this PEP, and the other calling conventions essentially as implementation details of builtin functions/methods.

>> As in PEP 590, any class that uses this mechanism shall not be usable as a base class.
>
> Can we please lift this restriction? There is really no reason for it. I'm not aware of any similar restriction anywhere in CPython. Note that allowing subclassing is not the same as inheriting the protocol.

Sure, let's use PEP 580 treatment of inheritance. Even if we don't, I don't think dropping this restriction would be a PEP-level change.
It can be dropped as soon as an implementation and tests are ready, and inheritance issues ironed out. But it doesn't need to be in the initial implementation.

> As a compromise, we could simply never inherit the protocol.

That also sounds reasonable for the initial implementation.
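The keyword-argument trade-off discussed in this message (a kwarg dict versus a tuple of names) can be illustrated with a small Python model. This is only a sketch, not CPython API: `call_with_kwnames`, `dict_to_kwnames` and `describe` are hypothetical names invented here. It assumes the fastcall-style layout in which keyword values follow the positional values in one flat sequence, with a separate tuple naming them.

```python
# Illustrative Python model of the two keyword-passing styles; nothing here
# is CPython API and every name is hypothetical.

def call_with_kwnames(func, args, kwnames):
    """Fastcall-style call: `args` holds the positional values followed by
    the keyword values; `kwnames` names the trailing keyword values."""
    nkw = len(kwnames)
    npos = len(args) - nkw
    positional = args[:npos]
    keywords = dict(zip(kwnames, args[npos:]))
    return func(*positional, **keywords)


def dict_to_kwnames(positional, kwargs):
    """Conversion a caller has to pay for when it holds a kwarg dict but the
    callee expects the flat sequence plus a tuple of names."""
    args = tuple(positional) + tuple(kwargs.values())
    kwnames = tuple(kwargs)
    return args, kwnames


def describe(a, b, *, sep="-", end="."):
    return f"{a}{sep}{b}{end}"


flat_args, names = dict_to_kwnames((1, 2), {"sep": "+", "end": "!"})
result = call_with_kwnames(describe, flat_args, names)
```

If caller and callee both use the flat form, `dict_to_kwnames` is never needed; the conversion only appears when the two sides disagree, which is the cost the message argues for avoiding by standardizing on one form.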
Re: [Python-Dev] PEP 580/590 discussion
On 2019-04-25 00:24, Petr Viktorin wrote:
> I believe we can achieve that by having PEP 590's (o+offset) point not just to function pointer, but to a {function pointer; flags} struct with flags defined for two optimizations:

What's the rationale for putting the flags in the instance? Do you expect flags to be different between one instance and another instance of the same class?

> Both type flags and nargs bits are very limited resources.

Type flags are only a limited resource if you think that all flags ever added to a type must be put into tp_flags. There is nothing wrong with adding new fields tp_extraflags or tp_vectorcall_flags to a type.

> What I don't like about it is that it has the extensions built-in; mandatory for all callers/callees.

I don't agree with the above sentence about PEP 580:
- callers should use APIs like PyCCall_FastCall() and shouldn't need to worry about the implementation details at all.
- callees can opt out of all the extensions by not setting any special flags and setting cr_self to a non-NULL value. When using the flags CCALL_FASTCALL | CCALL_KEYWORDS, then implementing the callee is exactly the same as PEP 590.

> As in PEP 590, any class that uses this mechanism shall not be usable as a base class.

Can we please lift this restriction? There is really no reason for it. I'm not aware of any similar restriction anywhere in CPython. Note that allowing subclassing is not the same as inheriting the protocol. As a compromise, we could simply never inherit the protocol.

Jeroen.
[Python-Dev] PEP 580/590 discussion
So, I spent another day pondering the PEPs.

I love PEP 590's simplicity and PEP 580's extensibility. As I hinted before, I hope they can be combined, and I believe we can achieve that by having PEP 590's (o+offset) point not just to a function pointer, but to a {function pointer; flags} struct with flags defined for two optimizations:

- "Method-like", i.e. compatible with LOAD_METHOD/CALL_METHOD.
- "Argument offsetting request", allowing PEP 590's PY_VECTORCALL_ARGUMENTS_OFFSET optimization.

This would mean one basic call signature (today's METH_FASTCALL | METH_KEYWORDS), with individual optimizations available if both the caller and callee support them.

In case you want to know my thoughts or details, let me indulge in some detailed comparisons and commentary that led to this. I also give a more detailed proposal below. Keep in mind I wrote this before I distilled it to the paragraph above, and though the distillation is written as a diff to PEP 590, I still think of this as merging both PEPs.

PEP 580 tries hard to work with existing call conventions (like METH_O, METH_VARARGS), making them fast. PEP 590 just defines a new convention. Basically, any callable that wants performance improvements must switch to METH_VECTORCALL (fastcall). I believe PEP 590's approach is OK. To stay as performant as possible, C extension authors will need to adapt their code regularly. If they don't, no harm -- the code will still work as before, and will still be about as fast as it was before.

In exchange for this, Python (and Cython, etc.) can focus on optimizing one calling convention, rather than a variety, each with its own advantages and drawbacks.

Extending PEP 580 to support a new calling convention will involve defining a new CCALL_* constant, and adding to existing dispatch code. Extending PEP 590 to support a new calling convention will most likely require a new type flag, and either changing the vectorcall semantics or adding a new pointer.
To be a bit more concrete, I think of possible extensions to PEP 590 as things like:

- Accepting a kwarg dict directly, without copying the items to tuple/array (as in PEP 580's CCALL_VARARGS|CCALL_KEYWORDS)
- Prepending more than one positional argument, or appending positional arguments
- When an optimization like LOAD_METHOD/CALL_METHOD turns out to no longer be relevant, removing it to simplify/speed up code.

I expect we'll later find out that something along these lines might improve performance. PEP 590 would make it hard to experiment.

I mentally split PEP 590 into two pieces: formalizing fastcall, plus one major "extension" -- making bound methods fast. When seen this way, this "extension" is quite heavy: it adds an additional type flag, Py_TPFLAGS_METHOD_DESCRIPTOR, and uses a bit in the "Py_ssize_t nargs" argument as additional flag. Both type flags and nargs bits are very limited resources. If I was sure vectorcall is the final best implementation we'll have, I'd go and approve it -- but I think we still need room for experimentation, in the form of more such extensions.

PEP 580, with its collection of per-instance data and flags, is definitely more extensible. What I don't like about it is that it has the extensions built-in; mandatory for all callers/callees.

PEP 580 adds a common data struct to callable instances. Currently these are all data bound methods want to use (cc_flags, cc_func, cc_parent, cr_self). Various flags are consulted in order to deliver the needed info to the underlying function.

PEP 590 lets the callable object store data it needs independently. It provides a clever mechanism for pre-allocating space for bound methods' prepended "self" argument, so data can be provided cheaply, though it's still done by the callable itself. Callables that would need to e.g. prepend more than one argument won't be able to use this mechanism, but y'all convinced me that is not worth optimizing for.
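The pre-allocated "self" slot described in the last paragraph can be modelled in plain Python. This is a sketch under stated assumptions, not CPython code: the flag value, the `(buf, start, nargsf)` calling shape, and the names `plain_function` and `bound_method_call` are all invented for illustration. The idea modelled is that a flag bit in the argument count tells the callee that the slot just before the first argument may be borrowed temporarily.

```python
# Illustrative Python model of the argument-offset idea; not CPython code.
# The flag value and all function names are assumptions for this sketch.
ARGUMENTS_OFFSET = 1 << 63  # stand-in for PY_VECTORCALL_ARGUMENTS_OFFSET


def nargs_of(nargsf):
    """Strip the flag bit to recover the real positional count."""
    return nargsf & ~ARGUMENTS_OFFSET


def plain_function(buf, start, nargsf):
    """Callee: reads its arguments from buf[start : start + nargs]."""
    n = nargs_of(nargsf)
    return tuple(buf[start:start + n])


def bound_method_call(self_obj, func, buf, start, nargsf):
    """Prepend `self_obj` to the arguments before calling `func`."""
    n = nargs_of(nargsf)
    if nargsf & ARGUMENTS_OFFSET:
        # The caller left buf[start - 1] available: borrow it, then restore.
        saved = buf[start - 1]
        buf[start - 1] = self_obj
        try:
            return func(buf, start - 1, n + 1)
        finally:
            buf[start - 1] = saved
    # No spare slot: fall back to copying into a fresh buffer.
    copy = [self_obj] + buf[start:start + n]
    return func(copy, 0, n + 1)


# The caller reserves one slot in front of the real arguments.
buffer = [None, "x", "y"]
fast = bound_method_call("me", plain_function, buffer, 1, 2 | ARGUMENTS_OFFSET)
slow = bound_method_call("me", plain_function, ["x", "y"], 0, 2)
```

Both paths hand the callee the same arguments; the flagged path avoids the copy. Since only one slot is reserved, a callable that wants to prepend two arguments cannot use the shortcut, which is the limitation noted above.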
PEP 580's goal seems to be that making a callable behave like a Python function/method is just a matter of the right set of flags. Jeroen called this "complexity in the protocol". PEP 590, on the other hand, leaves much to individual callable types. This is "complexity in the users of the protocol".

I now don't see a problem with PEP 590's approach. Not all users will need the complexity. We need to give CPython and Cython the tools to make implementing "def"-like functions possible (and fast), but if other extensions need to match the behavior of Python functions, they should just use Cython. Emulating Python functions is a special-enough use case that it doesn't justify complicating the protocol, and the same goes for implementing Python's built-in functions (with all their historical baggage).

My more complete proposal for a compromise between PEP 580 and 590 would go something like below.

The type flag (Py_TPFLAGS_HAVE_VECTORCALL/Py_TPFLAGS_HAVE_CCALL) and offset (tp_vectorcall_offset/tp
Re: [Python-Dev] PEP 580/590 discussion
On 2019-04-02 21:38, Mark Shannon wrote:
> Hi,
>
> On 01/04/2019 6:31 am, Jeroen Demeyer wrote:
>> I added benchmarks for PEP 590:
>>
>> https://gist.github.com/jdemeyer/f0d63be8f30dc34cc989cd11d43df248
>
> Thanks. As expected, calls to C functions perform about the same for both PEPs and master, as they are using almost the same calling convention under the hood.

While they are "about the same", in general PEP 580 is slightly faster than master and PEP 590. And PEP 590 actually has a minor slow-down for METH_VARARGS calls.

I think that this happens because PEP 580 has fewer levels of indirection than PEP 590. The vectorcall protocol (PEP 590) replaces a slower level (tp_call) with a faster level (vectorcall), while PEP 580 just removes that level entirely: it calls the C function directly.

This shows that PEP 580 is really meant to have maximal performance in all cases, accidentally even making existing code faster.

Jeroen.
Re: [Python-Dev] PEP 580/590 discussion
Hi,

On 01/04/2019 6:31 am, Jeroen Demeyer wrote:
> I added benchmarks for PEP 590:
>
> https://gist.github.com/jdemeyer/f0d63be8f30dc34cc989cd11d43df248

Thanks. As expected, calls to C functions perform about the same for both PEPs and master, as they are using almost the same calling convention under the hood.

As an example of the advantage that a general fast calling convention gives you, I have implemented the vectorcall versions of list() and range():

https://github.com/markshannon/cpython/compare/vectorcall-minimal...markshannon:vectorcall-examples

This gives a roughly 30% reduction in time for creating ranges, or lists from small tuples.

https://gist.github.com/markshannon/5cef3a74369391f6ef937d52cca9bfc8

Cheers,
Mark.
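For readers who want to reproduce this kind of comparison on their own interpreter builds, a minimal timing sketch using the standard `timeit` module might look like the following. It is not the benchmark behind the gists linked above; the helper name `time_call`, the chosen statements and the iteration counts are assumptions made for illustration. Run it once per build and compare the printed numbers.

```python
import timeit


def time_call(stmt, setup="pass", number=100_000, repeat=5):
    """Return the best observed per-call time, in nanoseconds, for `stmt`."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e9


# Statement -> setup code; these mirror the operations named in the message.
cases = {
    "range(10)": "pass",
    "list(t)": "t = (1, 2, 3)",
}

timings = {stmt: time_call(stmt, setup) for stmt, setup in cases.items()}
for stmt, ns in timings.items():
    print(f"{stmt:10s} {ns:8.1f} ns per call")
```

Taking the minimum over several repeats reduces noise from other processes; absolute numbers depend on the machine and build, so only the ratio between two builds is meaningful.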
Re: [Python-Dev] PEP 580/590 discussion
I added benchmarks for PEP 590:

https://gist.github.com/jdemeyer/f0d63be8f30dc34cc989cd11d43df248
[Python-Dev] PEP 580/590 discussion
On 2019-03-30 17:30, Mark Shannon wrote:
> 2. The claim that PEP 580 allows "certain optimizations because other code can make assumptions" is flawed. In general, the caller cannot make assumptions about the callee or vice-versa. Python is a dynamic language.

PEP 580 is meant for extension classes, not Python classes. Extension classes are not dynamic. When you implement tp_call in a given way, the user cannot change it. So if a class implements the C call protocol or the vectorcall protocol, callers can make assumptions about what that means.

> PEP 579 is mainly a list of supposed flaws with the 'builtin_function_or_method' class. The general thrust of PEP 579 seems to be that builtin-functions and builtin-methods should be more flexible and extensible than they are. I don't agree. If you want different behaviour, then use a different object. Don't try and cram all this extra behaviour into a pre-existing object.

I think that there is a misunderstanding here. I fully agree with the "use a different object" solution. This isn't a new solution: it's already possible to implement those different objects (Cython does it). It's just that this solution comes at a performance cost and that's what we want to avoid.

> I'll reiterate that PEP 590 is more general than PEP 580 and that once the callable's code has access to the callable object (as both PEPs allow) then anything is possible. You can't get more extensible than that.

I would argue the opposite: PEP 590 defines a fixed protocol that is not easy to extend. PEP 580 on the other hand uses a new data structure PyCCallDef which could easily be extended in the future (this will intentionally never be part of the stable ABI, so we can do that).

I have also argued before that the generality of PEP 590 is a bad thing rather than a good thing: by defining a more rigid protocol as in PEP 580, more optimizations are possible.

> PEP 580 has the same limitation for the same reasons.
> The limitation is necessary for correctness if an object supports calls via `__call__` and through another calling convention.

I don't think that this limitation is needed in either PEP. As I explained at the top of this email, it can easily be solved by not using the protocol for Python classes. What is wrong with my proposal in PEP 580?

https://www.python.org/dev/peps/pep-0580/#inheritance

Jeroen.