On 2018-11-16, Brett Cannon wrote:
> I think part of the challenge here (and I believe it has been
> brought up elsewhere) is no one knows what kind of API is
> necessary for some faster VM other than PyPy.
I think we have some pretty good ideas about which parts of the current API are problematic. Victor's C-API web site has details[1]. We can also ask other implementors which parts are hard to support. Here are my thoughts on some desired changes:

- We are *not* getting rid of refcounting for extension modules. That would require a whole new API; we might as well start from scratch with Python 4, and no one wants that. However, other VMs likely use a different GC internally and only use refcounting for objects passed through the C-API. Refcounted handles are the usual implementation approach there. We can make some changes to make that easier. I think making PyObject an opaque pointer would help.

- Borrowed references are a problem. However, because they are so commonly used, and because the source code changes needed to move to a non-borrowed API are non-trivial, I don't think we should try to change this. Maybe we could just discourage their use? For CPython, a borrowed-reference API is faster. For other Python implementations it is likely slower, maybe much slower. So an extension module that wants to work well with other VMs should avoid those APIs.

- It would be nice to make PyTypeObject an opaque pointer as well. I think that's a lot more difficult than making PyObject opaque, so I don't think we should attempt it in the near future. Maybe we could take a half-way step and discourage accessing ob_type directly. We would provide functions (probably inline) to do what you would otherwise do with op->ob_type-><something> (a rough sketch of such accessors is below). One reason to discourage access to ob_type is that internally there is not necessarily one PyTypeObject structure per Python-level type. E.g. the VM might have specialized types for certain sub-domains, like the different flavours of strings depending on the set of characters stored in them. Or you could have different list types, e.g. one kind of list used when all values are ints. Basically, with CPython op->ob_type is super fast; for other VMs it could be a lot slower. By accessing ob_type you are saying "give me all possible type information for this object pointer". By using functions that ask only for what you need, you put less burden on the VM. E.g. "is this object an instance of some type" is faster to compute.

- APIs that return pointers to the internals of objects are a problem, e.g. PySequence_Fast_ITEMS(). For CPython this is really fast because it just exposes the internal layout, which is already in the right format. For other VMs, that API could be expensive to emulate. E.g. suppose you have a list that stores only ints: if someone calls PySequence_Fast_ITEMS(), you have to create real PyObjects for all of the list elements. (The second sketch below contrasts this style with a more portable one.)

- Reducing the size of the API seems helpful. E.g. we don't need both PyObject_CallObject() *and* PyObject_Call(). Also, do we really need all the type-specific APIs, e.g. PyList_GetItem() vs PyObject_GetItem()? In some cases maybe we can justify the bigger API on performance grounds, but to add a new API someone should have a benchmark that shows a real speedup (not just that they imagine it makes a difference).

I don't think we should change CPython internals to try to use this new API. E.g. we know that getting ob_type is fast, so just leave the code that does that alone. Maybe in the far distant future, if we have successfully got extension modules to switch to the new API, we could consider changing CPython internals. There would have to be a big benefit, though, to justify the code churn, e.g. if my tagged pointers experiment shows significant performance gains (it hasn't yet).
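To make the ob_type point concrete, here is a rough sketch of the kind of accessor functions I have in mind. The PyX_* names are made up just for illustration (not proposed spellings); on CPython they can be trivial inlines over existing APIs, while another VM is free to answer these narrow questions without materializing a full PyTypeObject:

    #include <Python.h>

    /* "Is this object an instance of this type?"  Answering yes/no is a
     * much narrower question than handing out the whole type object.
     * On CPython this is just PyObject_TypeCheck(). */
    static inline int
    PyX_IsInstance(PyObject *op, PyTypeObject *type)
    {
        return PyObject_TypeCheck(op, type);
    }

    /* "What is the type name?"  Replaces op->ob_type->tp_name. */
    static inline const char *
    PyX_TypeName(PyObject *op)
    {
        return Py_TYPE(op)->tp_name;
    }

Extension code would call these instead of poking at op->ob_type directly, which leaves the VM free to have more than one internal type behind a single Python-level type (the specialized string and list examples above).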
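Similarly, to make the borrowed-reference and PySequence_Fast_ITEMS() points concrete, here is a sketch contrasting the two styles of extension code. The function names sum_fast() and sum_portable() are made up, and error handling is abbreviated:

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    /* Style A: borrowed references and direct access to the internal
     * item array.  Very fast on CPython, potentially expensive for a
     * VM that does not store its lists as arrays of PyObject*. */
    static double
    sum_fast(PyObject *seq)
    {
        PyObject *fast = PySequence_Fast(seq, "expected a sequence");
        if (fast == NULL)
            return -1.0;
        Py_ssize_t n = PySequence_Fast_GET_SIZE(fast);
        PyObject **items = PySequence_Fast_ITEMS(fast);  /* exposes internals */
        double total = 0.0;
        for (Py_ssize_t i = 0; i < n; i++)
            total += PyFloat_AsDouble(items[i]);  /* borrowed references */
        Py_DECREF(fast);
        return total;
    }

    /* Style B: only generic calls that return new references; nothing
     * is assumed about the object's internal layout. */
    static double
    sum_portable(PyObject *seq)
    {
        Py_ssize_t n = PySequence_Size(seq);
        if (n < 0)
            return -1.0;
        double total = 0.0;
        for (Py_ssize_t i = 0; i < n; i++) {
            PyObject *item = PySequence_GetItem(seq, i);  /* new reference */
            if (item == NULL)
                return -1.0;
            total += PyFloat_AsDouble(item);
            Py_DECREF(item);
        }
        return total;
    }

Style A is what CPython makes fast today; style B is the kind of code another VM can support cheaply, and on CPython it is still reasonable.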
I like Nathaniel Smith's idea of doing the new API as a separate project, outside the cpython repo. It is possible that in that effort we would want some minor changes to cpython to make the new API more efficient, for example. Those changes should be pretty limited, because we are hoping that the new API will also work on top of old Python versions, e.g. 3.6.

To avoid exposing APIs that should be hidden, re-organizing the include files is one idea. However, that doesn't help for old versions of Python, so I'm thinking Dino's idea of just duplicating the prototypes would be better. We want a minimal API, so the number of duplicated prototypes shouldn't be too large.

Victor's recent work changing some macros to inline functions is not really related to the new API project, IMHO. I don't think there is a problem with leaving an existing macro as a macro. If we need to introduce new APIs, e.g. to help hide PyTypeObject, those APIs could use inline functions. That way, on CPython the new API would be just as fast as accessing ob_type directly; you get an essentially zero-cost abstraction. For limited API builds, maybe it would be okay to change the inline functions into non-inlined versions (same function names).

If the new API is going to be successful, it needs to be relatively easy to change extension source code to use it. E.g. replacing one function with another is pretty easy (PyObject_GetItem vs PyList_GetItem). If too many difficult changes are required, extensions are never going to get ported. Also, the ported extension *must* be usable with older Python versions. That's a big mistake we made with the Python 2 to 3 migration; let's not repeat it. And the extension module should not take a big performance hit, so we can't turn every API that was a macro into a non-inlined function. People are not going to accept that, and rightly so.

However, we could introduce a new ifdef, like Py_LIMITED_API, that gives a stable ABI. E.g. when it is enabled, most everything would turn into non-inline functions. In exchange for the performance hit, your extension would become ABI compatible across a range of CPython releases. That would be a nice feature, basically a more useful version of Py_LIMITED_API. (See the P.S. below for a sketch of what that could look like.)

Regards,

  Neil

1. https://pythoncapi.readthedocs.io/bad_api.html
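P.S. Here is a very rough sketch of the stable-ABI ifdef idea, using made-up names (Py_NEWAPI_STABLE_ABI and PyX_Type are illustrations, not proposed spellings):

    #include <Python.h>

    #ifdef Py_NEWAPI_STABLE_ABI
    /* Stable-ABI build: an exported, non-inline function.  Slower, but
     * the extension's binary keeps working across CPython releases. */
    PyAPI_FUNC(PyTypeObject *) PyX_Type(PyObject *op);
    #else
    /* Default build: a zero-cost inline, as fast as op->ob_type. */
    static inline PyTypeObject *
    PyX_Type(PyObject *op)
    {
        return op->ob_type;
    }
    #endif

The same header gives you the fast inline by default and an ABI-stable entry point when the new define is set.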