Thanks for working on this, Marcel (and Petr). This looks like an ambitious intern project :) Couple of questions and comments in-line.
On Mon, Apr 23, 2018 at 12:36 PM, Marcel Plch <gmarcel.p...@gmail.com> wrote:

> Hello,
> I am an intern at Red Hat mentored by Petr Viktorin. As a part of my
> internship, I learned the CPython internals and how to contribute
> to the CPython interpreter.
>
> As a result, I have prepared PEP 573, which solves some problems
> that PEP 489 (Multi-phase extension module initialization) has left open.
> Specifically, this PEP proposes a way to access per-module state from
> methods of built-in and extension types.
> Like PEP 489, it aims to make subinterpreter-friendly built-in/extension
> modules easier to create.
>
> A big problem found when converting many modules to PEP 489 multi-phase
> initialization is subinterpreter-friendly access to exception
> types defined in built-in/extension modules.
> This PEP solves this by introducing "immutable exception types".
> The current implementation requires one new type flag and two new
> pointers in the heap type structure.
> It should be possible to remove either the flag or one of the two pointers,
> if we agree on the other mechanics in the PEP.
>
>
> ===================
>
> PEP: 573
> Title: Module State Access from C Extension Methods
> Version: $Revision$
> Last-Modified: $Date$
> Author: Petr Viktorin <encu...@gmail.com>,
>         Nick Coghlan <ncogh...@gmail.com>,
>         Eric Snow <ericsnowcurren...@gmail.com>,
>         Marcel Plch <gmarcel.p...@gmail.com>
> Discussions-To: import-...@python.org
> Status: Active
> Type: Process
> Content-Type: text/x-rst
> Created: 02-Jun-2016
> Python-Version: 3.8
> Post-History:
>
>
> Abstract
> ========
>
> This PEP proposes to add a way for CPython extension methods to access
> context such as the state of the modules they are defined in.
>
> This will allow extension methods to use direct pointer dereferences
> rather than PyState_FindModule for looking up module state, reducing
> or eliminating the performance cost of using module-scoped state over
> process global state.
>
> This fixes one of the remaining roadblocks for adoption of PEP 3121
> (Extension module initialization and finalization) and PEP 489
> (Multi-phase extension module initialization).
>
> Additionally, support for easier creation of immutable exception
> classes is added.
>

I'm not a fan of using 'immutable' here, or in the API function name. I
understand the types are to some extent immutable (apart from their
refcount, I assume), but I think it's going to be too easy to confuse it
with types whose *instances* are immutable. (We do occasionally say things
like "tuples are an immutable type".) Since the point is that they behave
like statically defined ones, perhaps 'Static' would be a reasonable
replacement.

> This removes the need for keeping per-module state if it would only be used
> for exception classes.
>
> While this PEP takes an additional step towards fully solving the
> problems that PEP 3121 and PEP 489 started
> tackling, it does not attempt to resolve *all* remaining concerns. In
> particular, accessing the module state from slot methods (``nb_add``,
> etc) remains slower than accessing that state from other extension
> methods.
>
>
> Terminology
> ===========
>
> Process-Global State
> --------------------
>
> C-level static variables. Since this is very low-level
> memory storage, it must be managed carefully.
>
> Per-module State
> ----------------
>
> State local to a module object, allocated dynamically as part of a
> module object's initialization. This isolates the state from other
> instances of the module (including those in other subinterpreters).
>
> Accessed by ``PyModule_GetState()``.
>
>
> Static Type
> -----------
>
> A type object defined as a C-level static variable, i.e. a compiled-in
> type object.
>
> A static type needs to be shared between module instances and has no
> information of what module it belongs to.
> Static types do not have ``__dict__`` (although their instances might).
>
> Heap Type
> ---------
>
> A type object created at run time.
>
>
> Rationale
> =========
>
> PEP 489 introduced a new way to initialize extension modules, which brings
> several advantages to extensions that implement it:
>
> * The extension modules behave more like their Python counterparts.
> * The extension modules can easily support loading into pre-existing
>   module objects, which paves the way for extension module support for
>   ``runpy`` or for systems that enable extension module reloading.
> * Loading multiple modules from the same extension is possible, which
>   makes testing module isolation (a key feature for proper sub-interpreter
>   support) possible from a single interpreter.
>
> The biggest hurdle for adoption of PEP 489 is allowing access to module
> state from methods of extension types.
> Currently, the way to access this state from extension methods is by
> looking up the module via
> ``PyState_FindModule`` (in contrast to module level functions in
> extension modules, which receive a module reference as an argument).
> However, ``PyState_FindModule`` queries the thread-local state, making
> it relatively costly compared to C level process global access and
> consequently deterring module authors from using it.
>
> Also, ``PyState_FindModule`` relies on the assumption that in each
> subinterpreter, there is at most one module corresponding to
> a given ``PyModuleDef``. This does not align well with Python's import
> machinery. Since PEP 489 aimed to fix that, the assumption does
> not hold for modules that use multi-phase initialization, so
> ``PyState_FindModule`` is unavailable for these modules.
>
> A faster, safer way of accessing module-level state from extension methods
> is needed.
>
>
> Immutable Exception Types
> -------------------------
>
> For isolated modules to work, any class whose methods touch module state
> must be a heap type, so that each instance of a module can have its own
> type object.
> With the changes proposed in this PEP, heap type instances will
> have access to module state without global registration. But, to create
> instances of heap types, one will need the module state in order to
> get the type object corresponding to the appropriate module.
> In short, heap types are "viral" – anything that "touches" them must
> itself be a heap type.
>
> Currently, most exception types, apart from the ones in ``builtins``, are
> heap types. This is likely simply because there is a convenient way
> to create them: ``PyErr_NewException``.
> Heap types generally have a mutable ``__dict__``.
> In most cases, this mutability is harmful. For example, exception types
> from the ``sqlite`` module are mutable and shared across subinterpreters.
> This allows "smuggling" values to other subinterpreters via attributes of
> ``sqlite3.Error``.
>
> Moreover, since raising exceptions is a common operation, and heap types
> will be "viral", ``PyErr_NewException`` will tend to "infect" the module
> with "heap type-ness" – at least if the module decides to play well with
> subinterpreters/isolation.
> Many modules could go without module state
> entirely if the exception classes were immutable.
>
> To solve this problem, a new function for creating immutable exception
> types is proposed.
>
>
> Background
> ===========
>
> The implementation of a Python method may need access to one or more of
> the following pieces of information:
>
> * The instance it is called on (``self``)
> * The underlying function
> * The class the method was defined in
> * The corresponding module
> * The module state
>
> In Python code, the Python-level equivalents may be retrieved as::
>
>     import sys
>
>     def meth(self):
>         instance = self
>         module_globals = globals()
>         module_object = sys.modules[__name__]  # (1)
>         underlying_function = Foo.meth         # (1)
>         defining_class = Foo                   # (1)
>         defining_class = __class__             # (2)
>
> .. note::
>
>    The defining class is not ``type(self)``, since ``type(self)`` might
>    be a subclass of ``Foo``.
>
> The statements marked (1) implicitly rely on name-based lookup via the
> function's ``__globals__``:
> either the ``Foo`` attribute to access the defining class and Python
> function object, or ``__name__`` to find the module object in
> ``sys.modules``.
> In Python code, this is feasible, as ``__globals__`` is set
> appropriately when the function definition is executed, and
> even if the namespace has been manipulated to return a different
> object, at worst an exception will be raised.
>
> The ``__class__`` closure, (2), is a safer way to get the defining
> class, but it still relies on ``__closure__`` being set appropriately.
>
> By contrast, extension methods are typically implemented as normal C
> functions.
> This means that they only have access to their arguments and C level
> thread-local and process-global states. Traditionally, many extension
> modules have stored their shared state in C-level process globals,
> causing problems when:
>
> * running multiple initialize/finalize cycles in the same process
> * reloading modules (e.g. to test conditional imports)
> * loading extension modules in subinterpreters
>
> PEP 3121 attempted to resolve this by offering the
> ``PyState_FindModule`` API, but this still has significant problems
> when it comes to extension methods (rather than module level
> functions):
>
> * it is markedly slower than directly accessing C-level process-global
>   state
> * there is still some inherent reliance on process global state
>   that means it still doesn't reliably handle module reloading
>
> It's also the case that when looking up a C-level struct such as
> module state, supplying
> an unexpected object layout can crash the interpreter, so it's
> significantly more important to ensure that extension
> methods receive the kind of object they expect.
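For illustration, a small pure-Python sketch (not part of the PEP) of the distinction the Background section draws between the defining class and ``type(self)``. The class names ``Foo`` and ``Bar`` are placeholders; the only mechanism relied on is the ``__class__`` closure cell that the compiler creates for methods that reference it.

```python
class Foo:
    def meth(self):
        # __class__ is the compiler-provided closure cell: it is always Foo
        # here, regardless of the runtime type of self.
        return __class__, type(self)


class Bar(Foo):
    pass


defining_class, runtime_class = Bar().meth()
print(defining_class.__name__, runtime_class.__name__)  # Foo Bar
```

An extension method written in C has no equivalent of this closure cell, which is the gap the proposal is trying to close.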
>
> Proposal
> ========
>
> Currently, a bound extension method (``PyCFunction`` or
> ``PyCFunctionWithKeywords``) receives only
> ``self``, and (if applicable) the supplied positional and keyword
> arguments.
>
> While module-level extension functions already receive access to the
> defining module object via their
> ``self`` argument, methods of extension types don't have that luxury:
> they receive the bound instance
> via ``self``, and hence have no direct access to the defining class or
> the module level state.
>
> The additional module level context described above can be made
> available with two changes.
> Both additions are optional; extension authors need to opt in to start
> using them:
>
> * Add a pointer to the module to heap type objects.
>
> * Pass the defining class to the underlying C function.
>
>   The defining class is readily available at the time the built-in
>   method object (``PyCFunctionObject``) is created, so it can be stored
>   in a new struct that extends ``PyCFunctionObject``.
>
> The module state can then be retrieved from the module object via
> ``PyModule_GetState``.
>
> Note that this proposal implies that any type whose method needs to access
> per-module state must be a heap type, rather than a static type.
>
> This is necessary to support loading multiple module objects from a single
> extension: a static type, as a C-level global, has no information about
> which module it belongs to.
>
>
> Slot methods
> ------------
>
> The above changes don't cover slot methods, such as ``tp_iter`` or
> ``nb_add``.
>
> The problem with slot methods is that their C API is fixed, so we can't
> simply add a new argument to pass in the defining class.
> Two possible solutions have been proposed to this problem:
>
> * Look up the class through walking the MRO.
>   This is potentially expensive, but will be useful if performance is not
>   a problem (such as when raising a module-level exception).
> * Storing a pointer to the defining class of each slot in a separate table,
>   ``__typeslots__`` [#typeslots-mail]_. This is technically
>   feasible and fast, but quite invasive.
>
> Due to the invasiveness of the latter approach, this PEP proposes
> adding an MRO walking
> helper for use in slot method implementations, deferring the more
> complex alternative
> as a potential future optimisation. Modules affected by this concern
> also have the
> option of using thread-local state or PEP 567 context variables, or
> else defining their
> own reload-friendly lookup caching scheme.
>

I do not believe walking the MRO is going to work without reworking the
implementation of types, specifically how typeobject.c deals with slots of
subclasses: in some cases it copies the slots from the base class (see
inherit_slots() and from where it's called). I believe this would cause
problems if, for example, you define type X in module A, subclass it from
type Y in module B without overriding the slot, and try to find the module
object for A from the slot implementation. I don't think copying slots is a
requirement for the desired semantics, but it's going to be fairly involved
to rewrite it to do something else. There's also backward-compatibility to
consider: third-party libraries can be inheriting from builtin types (e.g.
numpy does this extensively) using the same copying-slot mechanism, which
means those builtin types can't use the MRO walking to find their module
without breaking compatibility with those third-party libraries.

>
>
> Immutable Exception Types
> -------------------------
>
> To facilitate creating static exception classes, a new function is proposed:
> ``PyErr_PrepareImmutableException``. It will work similarly to
> ``PyErr_NewExceptionWithDoc``
> but will take a ``PyTypeObject **`` pointer, which points to a
> ``PyTypeObject *`` that is
> either ``NULL`` or an initialized ``PyTypeObject``.
> This pointer may be declared in process-global state.
> The function will then
> allocate the object and will keep in mind that an already existing
> exception should not be overwritten.
>
> The extra indirection makes it possible to make
> ``PyErr_PrepareImmutableException``
> part of the stable ABI by having the Python interpreter, rather than
> extension code, allocate the ``PyTypeObject``.
>
>
> Specification
> =============
>
> Adding module references to heap types
> --------------------------------------
>
> The ``PyHeapTypeObject`` struct will get a new member,
> ``PyObject *ht_module``,
> that can store a pointer to the module object for which the type was
> defined.
> It will be ``NULL`` by default, and should not be modified after the type
> object is created.
>
> A new factory method will be added for creating modules::
>
>     PyObject* PyType_FromModuleAndSpec(PyObject *module,
>                                        PyType_Spec *spec,
>                                        PyObject *bases)
>
> This acts the same as ``PyType_FromSpecWithBases``, and additionally sets
> ``ht_module`` to the provided module object.
>
> Additionally, an accessor, ``PyObject * PyType_GetModule(PyTypeObject *)``
> will be provided.
> It will return the ``ht_module`` if a heap type with module pointer set
> is passed in, otherwise it will set a SystemError and return NULL.
>
> Usually, creating a class with ``ht_module`` set will create a reference
> cycle involving the class and the module.
> This is not a problem, as tearing down modules is not a
> performance-sensitive
> operation (and module-level functions typically also create reference
> cycles).
> The existing "set all module globals to None" code that breaks function
> cycles
> through ``f_globals`` will also break the new cycles through ``ht_module``.
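As a rough illustration of the class/module reference cycle described above, here is a pure-Python model (not the C implementation). The attribute ``owner_module`` is a hypothetical stand-in for ``ht_module``, and the sketch only shows that the cyclic garbage collector can reclaim such a cycle; it says nothing about the "set all module globals to None" teardown path the PEP mentions.

```python
import gc
import types
import weakref


def make_module():
    """Build a module whose class points back at it (a stand-in for ht_module)."""
    module = types.ModuleType("demo")

    class Widget:
        pass

    Widget.owner_module = module  # hypothetical stand-in for ht_module
    module.Widget = Widget        # the module's dict references the class
    return module


module = make_module()
module_ref = weakref.ref(module)
class_ref = weakref.ref(module.Widget)

# Drop the only external reference; the module and class now only
# reference each other.
del module
gc.collect()

cycle_collected = module_ref() is None and class_ref() is None
print(cycle_collected)
```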
>
>
> Passing the defining class to extension methods
> -----------------------------------------------
>
> A new style of C-level functions will be added to the current selection of
> ``PyCFunction`` and ``PyCFunctionWithKeywords``::
>
>     PyObject *PyCMethod(PyObject *self,
>                         PyTypeObject *defining_class,
>                         PyObject *args, PyObject *kwargs)
>
> A new method object flag, ``METH_METHOD``, will be added to signal that
> the underlying C function is ``PyCMethod``.
>
> To hold the extra information, a new structure extending
> ``PyCFunctionObject`` will be added::
>
>     typedef struct {
>         PyCFunctionObject func;
>         PyTypeObject *mm_class; /* Passed as 'defining_class' arg to
>                                    the C func */
>     } PyCMethodObject;
>
> To allow passing the defining class to the underlying C function, a change
> to private API is required, now ``_PyMethodDef_RawFastCallDict`` and
> ``_PyMethodDef_RawFastCallKeywords`` will receive ``PyTypeObject *cls``
> as one of their arguments.
>
> A new macro ``PyCFunction_GET_CLASS(cls)`` will be added for easier
> access to mm_class.
>
> Method construction and calling code will be updated to honor
> ``METH_METHOD``.
>
>
> Argument Clinic
> ---------------
>
> To support passing the defining class to methods using Argument Clinic,
> a new converter will be added to clinic.py: ``defining_class``.
>
> Each method may only have one argument using this converter, and it must
> appear after ``self``, or, if ``self`` is not used, as the first argument.
> The argument will be of type ``PyTypeObject *``.
>
> When used, Argument Clinic will select ``METH_METHOD`` as the calling
> convention.
> The argument will not appear in ``__text_signature__``.
>
> This will be compatible with ``__init__`` and ``__new__`` methods, where an
> MRO walker will be used to pass the defining class from clinic generated
> code to the user's function.
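The idea of recording the defining class when the method object is created, then handing it to the function on every call, can be modelled in pure Python. This is only a sketch: the ``cmethod`` descriptor and the ``Base``/``Derived`` classes are made up for illustration, and ``__set_name__`` stands in for whatever the C method-construction code would do to fill in ``mm_class``.

```python
class cmethod:
    """Descriptor that records its defining class, loosely like mm_class."""

    def __init__(self, func):
        self.func = func
        self.defining_class = None

    def __set_name__(self, owner, name):
        # Called once, when the class body that contains the method is built.
        self.defining_class = owner

    def __get__(self, instance, owner=None):
        if instance is None:
            return self

        def bound(*args, **kwargs):
            # Mirror the proposed PyCMethod argument order:
            # self, defining_class, then the call arguments.
            return self.func(instance, self.defining_class, *args, **kwargs)

        return bound


class Base:
    @cmethod
    def describe(self, defining_class, suffix):
        return f"{defining_class.__name__}/{type(self).__name__}{suffix}"


class Derived(Base):
    pass


result = Derived().describe("!")
print(result)  # Base/Derived!
```

The caller never supplies ``defining_class``, which matches the statement that the argument will not appear in ``__text_signature__``.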
>
>
> Slot methods
> ------------
>
> To allow access to per-module state from slot methods, an MRO walker
> will be implemented::
>
>     PyTypeObject *PyType_DefiningTypeFromSlotFunc(PyTypeObject *type,
>                                                   int slot, void *func)
>
> The walker will go through bases of heap-allocated ``type``
> and search for the class that defines ``func`` at its ``slot``.
>
> The ``func`` need not be inherited by ``type``; the only requirement
> for the walker to find the defining class is that the defining class
> must be heap-allocated.
>
> On failure, an exception is set and NULL is returned.
>
>
> Static exceptions
> -----------------
>
> A new function will be added::
>
>     int PyErr_PrepareImmutableException(PyTypeObject **exc,
>                                         const char *name,
>                                         const char *doc,
>                                         PyObject *base)
>
> Creates an immutable exception type which can be shared
> across multiple module objects.
>

How is this going to deal with type.__subclasses__()? Is re-using the
static type object between reloads and sub-interpreters important enough to
warrant the different behaviour? What if sub-interpreters end up wanting to
disallow sharing objects between them?

> If the type already exists (determined by a process-global pointer,
> ``*exc``), skip the initialization and only ``INCREF`` it.
>
> If ``*exc`` is NULL, the function will
> allocate a new exception type and initialize it using given parameters
> the same way ``PyType_FromSpecWithBases`` would.
> The ``doc`` and ``base`` arguments may be ``NULL``, defaulting to a
> missing docstring and ``PyExc_Exception`` base class, respectively.
> The exception type's ``tp_flags`` will be set to values common to
> built-in exceptions and the ``Py_TPFLAGS_HEAP_IMMUTABLE`` flag (see below)
> will be set.
> On failure, ``PyErr_PrepareImmutableException`` will set an exception
> and return -1.
>
> If called with an initialized exception type (``*exc``
> is non-NULL), the function will do nothing but incref ``*exc``.
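The "create on first call, reuse afterwards" behaviour described for ``PyErr_PrepareImmutableException`` can be sketched as a pure-Python model. Everything here is hypothetical scaffolding: ``TypeSlot`` plays the role of the process-global ``PyTypeObject *`` that ``exc`` points to, ``prepare_exception`` is not a real API, and the model ignores reference counting and the immutability flag.

```python
class TypeSlot:
    """Stand-in for a process-global ``PyTypeObject *`` (initially NULL)."""

    def __init__(self):
        self.value = None


def prepare_exception(slot, name, doc=None, base=None):
    """Create the exception type on first use; reuse it afterwards."""
    if slot.value is None:
        # First call: allocate and initialise the type, defaulting the base
        # class to Exception as the draft describes.
        bases = (base if base is not None else Exception,)
        slot.value = type(name, bases, {"__doc__": doc})
    # Later calls leave the existing type untouched (the C version would
    # only INCREF it).
    return slot.value


error_slot = TypeSlot()
first = prepare_exception(error_slot, "DemoError", "Raised by the demo.")
second = prepare_exception(error_slot, "IgnoredName")
print(first is second, second.__name__)  # True DemoError
```

Note that the second call's arguments are silently ignored, which is the behaviour the text specifies for a non-NULL ``*exc``.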
>
> A new flag, ``Py_TPFLAGS_HEAP_IMMUTABLE``, will be added to prevent
> mutation of the type object. This makes it possible to
> share the object safely between multiple interpreters.
> This flag is checked in ``type_setattro`` and blocks
> setting of attributes when set, similar to built-in types.
>
> A new pointer, ``ht_moduleptr``, will be added to heap types to store
> ``exc``.
>
> On deinitialization of the exception type, ``*exc`` will be set to
> ``NULL``.
> This makes it safe for ``PyErr_PrepareImmutableException`` to check if
> the exception was already initialized.
>
> PyType_offsets
> --------------
>
> Some extension types are using instances with ``__dict__`` or
> ``__weakref__`` allocated. Currently, there is no way of passing offsets
> of these through ``PyType_Spec``. To allow this, a new structure and a
> spec slot are proposed.
>
> A new structure, ``PyType_offsets``, will have two members containing the
> offsets of ``__dict__`` and ``__weakref__``::
>
>     typedef struct {
>         Py_ssize_t dict;
>         Py_ssize_t weaklist;
>     } PyType_offsets;
>
> The new slot, ``Py_offsets``, will be used to pass a ``PyType_offsets *``
> structure containing the mentioned data.
>
>
> Helpers
> -------
>
> Getting to per-module state from a heap type is a very common task. To
> make this easier, a helper will be added::
>
>     void *PyType_GetModuleState(PyObject *type)
>
> This function takes a heap type and on success, it returns pointer to
> state of the module that the heap type belongs to.
>
> On failure, two scenarios may occur. When a type without a module is
> passed in,
> ``SystemError`` is set and ``NULL`` returned. If the module is found,
> pointer
> to the state, which may be ``NULL``, is returned without setting any
> exception.
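Combining the MRO walker from the "Slot methods" section with this helper, a slot implementation would first find its defining class and then ask that class for its module state. A pure-Python sketch of that flow follows; the ``_module`` attribute is a hypothetical stand-in for ``ht_module``, and both helper functions are illustrative models, not the proposed C functions. The model looks functions up by name in each class dict, so it sidesteps the slot-copying behaviour in typeobject.c raised above.

```python
import types


def defining_type_from_func(tp, name, func):
    """Return the first class in tp's MRO whose own dict holds ``func``."""
    for base in tp.__mro__:
        if base.__dict__.get(name) is func:
            return base
    raise TypeError(f"{tp.__name__} has no base defining {name!r}")


def get_module_state(tp):
    """Return the state of the module recorded on ``tp``, if any."""
    module = tp.__dict__.get("_module")  # hypothetical stand-in for ht_module
    if module is None:
        raise SystemError(f"{tp.__name__} has no associated module")
    return module.state


module_a = types.ModuleType("a")
module_a.state = {"counter": 0}


def add(self, other):
    # Walk the MRO from the runtime type to find where ``add`` was defined,
    # then reach the per-module state through that class.
    owner = defining_type_from_func(type(self), "__add__", add)
    get_module_state(owner)["counter"] += 1
    return self


X = type("X", (), {"__add__": add, "_module": module_a})


class Y(X):  # imagine this subclass living in another module
    pass


Y() + Y()
print(module_a.state["counter"])  # 1
```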
>
>
> Modules Converted in the Initial Implementation
> -----------------------------------------------
>
> To validate the approach, several modules will be modified during
> the initial implementation:
>
> The ``zipimport``, ``_io``, ``_elementtree``, and ``_csv`` modules
> will be ported to PEP 489 multiphase initialization.
>

zipimport currently caches things in C globals. Changing it to use PEP 489
multi-phase initialisation is very likely going to change semantics in
subtle ways... Is it really worth the risk?

>
>
> Summary of API Changes and Additions
> ====================================
>
> New functions:
>
> * PyType_GetModule
> * PyType_DefiningTypeFromSlotFunc
> * PyType_GetModuleState
> * PyErr_PrepareImmutableException
>
> New macros:
>
> * PyCFunction_GET_CLASS
>
> New types:
>
> * PyCMethodObject
>
> New structures:
>
> * PyType_offsets
>
> Modified functions:
>
> * _PyMethodDef_RawFastCallDict now receives ``PyTypeObject *cls``.
> * _PyMethodDef_RawFastCallKeywords now receives ``PyTypeObject *cls``.
>
> Modified structures:
>
> * _heaptypeobject - added ht_module and ht_moduleptr
>
> Other changes:
>
> * METH_METHOD call flag
> * defining_class converter in clinic
> * Py_TPFLAGS_HEAP_IMMUTABLE flag
> * Py_offsets type spec slot
>
>
> Backwards Compatibility
> =======================
>
> Two new pointers are added to all heap types.
> All other changes are adding new functions, structures and a type flag.
>
> The new ``PyErr_PrepareImmutableException`` function encourages
> modules to switch from using heap type Exception classes to immutable ones,
> and a number of modules will be switched in the initial implementation.
> This change will prevent adding class attributes to such types.
> For example, the following will raise AttributeError::
>
>     sqlite.OperationalError.foo = None
>
> Instances and subclasses of such exceptions will not be affected.
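For comparison, this is how CPython's existing static exception types behave today, using ``ValueError`` as a stand-in for a future immutable exception type (``AppError`` is a made-up subclass). In current CPython the failed class-attribute assignment raises ``TypeError`` rather than the ``AttributeError`` the draft mentions; which one the proposed flag would raise is not something this sketch settles. It also shows that ``__subclasses__()`` on the shared type still changes when a subclass is created, which is the concern raised earlier.

```python
class AppError(ValueError):
    pass


try:
    ValueError.foo = None  # class attribute on a static type
except TypeError as exc:
    static_type_error = exc
else:
    static_type_error = None

AppError.foo = None        # subclasses are ordinary heap types
instance = ValueError("boom")
instance.foo = None        # instances keep their own __dict__

print(type(static_type_error).__name__)          # TypeError
print(AppError in ValueError.__subclasses__())   # True
```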
>
> Implementation
> ==============
>
> An initial implementation is available in a Github repository [#gh-repo]_;
> a patchset is at [#gh-patch]_.
>
>
> Possible Future Extensions
> ==========================
>
> Easy creation of types with module references
> ---------------------------------------------
>
> It would be possible to add a PEP 489 execution slot type to make
> creating heap types significantly easier than calling
> ``PyType_FromModuleAndSpec``.
> This is left to a future PEP.
>
>
> Optimization
> ------------
>
> CPython optimizes calls to methods that have restricted signatures,
> such as not allowing keyword arguments.
>
> As proposed here, methods defined with the ``METH_METHOD`` flag do not
> support these optimizations.
>
> Optimized calls still have the option of accessing per-module state
> the same way slot methods do.
>
>
> References
> ==========
>
> .. [#typeslots-mail] [Import-SIG] On singleton modules, heap types,
>    and subinterpreters
>    (https://mail.python.org/pipermail/import-sig/2015-July/001035.html)
>
> .. [#gh-repo]
>    https://github.com/Traceur759/cpython/commits/pep-c
>
> .. [#gh-patch]
>    https://github.com/Traceur759/cpython/compare/master...Traceur759:pep-c.patch
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/thomas%40python.org
>

-- 
Thomas Wouters <tho...@python.org>

Hi! I'm an email virus! Think twice before sending your email to help me
spread!