Hi, this has been the subject of a couple of threads on python-dev already, for example:
http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986
http://thread.gmane.org/gmane.comp.python.devel/141037/focus=141046

It originally came out of issues 13429 and 16392.

http://bugs.python.org/issue13429
http://bugs.python.org/issue16392

Here's an initial attempt at a PEP for it. It is based on the (unfinished) ModuleSpec PEP, which is being discussed on the import-sig mailing list.

http://mail.python.org/pipermail/import-sig/2013-August/000688.html

Stefan


PEP: 4XX
Title: Redesigning extension modules
Version: $Revision$
Last-Modified: $Date$
Author: Stefan Behnel <stefan_ml at behnel.de>
BDFL-Delegate: ???
Discussions-To: ???
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.4
Post-History: 23-Aug-2013
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules interact with the interpreter runtime. This interaction was last revised for Python 3.0 in PEP 3121, but that revision did not solve all problems at the time. The goal is to solve them by bringing extension modules closer to the way Python modules behave.

An implication of this PEP is that extension modules can use arbitrary types for their module implementation and are no longer restricted to types.ModuleType. This makes it easy to support properties at the module level. It also makes it easy to safely store arbitrary global state in the module, where that state is covered by normal garbage collection and supports reloading and sub-interpreters.


Motivation
==========

Python modules and extension modules are not set up in the same way. For Python modules, the module is created and set up first, and then the module code is executed. For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and the initialisation. This means that it knows neither the __file__ it is being loaded from nor its package (i.e. its fully qualified module name, FQMN). This hinders relative imports and resource loading.

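For comparison, the create-then-execute order for Python modules can be observed directly with importlib. This sketch relies on importlib.util.module_from_spec, which only appeared in Python 3.5, i.e. after this draft was written. The module name "pkg.mod" and the temporary source file are arbitrary choices made for illustration. The module code runs with __name__, __package__ and __file__ already set, which is exactly the context that a "PyInit_modulename" function does not receive.

```python
import importlib.util
import os
import tempfile

# Top-level code of a plain Python module that records the import
# context it sees while it is being executed.
SOURCE = "seen = (__name__, __package__, __file__)\n"


def load_python_module(fqmn, source):
    """Load *source* as module *fqmn* in two phases: create, then execute."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, fqmn.rpartition(".")[2] + ".py")
        with open(path, "w") as f:
            f.write(source)
        spec = importlib.util.spec_from_file_location(fqmn, path)
        # Phase 1: the module object is created and already carries
        # __name__, __package__ and __file__ derived from the spec.
        module = importlib.util.module_from_spec(spec)
        # Phase 2: only now is the module's own code executed.
        spec.loader.exec_module(module)
    return module


module = load_python_module("pkg.mod", SOURCE)
print(module.seen[:2])  # ('pkg.mod', 'pkg')
```

The module is deliberately not inserted into sys.modules here, so the example has no side effects on the running interpreter.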
In Python 3, the module is also not added to sys.modules while its init function runs. As a result, a (potentially transitive) re-import of the module from within that function will really try to import it again, and thus run into an infinite loop when it executes the module init function again. And without the FQMN, it is not trivial to correctly add the module to sys.modules either.

This is specifically a problem for Cython-generated modules, where it is not uncommon for the module init code to be as complex as that of any 'regular' Python module. Also, the lack of an FQMN and a correct file path hinders the compilation of __init__.py modules, i.e. packages, especially when relative imports are used at module init time.

Furthermore, the majority of currently existing extension modules have problems with sub-interpreter support and/or reloading. With the current infrastructure, it is neither easy nor efficient to support these features. This PEP also addresses these issues.


The current process
===================

Currently, extension modules export an initialisation function named "PyInit_modulename", named after the file name of the shared library. This function is executed by the import machinery and must return either NULL in the case of an exception, or a fully initialised module object. The function receives no arguments, so it has no way of knowing about its import context.

During its execution, the module init function creates a module object based on a PyModuleDef struct. It then continues to initialise it by adding attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded. When a module with a matching name gets created, this global variable is used to determine the FQMN of the module object. This is not entirely safe, as it relies on the module init function creating its own module object first, but this assumption usually holds in practice.

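The following is a rough pure-Python model of that mechanism, for illustration only. The names package_context, create_module and load_extension are hypothetical and are not CPython API. Module objects are reduced to plain dicts. The model is loosely based on the behaviour described above: the loader stashes the FQMN in a global, and only the first module created with a matching short name picks it up. The init function itself never sees its import context.

```python
# Hypothetical model: these names are illustrative and are not CPython API.
package_context = None  # FQMN stashed by the loader before calling init


def create_module(short_name):
    """Stand-in for creating a module from a PyModuleDef inside an init function."""
    global package_context
    name = short_name
    if package_context is not None:
        # Only the last dotted component of the stashed FQMN is compared.
        prefix, _, last = package_context.rpartition(".")
        if prefix and last == short_name:
            name = package_context
            package_context = None  # consumed by the first matching module
    return {"__name__": name}


def load_extension(fqmn, init_function):
    """Stand-in for the shared library loader calling PyInit_<name>()."""
    global package_context
    saved = package_context
    package_context = fqmn
    try:
        # The init function receives no arguments: no spec, no __file__.
        return init_function()
    finally:
        package_context = saved


def init_mod():
    module = create_module("mod")     # picks up "pkg.mod" from the global
    helper = create_module("helper")  # no FQMN available: keeps its short name
    module["helper"] = helper
    return module


m = load_extension("pkg.mod", init_mod)
print(m["__name__"], m["helper"]["__name__"])  # pkg.mod helper
```

A second module created by the same init function, as a compiled package would need, only ends up with its short name. This is one reason why multi-module extensions remain an open question.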
The main problem in this process is the missing support for passing state into the module init function, and for safely passing state through to the module creation code.


The proposal
============

The current extension module initialisation will be deprecated in favour of a new initialisation scheme. Since the current scheme will continue to be available, existing code will continue to work unchanged, including binary compatibility.

Extension modules that support the new initialisation scheme must export a new public symbol "PyModuleCreate_modulename", where "modulename" is the name of the shared library. This mimics the previous naming convention for the "PyInit_modulename" function.

This symbol must resolve to a C function of the following type::

    typedef PyObject* (*PyModuleTypeCreateFunction)(PyObject* module_spec);

The "module_spec" argument receives a "ModuleSpec" instance, as defined in PEP 4XX (FIXME). (All names are obviously up for debate and bike-shedding at this point.)

When called, this function must create and return a type object. This can be either a Python class or an extension type that is allocated on the heap. The importer will instantiate this type as the module instance.

There is no requirement for this type to be types.ModuleType or a subtype of it. Any type can be returned. This follows the current support for allowing arbitrary objects in sys.modules. It also makes it easier for extension modules to define a type that exactly matches their needs for holding module state.

The constructor of this type must have the following signature::

    def __init__(self, module_spec):
        ...

The "module_spec" argument receives the same object as the one passed into the module type creation function.


Implementation
==============

XXX - not started


Reloading and Sub-Interpreters
==============================

To "reload" an extension module, the module create function is executed again and returns a new module type.
This type is then instantiated in the same way as on the original import, and the new instance replaces the previous entry in sys.modules. Once the last references to the previous module and its type are gone, both will be subject to normal garbage collection.

Sub-interpreter support is an inherent property of the design. During import in the sub-interpreter, the module create function is executed and returns a new module type that is local to the sub-interpreter. Both the type and its module instance are subject to garbage collection in the sub-interpreter.


Open questions
==============

It is not immediately obvious how to handle extensions that want to register more than one module in their module init function, e.g. compiled packages. One possibility would be to leave the setup to the user. The user would have to know all FQMNs anyway in this case, or could construct them from the module spec of the current module, but would not know the import file path. A C-API could be provided to register new module types in the current interpreter, given a user-provided ModuleSpec.

There is no inherent requirement for the module creation function to actually return a type. It could return an arbitrary callable that creates a 'modulish' object when called. Should there be a type check in place that makes sure that what it returns is a type? I don't currently see a need for this.


Copyright
=========

This document has been placed in the public domain.