Hi all,

There was some discussion on python-ideas last month about how to make it easier/more reliable for a module to override attribute access. This is useful for things like autoloading submodules (accessing 'foo.bar' triggers the import of 'bar'), or for deprecating module attributes that aren't functions (accessing 'foo.bar' emits a DeprecationWarning, "the bar attribute will be removed soon").

Python has had some basic support for this for a long time -- if a module overwrites its entry in sys.modules[__name__], then the object that's placed there will be returned by 'import'. This allows one to define custom subclasses of module and use them instead of the default, similar to how metaclasses allow one to use custom subclasses of 'type'.
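As a concrete sketch of that long-standing trick (all the names here -- "foo", DeprecatingModule, "bar" -- are made up for illustration), a package can subclass types.ModuleType and install an instance of it into sys.modules, so that attribute access on the module runs custom code:

```python
import sys
import types
import warnings

class DeprecatingModule(types.ModuleType):
    """Hypothetical module subclass: warn when the deprecated 'bar' is read."""
    # __getattr__ only fires for names missing from the instance __dict__,
    # so 'bar' is served from here instead of living in the namespace.
    def __getattr__(self, name):
        if name == "bar":
            warnings.warn("the bar attribute will be removed soon",
                          DeprecationWarning, stacklevel=2)
            return "bar-value"
        raise AttributeError(name)

# The long-standing trick: overwrite the module's entry in sys.modules.
# In real code this would run at the bottom of foo/__init__.py, using
# sys.modules[__name__] rather than a hard-coded "foo".
sys.modules["foo"] = DeprecatingModule("foo")
```

After this runs, 'import foo' hands back the DeprecatingModule instance, and reading foo.bar emits the DeprecationWarning.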
In practice, though, it's very difficult to make this work safely and correctly for a top-level package. The main problem is that when you create a new object to stick into sys.modules, this necessarily means creating a new namespace dict. And now you have a mess, because you have two dicts: new_module.__dict__, which is the namespace you export, and old_module.__dict__, which is the globals() for the code that's trying to define the module namespace. Keeping these in sync is extremely error-prone -- consider what happens, e.g., when your package __init__.py imports submodules which then recursively import the top-level package -- so it's difficult to justify for the kind of large packages that might want to deprecate entries in their top-level namespace.

So what we'd really like is a way to somehow end up with an object that (a) has the same __dict__ as the original module, but (b) is of our own custom module subclass. If we can do this, then metamodules will become safe and easy to write correctly. (There's a little demo of working metamodules at https://github.com/njsmith/metamodule/ but it uses ctypes hacks that depend on non-stable parts of the CPython ABI, so it's not a long-term solution.)

I've now spent some time trying to hack this capability into CPython, and I've made a list of the possible options I can think of. I'm writing to python-dev because none of them is obviously The Right Way, so I'd like to get some opinions/rulings/whatever on which approach to follow up on.

----

Option 1: Make it possible to change the type of a module object in-place, so that we can write something like

    sys.modules[__name__].__class__ = MyModuleSubclass

Option 1 downside: The invariants required to make __class__ assignment safe are complicated, and are only implemented for heap-allocated type objects. PyModule_Type is not heap-allocated, so making this work would require lots of delicate surgery to typeobject.c.
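(To make the two-dict desync described earlier concrete, here's a toy reproduction. The module name "pkg" and the attribute are invented; in a real package the divergence happens via recursive imports while __init__.py is still executing.)

```python
import sys
import types

# Toy stand-ins: old_module.__dict__ plays the role of globals() for the
# package's still-executing __init__.py; new_module is the custom object
# that was stuck into sys.modules in its place.
old_module = types.ModuleType("pkg")
new_module = types.ModuleType("pkg")
sys.modules["pkg"] = new_module

# Code running in the old namespace defines something...
exec("helper = 'defined in old namespace'", old_module.__dict__)

# ...but anyone who does 'import pkg' gets new_module, where it's missing.
import pkg
print(hasattr(pkg, "helper"))   # the two namespaces have diverged
```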
I'd rather not go down that rabbit-hole.

----

Option 2: Make PyModule_Type into a heap type allocated at interpreter startup, so that the above just works.

Option 2 downside: PyModule_Type is exposed as a statically-allocated global symbol, so doing this would involve breaking the stable ABI.

----

Option 3: Make it legal to assign to the __dict__ attribute of a module object, so that we can write something like

    new_module = MyModuleSubclass(...)
    new_module.__dict__ = sys.modules[__name__].__dict__
    sys.modules[__name__].__dict__ = {}   # ***
    sys.modules[__name__] = new_module

The line marked *** is necessary because of the way modules are designed: they expect to control the lifecycle of their __dict__. When a module object is initialized, it fills in a bunch of stuff in the __dict__; when the module object (not the dict object!) is deallocated, it deletes everything from the __dict__. This latter behavior in particular means that having two module objects share the same __dict__ is bad news.

Option 3 downside: The paragraph above. Also, there's stuff inside the module struct besides just the __dict__, and more stuff has appeared there over time.

----

Option 4: Add a new function sys.swap_module_internals, which takes two module objects and swaps their __dict__ and other attributes. By making the operation a swap instead of an assignment, we avoid the lifecycle pitfalls of Option 3. By making it a builtin, we can make sure it always handles all the module fields that matter, not just __dict__. Usage:

    new_module = MyModuleSubclass(...)
    sys.swap_module_internals(new_module, sys.modules[__name__])
    sys.modules[__name__] = new_module

Option 4 downside: Obviously a hack.

----

Options 3 and 4 both seem workable; it just depends on which way we prefer to hold our nose.
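One data point on Option 3: the __dict__ assignment in its snippet is currently rejected outright, because module objects expose __dict__ as a read-only attribute. A quick check (the module name "example" is arbitrary):

```python
import types

m = types.ModuleType("example")
try:
    m.__dict__ = {}          # the assignment Option 3 would legalize
except AttributeError as exc:
    # CPython exposes module __dict__ as a read-only member, so the
    # assignment fails before any lifecycle issues even come into play.
    print("rejected:", exc)
```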
Option 4 is slightly more correct in that it works for *all* modules, but OTOH, at the moment the only case where Option 3 *really* fails is compiled modules with PEP 3121 metadata, and compiled modules can already use a module subclass via other means (since they instantiate their own module objects).

Thoughts? Suggestions on other options I've missed? Should I go ahead and write a patch for one of these?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org