Hello,

Recently, as part of the effort to untangle the ElementTree tests and make general code improvements (e.g. http://bugs.python.org/issue15651), I ran into something strange about PEP 3121-compliant modules. I'll demonstrate with csv, just as an example.

PEP 3121 mandates this function for looking up the module-specific state in the current sub-interpreter:

    PyObject* PyState_FindModule(struct PyModuleDef*);

This appears to make the following assumption: a given sub-interpreter only imports any C extension *once*. If it happens more than once, the assumption breaks in troubling ways. In normal code, it should never happen more than once because of the caching in sys.modules. However, many of our tests monkey-patch sys.modules (mainly by calling test.support.import_fresh_module) and hell breaks loose. Here's a simple example:

----
import sys
csv = __import__('csv')
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
print(csv.list_dialects())   # ==> ['unixpwd', 'excel-tab', 'excel', 'unix']

del sys.modules['csv']       # FUN
del sys.modules['_csv']
some_other_csv = __import__('csv')
print(csv.list_dialects())   # ==> ['excel-tab', 'excel', 'unix']
----

Note how doing some sys.modules acrobatics and re-importing suddenly changes the internal state of a previously imported module. This happens because:

1. The first import of 'csv' (which then imports '_csv') creates module-specific state on the heap and associates it with the current sub-interpreter. The list of dialects, amongst other things, is in that state.
2. The 'del's wipe 'csv' and '_csv' from the cache.
3. The second import of 'csv' also creates/initializes a new '_csv' module because it's not in sys.modules. This *replaces* the per-sub-interpreter cached version of the module's state with the clean state of a new module.

So essentially, while PEP 3121 moves state from C-file globals to per-module state, the state is still global, and this fact can be exposed from pure Python code. The above is a toy example.
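
To make the mechanism in the three steps above concrete, here is a minimal pure-Python sketch. It is a toy model under stated assumptions, not CPython's implementation: ModuleDef, Interpreter, FakeCsvModule and all of their methods are hypothetical names invented for illustration. The only thing it tries to capture is that state is looked up by module definition within an interpreter, not by module object, so a second initialization overwrites what the first module object sees.

```python
# Toy model only: none of these names are CPython APIs.

class ModuleDef:
    """Stands in for a static 'struct PyModuleDef' (one per kind of extension)."""
    def __init__(self, name):
        self.name = name


class Interpreter:
    """Stands in for one sub-interpreter with a single state slot per ModuleDef."""
    def __init__(self):
        self._state_by_def = {}

    def init_module(self, moddef):
        # Every initialization allocates fresh state and overwrites the slot.
        self._state_by_def[moddef] = {"dialects": ["excel"]}

    def find_state(self, moddef):
        # Rough analogue of PyState_FindModule: the key is the definition,
        # so all module objects built from it share whatever is in the slot.
        return self._state_by_def[moddef]


class FakeCsvModule:
    """A module object whose functions reach their state via the interpreter."""
    def __init__(self, interp, moddef):
        self._interp = interp
        self._moddef = moddef
        interp.init_module(moddef)

    def register_dialect(self, name):
        self._interp.find_state(self._moddef)["dialects"].append(name)

    def list_dialects(self):
        return list(self._interp.find_state(self._moddef)["dialects"])


interp = Interpreter()
CSV_DEF = ModuleDef("_csv")

first = FakeCsvModule(interp, CSV_DEF)    # models the first import
first.register_dialect("unixpwd")
before = first.list_dialects()

second = FakeCsvModule(interp, CSV_DEF)   # models the re-import after the 'del's
after = first.list_dialects()             # asked of the *first* module object

print(before)   # ['excel', 'unixpwd']
print(after)    # ['excel']
```

In this model, keying the slot by module object instead of by definition would keep the two modules' state apart. Whether that maps onto a practical change in the real machinery is an open question, not something this sketch settles.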

Here's a more serious case I ran into with ET, once again demonstrated with 'csv' for simplicity:

----
import io
from test.support import import_fresh_module
import csv

csv_other = import_fresh_module('csv', fresh=['_csv', 'csv'])

f = io.StringIO('foo\x00,bar\nbaz,42')

reader = csv.reader(f)
try:
    for row in reader:
        print(row)
except csv.Error as e:
    print('Caught csv.error', e)
except Exception as e:
    print('Caught Exception', e)
----

In the above, the reader throws 'csv.Error' (because of the NULL byte), but the first except clause does not catch it where expected, because the raised exception is a different class that is also called `csv.Error`. This is due to the same problem demonstrated above (if the seemingly innocent import_fresh_module is removed, all is good).

Any ideas or suggestions regarding this are welcome. This is quite an esoteric problem, but I believe it's serious. PEP 3121 is not used much (yet), but recently there was talk again about committing some of the patches created for converting Modules/*.c extensions to it during a GSoC project. I believe that we should understand the implications first.

There can be a number of solutions, including modifying the PEP 3121 implementation machinery to really create/keep state "per module" and not just "per kind of module in a single sub-interpreter".

Eli