Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Nick Coghlan
On 10 Aug 2013 21:06, "Eli Bendersky"  wrote:
>
> On Sat, Aug 10, 2013 at 5:47 PM, Nick Coghlan wrote:
>>
>> In a similar vein, Antoine recently noted that the fact the per-module
>> state isn't a real PyObject creates a variety of interesting lifecycle
>> management challenges.
>>
>> I'm not seeing an easy solution, either, except to automatically skip
>> reinitialization when the module has already been imported.
>
> This solution has problems. For example, in the case of ET it would
> preclude testing what happens when pyexpat is disabled (remember we were
> discussing this...). This is because there would be no real way to create
> new instances of such modules (they would all cache themselves in the init
> function - similarly to what ET now does in trunk, because otherwise some
> of its global-dependent crazy tests fail).

Right, it would still be broken, just in a less horrible way.

>
> A more radical solution would be to *really* have multiple instances of
> state per sub-interpreter. Well, they already exist -- it's
> PyState_FindModule which is the problematic one because it only remembers
> the last one. But I see that it's only being used by extension modules
> themselves, to efficiently find modules they belong to. It feels a bit like
> a hack that was made to avoid rewriting lots of code, because in general a
> module's objects *can* know which module instance they came from. E.g. it
> can be saved as a private field in classes exported by the module.
>
> So a more radical approach would be:
>
> PyState_FindModule can be deprecated, but still exist and be documented
> to return the state of the *last* module created in this sub-interpreter.
> stdlib extension modules that actually use this mechanism can be rewritten
> to just remember the module for real, and not rely on PyState_FindModule to
> fetch it from a global cache. I don't think this would be hard, and it
> would make the good intention of PEP 3121 more real - actual independent
> state per module instance.

Sounds promising to me. I suspect handling exported functions will prove to
be tricky, though - they may need to be redesigned to behave more like
"module methods".

>
> Eli
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Antoine Pitrou
On Sat, 10 Aug 2013 18:06:02 -0700
Eli Bendersky  wrote:
> This solution has problems. For example, in the case of ET it would
> preclude testing what happens when pyexpat is disabled (remember we were
> discussing this...). This is because there would be no real way to create
> new instances of such modules (they would all cache themselves in the init
> function - similarly to what ET now does in trunk, because otherwise some
> of its global-dependent crazy tests fail).
> 
> A more radical solution would be to *really* have multiple instances of
> state per sub-interpreter. Well, they already exist -- it's
> PyState_FindModule which is the problematic one because it only remembers
> the last one.

I'm not sure I understand your diagnosis. modules_by_index (and
PyState_FindModule) is per-interpreter so we already have a
per-interpreter state here. Something else must be interfering.

Note that module state is just a field attached to the module object
("void *md_state" in PyModuleObject). It's really the extension modules
which are per-interpreter, which is a good thing.

Regards

Antoine.


Re: [Python-Dev] Green buildbot failure.

2013-08-11 Thread Antoine Pitrou
On Sat, 10 Aug 2013 21:40:46 -0400
Terry Reedy  wrote:
> 
> This run recorded here shows a green test (it appears to have timed out)
> http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017
> but the corresponding log for this Windows bot
> http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017/steps/test/logs/stdio
> has the expected os.chown failure.

You've got the answer at the bottom:

  "program finished with exit code 0"

So for some reason, the test suite crashed, but with a successful exit
code. Buildbot thinks it ran fine.
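
As a rough illustration of the point above, and assuming the harness judges a
step only by the process exit status (as the quoted log line suggests), a
child process that reports a failure but exits with status 0 is still counted
as a success. The child script and the "green"/"red" labels are invented for
this sketch and are not Buildbot code.

```python
import subprocess
import sys

# Child process that reports a failure on stderr yet exits with status 0.
child_code = (
    "import sys\n"
    "sys.stderr.write('FAILED (errors=1)\\n')\n"
    "sys.exit(0)\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)

# A harness that looks only at the exit status reports success,
# even though the captured output clearly records a failure.
status = "green" if result.returncode == 0 else "red"
print(status)
print(result.stderr.strip())
```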

> Are such green failures intended?

Not really, no.

Regards

Antoine.




Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Antoine Pitrou

Hi Eli,

On Sat, 10 Aug 2013 17:12:53 -0700
Eli Bendersky  wrote:
> 
> Note how doing some sys.modules acrobatics and re-importing suddenly
> changes the internal state of a previously imported module. This happens
> because:
> 
> 1. The first import of 'csv' (which then imports '_csv') creates
> module-specific state on the heap and associates it with the current
> sub-interpreter. The list of dialects, amongst other things, is in that
> state.
> 2. The 'del's wipe 'csv' and '_csv' from the cache.
> 3. The second import of 'csv' also creates/initializes a new '_csv' module
> because it's not in sys.modules. This *replaces* the per-sub-interpreter
> cached version of the module's state with the clean state of a new module

I would say this is pretty much expected. The converse would be a bug
IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only
subinterpreter support:

  "Extension module initialization currently has a few deficiencies.
  There is no cleanup for modules, the entry point name might give
  naming conflicts, the entry functions don't follow the usual calling
  convention, and multiple interpreters are not supported well."

Re-initializing state when importing a module anew makes extension
modules more like pure Python modules, which is a good thing.
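
For comparison, here is a minimal sketch of the pure Python behaviour being
referred to. The module name "demo_state_mod" and its "dialects" dict are
invented for the illustration; the sketch only shows that deleting an entry
from sys.modules and importing again yields a second, independent module
object while the first one keeps its own state.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny pure Python module with some module-level state.
# The name "demo_state_mod" is made up for this sketch.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo_state_mod.py"), "w") as f:
    f.write("dialects = {}\n")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import demo_state_mod as first
first.dialects["excel"] = "registered"

# Drop the cached module and import it again.
del sys.modules["demo_state_mod"]
import demo_state_mod as second

# The second import produced a new module object with fresh state,
# and the first module object still holds its own state.
print(first is second)
print(first.dialects)
print(second.dialects)
```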


I think the piece of interpretation you offered yesterday on IRC may be
the right explanation for the ET shenanigans:

  "Maybe the bug is that ParseError is kept in per-module state, and
  also exported from the module?"

PEP 3121 doesn't offer any guidelines for using its API, and its
example shows PyObject* fields in a module state.

I'm starting to think that it might be a bad use of PEP 3121. PyObjects
can, and therefore should be stored in the extension module dict where
they will participate in normal resource management (i.e. garbage
collection). If they are in the module dict, then they shouldn't be
held alive by the module state too, otherwise the (currently tricky)
lifetime management of extension modules can produce oddities.


So, the PEP 3121 "module state" pointer (the optional opaque void*
thing) should only be used to hold non-PyObjects.  PyObjects should go
to the module dict, like they do in normal Python modules.  Now, the
reason our PEP 3121 extension modules abuse the module state pointer to
keep PyObjects is two-fold:

1. it's surprisingly easier (it's actually a one-liner if you don't
handle errors - a rather bad thing, but all PEP 3121 extension modules
currently don't handle a NULL return from PyState_FindModule...)

2. it protects the module from any module dict monkeypatching. It's not
important if you are using a generic API on the PyObject, but it is if
the PyObject is really a custom C type with well-defined fields.

Those two issues can be addressed if we offer an API for it. How about:

  PyObject *PyState_GetModuleAttr(struct PyModuleDef *def,
                                  const char *name,
                                  PyObject *restrict_type)

  *def* is a pointer to the module definition.
  *name* is the attribute to look up on the module dict.
  *restrict_type*, if non-NULL, is a type object the looked up attribute
  must be an instance of.

  Lookup an attribute in the current interpreter's extension module
  instance for the module definition *def*.
  Returns a *new* reference (!), or NULL if an error occurred.
  An error can be:
  - no such module exists for the current interpreter (ImportError?
  RuntimeError? SystemError?)
  - no such attribute exists in the module dict (AttributeError)
  - the attribute doesn't conform to *restrict_type* (TypeError)

So code can be written like:

  PyObject *dialects = PyState_GetModuleAttr(
      &_csvmodule, "dialects", &PyDict_Type);
  if (dialects == NULL)
      return NULL;
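
The intended semantics of the proposed call can be modelled in pure Python.
This is only a sketch of the behaviour described above, not an existing API:
the function name get_module_attr is hypothetical, the module object is passed
in directly instead of being found through a module definition, and
SystemError is just one of the candidate exception types listed for the
"no such module" case.

```python
import types


def get_module_attr(module, name, restrict_type=None):
    """Python model of the proposed lookup; hypothetical, not a real API."""
    if module is None:
        # The proposal leaves the exception type open; SystemError is a guess.
        raise SystemError("no module instance for the current interpreter")
    try:
        value = module.__dict__[name]
    except KeyError:
        raise AttributeError(name) from None
    if restrict_type is not None and not isinstance(value, restrict_type):
        raise TypeError(
            "%s must be an instance of %s" % (name, restrict_type.__name__)
        )
    return value


mod = types.ModuleType("csv_model")
mod.dialects = {}

# Normal case: the attribute exists and has the expected type.
dialects = get_module_attr(mod, "dialects", dict)

# Monkeypatching the module dict is caught by the type restriction
# instead of handing an unexpected object to the caller.
mod.dialects = None
try:
    get_module_attr(mod, "dialects", dict)
except TypeError as exc:
    caught = exc
```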

Regards

Antoine.


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Antoine Pitrou
On Sun, 11 Aug 2013 12:33:16 +0200
Antoine Pitrou  wrote:
> So, the PEP 3121 "module state" pointer (the optional opaque void*
> thing) should only be used to hold non-PyObjects.  PyObjects should go
> to the module dict, like they do in normal Python modules.  Now, the
> reason our PEP 3121 extension modules abuse the module state pointer to
> keep PyObjects is two-fold:
> 
> 1. it's surprisingly easier (it's actually a one-liner if you don't
> handle errors - a rather bad thing, but all PEP 3121 extension modules
> currently don't handle a NULL return from PyState_FindModule...)
> 
> 2. it protects the module from any module dict monkeypatching. It's not
> important if you are using a generic API on the PyObject, but it is if
> the PyObject is really a custom C type with well-defined fields.

I overlooked a third reason which is performance. But, those lookups
are generally not performance-critical.

Regards

Antoine.




Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Nick Coghlan
On 11 August 2013 06:33, Antoine Pitrou  wrote:
> So code can be written like:
>
>   PyObject *dialects = PyState_GetModuleAttr(
>   &_csvmodule, "dialects", &PyDict_Type);
>   if (dialects == NULL)
>   return NULL;

This sounds like a good near term solution to me.

Longer term, I think there may be value in providing a richer
extension module initialisation API that lets extension modules be
represented as module *subclasses* in sys.modules, since that would
get us to a position where it is possible to have *multiple* instances
of an extension module in the *same* subinterpreter by holding on to
external references after removing them from sys.modules (which is
what we do in the test suite for pure Python modules). Enabling that
also ties into the question of passing info to the extension module
about how it is being loaded (e.g. as a submodule of a larger
package), as well as allowing extension modules to cleanly handle
reload(). However, that's dependent on the ModuleSpec idea we're
currently thrashing out on import-sig (and should be able to bring to
python-dev soon), and I think getting that integrated at all will be
ambitious enough for 3.4 - using it to improve extension module
handling would then be a project for 3.5.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Antoine Pitrou
On Sun, 11 Aug 2013 07:04:40 -0400
Nick Coghlan  wrote:
> On 11 August 2013 06:33, Antoine Pitrou  wrote:
> > So code can be written like:
> >
> >   PyObject *dialects = PyState_GetModuleAttr(
> >   &_csvmodule, "dialects", &PyDict_Type);
> >   if (dialects == NULL)
> >   return NULL;
> 
> This sounds like a good near term solution to me.
> 
> Longer term, I think there may be value in providing a richer
> extension module initialisation API that lets extension modules be
> represented as module *subclasses* in sys.modules, since that would
> get us to a position where it is possible to have *multiple* instances
> of an extension module in the *same* subinterpreter by holding on to
> external references after removing them from sys.modules (which is
> what we do in the test suite for pure Python modules).

Either that, or add a "struct PyMemberDef *m_members" field to
PyModuleDef, to enable looking up stuff in the m_state using regular
attribute lookup.

Unfortunately, doing so would probably break the ABI. Also, allowing
for module subclasses is probably more flexible in the long term. We
just need to devise a convenience API for that (perhaps by allowing to
create both the subclass *and* instantiate it in a single call).

> However, that's dependent on the ModuleSpec idea we're
> currently thrashing out on import-sig (and should be able to bring to
> python-dev soon), and I think getting that integrated at all will be
> ambitious enough for 3.4 - using it to improve extension module
> handling would then be a project for 3.5.

Sounds reasonable.

Regards

Antoine.


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Stefan Behnel
Antoine Pitrou, 11.08.2013 12:33:
> On Sat, 10 Aug 2013 17:12:53 -0700 Eli Bendersky wrote:
>> Note how doing some sys.modules acrobatics and re-importing suddenly
>> changes the internal state of a previously imported module. This happens
>> because:
>>
>> 1. The first import of 'csv' (which then imports '_csv') creates
>> module-specific state on the heap and associates it with the current
>> sub-interpreter. The list of dialects, amongst other things, is in that
>> state.
>> 2. The 'del's wipe 'csv' and '_csv' from the cache.
>> 3. The second import of 'csv' also creates/initializes a new '_csv' module
>> because it's not in sys.modules. This *replaces* the per-sub-interpreter
>> cached version of the module's state with the clean state of a new module
> 
> I would say this is pretty much expected. The converse would be a bug
> IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only
> subinterpreter support:
> 
>   "Extension module initialization currently has a few deficiencies.
>   There is no cleanup for modules, the entry point name might give
>   naming conflicts, the entry functions don't follow the usual calling
>   convention, and multiple interpreters are not supported well."
> 
> Re-initializing state when importing a module anew makes extension
> modules more like pure Python modules, which is a good thing.

It's the same as defining a type or function in a loop, or inside of a
closure. The whole point of reimporting is that you get a new module.

However, it should not change the content of the old module, just create a
new one.


> So, the PEP 3121 "module state" pointer (the optional opaque void*
> thing) should only be used to hold non-PyObjects.  PyObjects should go
> to the module dict, like they do in normal Python modules.  Now, the
> reason our PEP 3121 extension modules abuse the module state pointer to
> keep PyObjects is two-fold:
> 
> 1. it's surprisingly easier (it's actually a one-liner if you don't
> handle errors - a rather bad thing, but all PEP 3121 extension modules
> currently don't handle a NULL return from PyState_FindModule...)
> 
> 2. it protects the module from any module dict monkeypatching. It's not
> important if you are using a generic API on the PyObject, but it is if
> the PyObject is really a custom C type with well-defined fields.

Yes, it's a major safety problem if you can crash the interpreter by
assigning None to a module attribute.


> Those two issues can be addressed if we offer an API for it. How about:
> 
>   PyObject *PyState_GetModuleAttr(struct PyModuleDef *def,
>   const char *name,
>   PyObject *restrict_type)
> 
>   *def* is a pointer to the module definition.
>   *name* is the attribute to look up on the module dict.
>   *restrict_type*, if non-NULL, is a type object the looked up attribute
>   must be an instance of.
> 
>   Lookup an attribute in the current interpreter's extension module
>   instance for the module definition *def*.
>   Returns a *new* reference (!), or NULL if an error occurred.
>   An error can be:
>   - no such module exists for the current interpreter (ImportError?
>   RuntimeError? SystemError?)
>   - no such attribute exists in the module dict (AttributeError)
>   - the attribute doesn't conform to *restrict_type* (TypeError)
> 
> So code can be written like:
> 
>   PyObject *dialects = PyState_GetModuleAttr(
>   &_csvmodule, "dialects", &PyDict_Type);
>   if (dialects == NULL)
>   return NULL;

At least for Cython it's unlikely that it'll ever use this. It's just way
too much overhead for looking up a global name. Plus, not all global names
are visible in the module dict, e.g. it's common to have types that are
only used internally to keep some kind of state. Those would still have to
live in the internal per-module state.

ISTM that this is not a proper solution for the problem, because it only
covers the simple use cases. Rather, I'd prefer making the handling of
names in the per-module instance state safer.

Essentially, with PEP 3121, modules are just one form of an extension type.
So what's wrong with giving them normal extension type fields? Functions
are essentially methods of the module, global types are just inner classes.
Both should keep the module alive (on the one side) and be tied to it (on
the other side). If you reimport a module, you'd get a new set of
everything, and the old module would just linger in the background until
the last reference to it dies.

In other words, I don't see why modules should be special in any way.

Stefan




Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Stefan Behnel
Antoine Pitrou, 11.08.2013 13:48:
> On Sun, 11 Aug 2013 07:04:40 -0400 Nick Coghlan wrote:
>> On 11 August 2013 06:33, Antoine Pitrou wrote:
>>> So code can be written like:
>>>
>>>   PyObject *dialects = PyState_GetModuleAttr(
>>>   &_csvmodule, "dialects", &PyDict_Type);
>>>   if (dialects == NULL)
>>>   return NULL;
>>
>> This sounds like a good near term solution to me.
>>
>> Longer term, I think there may be value in providing a richer
>> extension module initialisation API that lets extension modules be
>> represented as module *subclasses* in sys.modules, since that would
>> get us to a position where it is possible to have *multiple* instances
>> of an extension module in the *same* subinterpreter by holding on to
>> external references after removing them from sys.modules (which is
>> what we do in the test suite for pure Python modules).
> 
> Either that, or add a "struct PyMemberDef *m_members" field to
> PyModuleDef, to enable looking up stuff in the m_state using regular
> attribute lookup.

Hmm, yes, it's unfortunate that the module state isn't just a public part
of the object struct.


> Unfortunately, doing so would probably break the ABI. Also, allowing
> for module subclasses is probably more flexible in the long term.

+1000


> We
> just need to devise a convenience API for that (perhaps by allowing to
> create both the subclass *and* instantiate it in a single call).

Right. This conflicts somewhat with the simplified module creation. If the
module loader passed the readily instantiated module instance into the
module init function, then module subtypes don't fit into this scheme anymore.

One more reason why modules shouldn't be special. Essentially, we need an
m_new() and m_init() for them. And the lifetime of the module type would
have to be linked to the (sub-)interpreter, whereas the lifetime of the
module instance would be determined by whoever uses the module and/or
decides to unload/reload it.

Stefan




Re: [Python-Dev] Green buildbot failure.

2013-08-11 Thread Richard Oudkerk

On 11/08/2013 11:00am, Antoine Pitrou wrote:

> You've got the answer at the bottom:
>
>    "program finished with exit code 0"
>
> So for some reason, the test suite crashed, but with a successful exit
> code. Buildbot thinks it ran fine.


Was the test terminated because it took too long?

TerminateProcess(handle, exitcode) sometimes makes the program exit with 
return code 0 instead of exitcode.  At any rate, test_multiprocessing 
contains this disabled test:


# XXX sometimes get p.exitcode == 0 on Windows ...
#self.assertEqual(p.exitcode, -signal.SIGTERM)

--
Richard



Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Antoine Pitrou
On Sun, 11 Aug 2013 14:16:10 +0200
Stefan Behnel  wrote:
> 
> > We
> > just need to devise a convenience API for that (perhaps by allowing to
> > create both the subclass *and* instantiate it in a single call).
> 
> Right. This conflicts somewhat with the simplified module creation. If the
> module loader passed the readily instantiated module instance into the
> module init function, then module subtypes don't fit into this scheme anymore.
> 
> One more reason why modules shouldn't be special. Essentially, we need an
> m_new() and m_init() for them. And the lifetime of the module type would
> have to be linked to the (sub-)interpreter, whereas the lifetime of the
> module instance would be determined by whoever uses the module and/or
> decides to unload/reload it.

It may be simpler if the only strong reference to the module type is in
the module instance itself. Successive module initializations would get
different types, but that shouldn't be a problem in practice.

Regards

Antoine.




Re: [Python-Dev] Green buildbot failure.

2013-08-11 Thread Richard Oudkerk

http://stackoverflow.com/questions/2061735/42-passed-to-terminateprocess-sometimes-getexitcodeprocess-returns-0

--
Richard



Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Stefan Behnel
Antoine Pitrou, 11.08.2013 14:32:
> On Sun, 11 Aug 2013 14:16:10 +0200 Stefan Behnel wrote:
>>> We
>>> just need to devise a convenience API for that (perhaps by allowing to
>>> create both the subclass *and* instantiate it in a single call).
>>
>> Right. This conflicts somewhat with the simplified module creation. If the
>> module loader passed the readily instantiated module instance into the
>> module init function, then module subtypes don't fit into this scheme 
>> anymore.
>>
>> One more reason why modules shouldn't be special. Essentially, we need an
>> m_new() and m_init() for them. And the lifetime of the module type would
>> have to be linked to the (sub-)interpreter, whereas the lifetime of the
>> module instance would be determined by whoever uses the module and/or
>> decides to unload/reload it.
> 
> It may be simpler if the only strong reference to the module type is in
> the module instance itself. Successive module initializations would get
> different types, but that shouldn't be a problem in practice.

Agreed. Then the module instance would just be the only instance of a new
type that gets created each time the module is initialised. Even if module
subtypes were to become commonplace once they are generally supported
(because they're the easiest way to store per-module state efficiently),
module reinitialisation should be rare enough to just buy them with a new
type for each. The size of the complete module state+dict will almost
always outweigh the size of the one additional type by factors.

Stefan




Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Stefan Behnel
Stefan Behnel, 11.08.2013 14:48:
> Antoine Pitrou, 11.08.2013 14:32:
>> On Sun, 11 Aug 2013 14:16:10 +0200 Stefan Behnel wrote:
>>>> We
>>>> just need to devise a convenience API for that (perhaps by allowing to
>>>> create both the subclass *and* instantiate it in a single call).
>>>
>>> Right. This conflicts somewhat with the simplified module creation. If the
>>> module loader passed the readily instantiated module instance into the
>>> module init function, then module subtypes don't fit into this scheme 
>>> anymore.
>>>
>>> One more reason why modules shouldn't be special. Essentially, we need an
>>> m_new() and m_init() for them. And the lifetime of the module type would
>>> have to be linked to the (sub-)interpreter, whereas the lifetime of the
>>> module instance would be determined by whoever uses the module and/or
>>> decides to unload/reload it.
>>
>> It may be simpler if the only strong reference to the module type is in
>> the module instance itself. Successive module initializations would get
>> different types, but that shouldn't be a problem in practice.
> 
> Agreed. Then the module instance would just be the only instance of a new
> type that gets created each time the module is initialised. Even if module
> subtypes were to become commonplace once they are generally supported
> (because they're the easiest way to store per-module state efficiently),
> module reinitialisation should be rare enough to just buy them with a new
> type for each. The size of the complete module state+dict will almost
> always outweigh the size of the one additional type by factors.

BTW, this already suggests a simple module initialisation interface. The
extension module would expose a function that returns a module type, and
the loader/importer would then simply instantiate that. Nothing else is needed.
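
A toy model of that interface in pure Python might look like the following.
All names here (create_module_type, CsvLikeModule, load) are invented for the
sketch, and the state is kept as an ordinary instance attribute, whereas a C
implementation would presumably use fields of the module struct.

```python
import types


def create_module_type():
    """Hypothetical entry point: builds a fresh module subtype per call."""

    class CsvLikeModule(types.ModuleType):
        def __init__(self, name):
            super().__init__(name)
            # Per-instance state lives on the module object itself.
            self.dialects = {}

    return CsvLikeModule


def load(name, entry_point):
    """Toy loader: ask the extension for a type, then instantiate it."""
    module_type = entry_point()
    return module_type(name)


first = load("csv_like", create_module_type)
second = load("csv_like", create_module_type)
first.dialects["excel"] = "registered"

# Each initialisation gets its own type and its own state.
print(type(first) is type(second))
print(second.dialects)
```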

Stefan




Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Stefan Behnel
Stefan Behnel, 11.08.2013 14:53:
> Stefan Behnel, 11.08.2013 14:48:
>> Antoine Pitrou, 11.08.2013 14:32:
>>> On Sun, 11 Aug 2013 14:16:10 +0200 Stefan Behnel wrote:
>>>>> We
>>>>> just need to devise a convenience API for that (perhaps by allowing to
>>>>> create both the subclass *and* instantiate it in a single call).
>>>>
>>>> Right. This conflicts somewhat with the simplified module creation. If the
>>>> module loader passed the readily instantiated module instance into the
>>>> module init function, then module subtypes don't fit into this scheme
>>>> anymore.
>>>>
>>>> One more reason why modules shouldn't be special. Essentially, we need an
>>>> m_new() and m_init() for them. And the lifetime of the module type would
>>>> have to be linked to the (sub-)interpreter, whereas the lifetime of the
>>>> module instance would be determined by whoever uses the module and/or
>>>> decides to unload/reload it.
>>>
>>> It may be simpler if the only strong reference to the module type is in
>>> the module instance itself. Successive module initializations would get
>>> different types, but that shouldn't be a problem in practice.
>>
>> Agreed. Then the module instance would just be the only instance of a new
>> type that gets created each time the module is initialised. Even if module
>> subtypes were to become commonplace once they are generally supported
>> (because they're the easiest way to store per-module state efficiently),
>> module reinitialisation should be rare enough to just buy them with a new
>> type for each. The size of the complete module state+dict will almost
>> always outweigh the size of the one additional type by factors.
> 
> BTW, this already suggests a simple module initialisation interface. The
> extension module would expose a function that returns a module type, and
> the loader/importer would then simply instantiate that. Nothing else is 
> needed.

Actually, strike the word "module type" and replace it with "type". Is
there really a reason why Python needs a module type at all? I mean, you
can stick arbitrary objects in sys.modules, so why not allow arbitrary
types to be returned by the module creation function?
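
The sys.modules part of that observation is easy to demonstrate. In this
sketch the class Settings and the name "fake_settings_mod" are made up; the
import statement simply returns whatever object is stored under that name.

```python
import sys


class Settings:
    """An ordinary class, unrelated to types.ModuleType."""

    def __init__(self):
        self.dialects = {}


# The name "fake_settings_mod" is invented for this sketch.
sys.modules["fake_settings_mod"] = Settings()

# The import system finds the entry in sys.modules and hands it back as-is.
import fake_settings_mod

print(type(fake_settings_mod).__name__)
```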

Stefan




Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Eli Bendersky
On Sun, Aug 11, 2013 at 2:58 AM, Antoine Pitrou  wrote:

> On Sat, 10 Aug 2013 18:06:02 -0700
> Eli Bendersky  wrote:
> > This solution has problems. For example, in the case of ET it would
> > preclude testing what happens when pyexpat is disabled (remember we were
> > discussing this...). This is because there would be no real way to create
> > new instances of such modules (they would all cache themselves in the init
> > function - similarly to what ET now does in trunk, because otherwise some
> > of its global-dependent crazy tests fail).
> >
> > A more radical solution would be to *really* have multiple instances of
> > state per sub-interpreter. Well, they already exist -- it's
> > PyState_FindModule which is the problematic one because it only remembers
> > the last one.
>
> I'm not sure I understand your diagnosis. modules_by_index (and
> PyState_FindModule) is per-interpreter so we already have a
> per-interpreter state here. Something else must be interfering.
>
>
Yes, it's per interpreter, but only one per interpreter is remembered in
state->modules_by_index. What I'm trying to say is that currently two
different instances of PyModuleObject *within the same interpreter* share
the state if they get to it through PyState_FindModule, because they share
the same PyModuleDef, and state->modules_by_index keeps only one module per
PyModuleDef.
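
The effect can be modelled in a few lines of Python. This is a simplified
stand-in, not CPython's implementation: the classes ModuleDef, Module and
Interpreter are invented for the sketch, and a dict keyed by a definition
index plays the role of state->modules_by_index.

```python
class ModuleDef:
    """Stand-in for PyModuleDef; only the index matters here."""

    def __init__(self, index):
        self.index = index


class Module:
    """Stand-in for PyModuleObject, carrying its own state."""

    def __init__(self, definition):
        self.definition = definition
        self.state = {"dialects": {}}


class Interpreter:
    """Stand-in for the per-interpreter lookup table."""

    def __init__(self):
        self.modules_by_index = {}

    def add_module(self, module):
        # One slot per definition: a later instance overwrites the earlier one.
        self.modules_by_index[module.definition.index] = module

    def find_module(self, definition):
        return self.modules_by_index.get(definition.index)


csv_def = ModuleDef(index=1)
interp = Interpreter()

first = Module(csv_def)
interp.add_module(first)
interp.find_module(csv_def).state["dialects"]["excel"] = "registered"

# Simulates re-importing after the module was removed from sys.modules.
second = Module(csv_def)
interp.add_module(second)

# Code that belongs to the first instance but goes through the lookup
# now reaches the second instance and its fresh state.
found = interp.find_module(csv_def)
print(found is second)
print(found.state["dialects"])
```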


> Note that module state is just a field attached to the module object
> ("void *md_state" in PyModuleObject). It's really the extension modules
> which are per-interpreter, which is a good thing.
>


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Nick Coghlan
On 11 Aug 2013 09:02, "Stefan Behnel"  wrote:
>
> Stefan Behnel, 11.08.2013 14:53:
> > Stefan Behnel, 11.08.2013 14:48:
> >> Antoine Pitrou, 11.08.2013 14:32:
> >>> On Sun, 11 Aug 2013 14:16:10 +0200 Stefan Behnel wrote:
> >>>>> We
> >>>>> just need to devise a convenience API for that (perhaps by allowing to
> >>>>> create both the subclass *and* instantiate it in a single call).
> >>>>
> >>>> Right. This conflicts somewhat with the simplified module creation. If the
> >>>> module loader passed the readily instantiated module instance into the
> >>>> module init function, then module subtypes don't fit into this scheme
> >>>> anymore.
> >>>>
> >>>> One more reason why modules shouldn't be special. Essentially, we need an
> >>>> m_new() and m_init() for them. And the lifetime of the module type would
> >>>> have to be linked to the (sub-)interpreter, whereas the lifetime of the
> >>>> module instance would be determined by whoever uses the module and/or
> >>>> decides to unload/reload it.
> >>>
> >>> It may be simpler if the only strong reference to the module type is in
> >>> the module instance itself. Successive module initializations would get
> >>> different types, but that shouldn't be a problem in practice.
> >>
> >> Agreed. Then the module instance would just be the only instance of a new
> >> type that gets created each time the module is initialised. Even if module
> >> subtypes were to become commonplace once they are generally supported
> >> (because they're the easiest way to store per-module state efficiently),
> >> module reinitialisation should be rare enough to just buy them with a new
> >> type for each. The size of the complete module state+dict will almost
> >> always outweigh the size of the one additional type by factors.
> >
> > BTW, this already suggests a simple module initialisation interface. The
> > extension module would expose a function that returns a module type, and
> > the loader/importer would then simply instantiate that. Nothing else is
> > needed.
>
> Actually, strike the word "module type" and replace it with "type". Is
> there really a reason why Python needs a module type at all? I mean, you
> can stick arbitrary objects in sys.modules, so why not allow arbitrary
> types to be returned by the module creation function?

That's exactly what I have in mind, but the way extension module imports
currently work means we can't easily do it just yet. Fortunately, importlib
means we now have some hope of fixing that :)
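
The point that sys.modules accepts arbitrary objects can be checked from pure Python. A minimal sketch, assuming a made-up module name (demo_arbitrary_object) and a SimpleNamespace standing in for whatever a module creation function might return:

```python
import sys
import types

# Hypothetical stand-in for "whatever the module creation function returned".
fake_module = types.SimpleNamespace(answer=42)

# The name is made up for this example; it does not exist on disk.
sys.modules["demo_arbitrary_object"] = fake_module
try:
    # The import statement finds the entry in sys.modules and binds it as-is.
    import demo_arbitrary_object
    print(demo_arbitrary_object is fake_module)
    print(isinstance(demo_arbitrary_object, types.ModuleType))
finally:
    # Leave sys.modules as we found it.
    del sys.modules["demo_arbitrary_object"]
```

This only shows what the import statement tolerates today; it says nothing about how extension modules are loaded, which is the part that would need to change.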

Cheers,
Nick.
>
> Stefan
>


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Eli Bendersky
On Sun, Aug 11, 2013 at 3:33 AM, Antoine Pitrou  wrote:

>
> Hi Eli,
>
> On Sat, 10 Aug 2013 17:12:53 -0700
> Eli Bendersky  wrote:
> >
> > Note how doing some sys.modules acrobatics and re-importing suddenly
> > changes the internal state of a previously imported module. This happens
> > because:
> >
> > 1. The first import of 'csv' (which then imports '_csv') creates
> > module-specific state on the heap and associates it with the current
> > sub-interpreter. The list of dialects, amongst other things, is in that
> > state.
> > 2. The 'del's wipe 'csv' and '_csv' from the cache.
> > 3. The second import of 'csv' also creates/initializes a new '_csv' module
> > because it's not in sys.modules. This *replaces* the per-sub-interpreter
> > cached version of the module's state with the clean state of a new module
>
> I would say this is pretty much expected.


I'm struggling to see how it's expected. The two imported csv modules are
different (i.e. different id() of members), and yet some state is shared
between them. I think the root reason for it is that "PyModuleDef
_csvmodule" is unique per interpreter, not per module instance.

Even if dialects were not a PyObject, this would still be problematic,
don't you think? And note that here, unlike the ET.ParseError case, I don't
think the problem is exporting internal per-module state as a module
attribute. The following two are irreconcilable, IMHO:

1. Wanting to have two instances of the same module in the same interpreter.
2. Using a global shared PyModuleDef between all instances of the same
module in the same interpreter.
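
For anyone who wants to reproduce the "two csv modules" situation, the sys.modules acrobatics can be wrapped in a small helper. This is a sketch only: reimport_fresh() is a made-up name, and whether any state is actually shared between the two results depends on the Python version and on how the extension module is initialised, so the example only checks that two distinct module objects come out.

```python
import importlib
import sys


def reimport_fresh(names):
    """Import names[0], drop all of `names` from sys.modules, import again.

    Returns (first, second). sys.modules is restored afterwards so the
    experiment does not leak into the rest of the process.
    """
    first = importlib.import_module(names[0])
    saved = {name: sys.modules.get(name) for name in names}
    try:
        for name in names:
            sys.modules.pop(name, None)
        second = importlib.import_module(names[0])
    finally:
        for name, module in saved.items():
            if module is not None:
                sys.modules[name] = module
            else:
                sys.modules.pop(name, None)
    return first, second


first, second = reimport_fresh(["csv", "_csv"])
print(first is second)
```

Comparing first.list_dialects() and second.list_dialects() after registering a dialect on one of them is then a quick way to see whether the interpreter at hand shares the state.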



> The converse would be a bug
> IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only
> subinterpreter support:
>
>   "Extension module initialization currently has a few deficiencies.
>   There is no cleanup for modules, the entry point name might give
>   naming conflicts, the entry functions don't follow the usual calling
>   convention, and multiple interpreters are not supported well."
>
> Re-initializing state when importing a module anew makes extension
> modules more like pure Python modules, which is a good thing.
>
>
> I think the piece of interpretation you offered yesterday on IRC may be
> the right explanation for the ET shenanigans:
>
>   "Maybe the bug is that ParseError is kept in per-module state, and
>   also exported from the module?"
>
> PEP 3121 doesn't offer any guidelines for using its API, and its
> example shows PyObject* fields in a module state.
>
> I'm starting to think that it might be a bad use of PEP 3121. PyObjects
> can, and therefore should be stored in the extension module dict where
> they will participate in normal resource management (i.e. garbage
> collection). If they are in the module dict, then they shouldn't be
> held alive by the module state too, otherwise the (currently tricky)
> lifetime management of extension modules can produce oddities.
>
>
> So, the PEP 3121 "module state" pointer (the optional opaque void*
> thing) should only be used to hold non-PyObjects.  PyObjects should go
> to the module dict, like they do in normal Python modules.  Now, the
> reason our PEP 3121 extension modules abuse the module state pointer to
> keep PyObjects is two-fold:
>
> 1. it's surprisingly easier (it's actually a one-liner if you don't
> handle errors - a rather bad thing, but all PEP 3121 extension modules
> currently don't handle a NULL return from PyState_FindModule...)
>
> 2. it protects the module from any module dict monkeypatching. It's not
> important if you are using a generic API on the PyObject, but it is if
> the PyObject is really a custom C type with well-defined fields.
>
> Those two issues can be addressed if we offer an API for it. How about:
>
>   PyObject *PyState_GetModuleAttr(struct PyModuleDef *def,
>                                   const char *name,
>                                   PyObject *restrict_type)
>
>   *def* is a pointer to the module definition.
>   *name* is the attribute to look up on the module dict.
>   *restrict_type*, if non-NULL, is a type object the looked up attribute
>   must be an instance of.
>
>   Lookup an attribute in the current interpreter's extension module
>   instance for the module definition *def*.
>   Returns a *new* reference (!), or NULL if an error occurred.
>   An error can be:
>   - no such module exists for the current interpreter (ImportError?
>   RuntimeError? SystemError?)
>   - no such attribute exists in the module dict (AttributeError)
>   - the attribute doesn't conform to *restrict_type* (TypeError)
>
> So code can be written like:
>
>   PyObject *dialects = PyState_GetModuleAttr(
>       &_csvmodule, "dialects", (PyObject *)&PyDict_Type);
>   if (dialects == NULL)
>       return NULL;
>
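
The lookup rules of the proposed function can be modelled at the Python level. A minimal sketch, assuming a made-up helper get_module_attr() that takes a module name instead of a PyModuleDef pointer; the choice of ImportError for the "no such module" case is one of the options the proposal leaves open, not a decision:

```python
import sys


def get_module_attr(module_name, attr_name, restrict_type=None):
    """Python-level analogue of the proposed lookup (hypothetical helper)."""
    try:
        module = sys.modules[module_name]
    except KeyError:
        # The proposal leaves the exact exception type open.
        raise ImportError(f"module {module_name!r} is not loaded") from None
    try:
        # Look in the module dict only, as the proposal describes.
        value = vars(module)[attr_name]
    except KeyError:
        raise AttributeError(
            f"module {module_name!r} has no attribute {attr_name!r}") from None
    if restrict_type is not None and not isinstance(value, restrict_type):
        raise TypeError(
            f"{module_name}.{attr_name} is not a {restrict_type.__name__}")
    return value


modules = get_module_attr("sys", "modules", dict)
print(modules is sys.modules)
```

The type check is what protects C code that is about to cast the result to a concrete struct from a monkeypatched module dict.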

Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Antoine Pitrou
On Sun, 11 Aug 2013 06:26:55 -0700
Eli Bendersky  wrote:
> I'm struggling to see how it's expected. The two imported csv modules are
> different (i.e. different id() of members), and yet some state is shared
> between them.

There are two csv modules, but there are not two _csv modules.
Extension modules are currently immortal until the end of the
interpreter:

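>>> import gc, sys, weakref
>>> csv = __import__('csv')
>>> wcsv = weakref.ref(csv)
>>> w_csv = weakref.ref(sys.modules['_csv'])
>>> del sys.modules['csv']
>>> del sys.modules['_csv']
>>> del csv
>>> gc.collect()
50
>>> wcsv()
>>> w_csv()
<module '_csv' from '/home/antoine/cpython/default/build/lib.linux-x86_64-3.4-pydebug/_csv.cpython-34dm.so'>
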
So, "sharing" a state is pretty much expected, since you are
re-initializing an existing module.
(but the module does get re-initialized, which is the point of PEP 3121)

> 1. Wanting to have two instances of the same module in the same interpreter.

It could be nice, but really, that's not a common use case. And it's
impossible for extension modules, currently.

Regards

Antoine.


[Python-Dev] redesigning the extension module initialisation protocol (was: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others))

2013-08-11 Thread Stefan Behnel
Nick Coghlan, 11.08.2013 15:19:
> On 11 Aug 2013 09:02, "Stefan Behnel" wrote:
>>> BTW, this already suggests a simple module initialisation interface. The
>>> extension module would expose a function that returns a module type, and
>>> the loader/importer would then simply instantiate that. Nothing else is
>>> needed.
>>
>> Actually, strike the word "module type" and replace it with "type". Is
>> there really a reason why Python needs a module type at all? I mean, you
>> can stick arbitrary objects in sys.modules, so why not allow arbitrary
>> types to be returned by the module creation function?
> 
> That's exactly what I have in mind, but the way extension module imports
> currently work means we can't easily do it just yet. Fortunately, importlib
> means we now have some hope of fixing that :)

Well, what do we need? We don't need to care about existing code, as long
as the current scheme is only deprecated and not deleted. That won't happen
before Py4 anyway. New code would simply export a different symbol when
compiling for a CPython that supports it, which points to the function that
returns the type.

Then, there's already the PyType_Copy() function, which can be used to
create a heap type from a statically defined type. So extension modules can
simply define an (arbitrary) additional type in any way they see fit, copy
it to the heap, and return it.

Next, we need to define a signature for the type's __init__() method. This
can be done in a future proof way by allowing arbitrary keyword arguments
to be added, i.e. such a type must have a signature like

    def __init__(self, currently, used, pos, args, **kwargs): ...

and simply ignore kwargs for now.
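
A rough Python-level sketch of the scheme so far, under the assumption that the extension exposes one function returning a type and that the loader instantiates it. create_module_type() and load() are made-up names, and the extra keyword argument is invented purely to show the forward-compatible signature:

```python
def create_module_type():
    """Hypothetical per-extension entry point: returns a type, not an instance."""

    class DemoModule:
        def __init__(self, name, **kwargs):
            # Arguments this version does not know about are accepted and ignored.
            self.__name__ = name

    return DemoModule


def load(name, **extra):
    """Hypothetical loader step: call the entry point, then instantiate."""
    module_type = create_module_type()
    return module_type(name, **extra)


a = load("demo")
# A newer loader passing more information does not break the older type.
b = load("demo", origin="/made/up/path.so")
print(a.__name__, b.__name__, type(a) is type(b))
```

Each call to load() gets a fresh type here, which matches the earlier remark that successive initialisations would simply get different types.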

Actually, we may get away with not passing all too many arguments here if
we allow the importer to add stuff to the type's dict in between,
specifically __file__, __path__ and friends, so that they are available
before the type gets instantiated. Not sure if this is a good idea, but it
would at least relieve the user from having to copy these things over from
some kind of context or whatever we might want to pass in.

Alternatively, we could split the instantiation up between tp_new() and
tp_init(), and let the importer set stuff on the instance dict in between
the two. But given that this context won't actually change once the shared
library is loaded, the only reason to prefer modifying the instance instead
of the type would be to avoid requiring a tp_dict for the type. Open for
discussion, I guess.
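
The two variants discussed above differ only in where the importer puts the context. A small sketch in Python terms, assuming made-up helper names and a made-up path; __new__ and __init__ play the roles of tp_new() and tp_init():

```python
class DemoModule:
    def __init__(self):
        # __file__ has already been provided by the importer at this point.
        self.data_dir = self.__file__ + ".data"


def instantiate_via_type(module_type, **context):
    """Variant 1: the importer sets the context on the type, then instantiates."""
    for key, value in context.items():
        setattr(module_type, key, value)
    return module_type()


def instantiate_via_instance(module_type, **context):
    """Variant 2: allocate, set the context on the instance, then initialise."""
    instance = module_type.__new__(module_type)  # tp_new analogue
    for key, value in context.items():
        setattr(instance, key, value)
    instance.__init__()                          # tp_init analogue
    return instance


mod = instantiate_via_instance(DemoModule, __file__="/made/up/demo.so")
print(mod.data_dir)
```

In both cases the initialiser can rely on __file__ and friends without receiving them as arguments.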

Did I forget anything? Sounds simple enough to me so far.

Stefan




Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Eli Bendersky
On Sun, Aug 11, 2013 at 6:40 AM, Antoine Pitrou  wrote:

> There are two csv modules, but there are not two _csv modules.
> Extension modules are currently immortal until the end of the
> interpreter:
>
> >>> csv = __import__('csv')
> >>> wcsv = weakref.ref(csv)
> >>> w_csv = weakref.ref(sys.modules['_csv'])
> >>> del sys.modules['csv']
> >>> del sys.modules['_csv']
> >>> del csv
> >>> gc.collect()
> 50
> >>> wcsv()
> >>> w_csv()
> <module '_csv' from '/home/antoine/cpython/default/build/lib.linux-x86_64-3.4-pydebug/_csv.cpython-34dm.so'>
>
>
> So, "sharing" a state is pretty much expected, since you are
> re-initializing an existing module.
> (but the module does get re-initialized, which is the point of PEP 3121)
>

Yes, you're right - this is an oversight on my behalf. Indeed, the
extensions dict in import.c keeps it alive once loaded, and only ever gets
cleaned up in Py_Finalize.

Eli


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Antoine Pitrou
On Sun, 11 Aug 2013 08:49:56 -0700
Eli Bendersky  wrote:

> Yes, you're right - this is an oversight on my behalf. Indeed, the
> extensions dict in import.c keeps it alive once loaded, and only ever gets
> cleaned up in Py_Finalize.

It's not the extensions dict in import.c, it's modules_by_index in the
interpreter state.
(otherwise it wouldn't be per-interpreter)

The extensions dict holds the module *definition* (the struct
PyModuleDef), not the module instance.

Regards

Antoine.


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-11 Thread Eli Bendersky
On Sun, Aug 11, 2013 at 8:56 AM, Antoine Pitrou  wrote:

> It's not the extensions dict in import.c, it's modules_by_index in the
> interpreter state.
> (otherwise it wouldn't be per-interpreter)
>
> The extensions dict holds the module *definition* (the struct
> PyModuleDef), not the module instance.
>

Thanks for the clarification.

Eli


Re: [Python-Dev] redesigning the extension module initialisation protocol (was: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others))

2013-08-11 Thread Eli Bendersky
On Sun, Aug 11, 2013 at 6:52 AM, Stefan Behnel  wrote:

> Did I forget anything? Sounds simple enough to me so far.
>

Out of curiosity - can we list actual use cases for this new design? The
previous thread, admittedly, deals with an esoteric corner case that comes
up in overly-clever tests. If we plan to seriously consider these changes -
and this appears to be worth a PEP - we need a list of actual advantages
over the current approach. It's not that a more conceptually pure design is
an insufficient reason, IMHO, but it would be interesting to hear about
other implications.

Eli


Re: [Python-Dev] redesigning the extension module initialisation protocol

2013-08-11 Thread Stefan Behnel
Eli Bendersky, 11.08.2013 19:43:
> Out of curiosity - can we list actual use cases for this new design? The
> previous thread, admittedly, deals with an esoteric corner case that comes
> up in overly-clever tests. If we plan to seriously consider these changes -
> and this appears to be worth a PEP - we need a list of actual advantages
> over the current approach. It's not that a more conceptually pure design is
> an insufficient reason, IMHO, but it would be interesting to hear about
> other implications.

http://mail.python.org/pipermail/python-dev/2012-November/122599.html

http://bugs.python.org/issue13429

http://bugs.python.org/issue16392

Yes, it definitely needs a PEP.

Stefan




[Python-Dev] Reaping threads and subprocesses

2013-08-11 Thread Serhiy Storchaka

Some tests use the following idiom:

import test.support

def test_main():
    try:
        test.support.run_unittest(...)
    finally:
        test.support.reap_children()

Other tests use the following idiom:

def test_main():
    key = test.support.threading_setup()
    try:
        test.support.run_unittest(...)
    finally:
        test.support.threading_cleanup(*key)

or in other words:

@test.support.reap_threads
def test_main():
    test.support.run_unittest(...)

These tests are not discoverable. There are some ways to make them 
discoverable.


1. Create unittest.TestCase subclasses or mixins that override the
run() method.


class ThreadReaped:
    def run(self, result):
        key = test.support.threading_setup()
        try:
            return super().run(result)
        finally:
            test.support.threading_cleanup(*key)


class ChildReaped:
    def run(self, result):
        try:
            return super().run(result)
        finally:
            test.support.reap_children()

2. Create unittest.TestCase subclasses or mixins that override the
setUpClass() and tearDownClass() methods.


class ThreadReaped:
    @classmethod
    def setUpClass(cls):
        cls._threads = test.support.threading_setup()

    @classmethod
    def tearDownClass(cls):
        test.support.threading_cleanup(*cls._threads)

class ChildReaped:
    @classmethod
    def tearDownClass(cls):
        test.support.reap_children()

3. Create unittest.TestCase subclasses or mixins that override the setUp()
and tearDown() methods.


class ThreadReaped:
    def setUp(self):
        self._threads = test.support.threading_setup()

    def tearDown(self):
        test.support.threading_cleanup(*self._threads)

class ChildReaped:
    def tearDown(self):
        test.support.reap_children()

4. Create unittest.TestCase subclasses or mixins that call addCleanup()
in the constructor.


class ThreadReaped:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.addCleanup(test.support.threading_cleanup,
                        *test.support.threading_setup())

class ChildReaped:
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.addCleanup(test.support.reap_children)

Of course, instead of subclassing we can use decorators that modify the test class.

What method is better? Do you have other suggestions?

The issue where this problem first came up:
http://bugs.python.org/issue16968.



Re: [Python-Dev] Green buildbot failure.

2013-08-11 Thread David Bolen
Richard Oudkerk  writes:

> On 11/08/2013 11:00am, Antoine Pitrou wrote:
>> You've got the answer at the bottom:
>>
>>"program finished with exit code 0"
>>
>> So for some reason, the test suite crashed, but with a successful exit
>> code. Buildbot thinks it ran fine.
>
> Was the test terminated because it took too long?

Yes, it looks like it.

This test (and one on the XP-4 buildbot in the same time frame) was
terminated by an external watchdog script that kills python_d
processes that have been running for more than 2 hours.  I put the
script in place (quite a while back) as a workaround for failures that
would strand a python process, blocking future tests due to files
remaining in use.  It's a last ditch, crude, sledge-hammer.

Historically, if this code ran, the buildbot had already itself timed
out, so the exit code (which I can't control) wasn't very important.
2 hours had been conservative (and a trade-off, as longer values also
risk failing more future tests) but it may need to be increased.

In this particular case it was a false alarm - the host was heavily
loaded during this time frame, which I think prolonged the test time
by an unusually large amount.

-- David



Re: [Python-Dev] Green buildbot failure.

2013-08-11 Thread Victor Stinner
2013/8/11 David Bolen :
>> Was the test terminated because it took too long?
>
> Yes, it looks like it.
>
> This test (and one on the XP-4 buildbot in the same time frame) was
> terminated by an external watchdog script that kills python_d
> processes that have been running for more than 2 hours.  I put the
> script in place (quite a while back) as a workaround for failures that
> would strand a python process, blocking future tests due to files
> remaining in use.  It's a last ditch, crude, sledge-hammer.

test.regrtest uses faulthandler.dump_traceback_later() to stop the
test after a timeout if the --timeout command line option is used.

http://docs.python.org/dev/library/faulthandler.html#faulthandler.dump_traceback_later

Do you pass this option?

The timeout is not global but is applied to a single function of a test
file, so you can use a shorter timeout. It also has the advantage of dumping
the traceback of all Python threads before exiting. I didn't try this
feature recently on Windows, but it is supposed to work :-)
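
For reference, the underlying mechanism can be used directly. A minimal sketch, assuming a made-up wrapper run_with_watchdog(); only faulthandler.dump_traceback_later() and faulthandler.cancel_dump_traceback_later() are real API here, and the demo writes to a temporary file rather than stderr so that it works even when stderr has been replaced:

```python
import faulthandler
import tempfile


def run_with_watchdog(func, timeout, log_file):
    """Run func() under a faulthandler watchdog (hypothetical wrapper).

    If func() is still running after `timeout` seconds, the tracebacks of
    all threads are written to `log_file` and the process exits.
    """
    faulthandler.dump_traceback_later(timeout, exit=True, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog once the function has finished in time.
        faulthandler.cancel_dump_traceback_later()


# log_file must stay open until the watchdog is cancelled.
with tempfile.TemporaryFile() as log:
    print(run_with_watchdog(lambda: sum(range(10)), timeout=60, log_file=log))
```

Because the watchdog is armed and disarmed around each call, the timeout can be much shorter than a whole-run limit like the 2 hours mentioned above.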

Victor


Re: [Python-Dev] redesigning the extension module initialisation protocol (was: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others))

2013-08-11 Thread Nick Coghlan
On 11 Aug 2013 09:55, "Stefan Behnel"  wrote:
>
> Nick Coghlan, 11.08.2013 15:19:
> > On 11 Aug 2013 09:02, "Stefan Behnel" wrote:
> >>> BTW, this already suggests a simple module initialisation interface. The
> >>> extension module would expose a function that returns a module type, and
> >>> the loader/importer would then simply instantiate that. Nothing else is
> >>> needed.
> >>
> >> Actually, strike the word "module type" and replace it with "type". Is
> >> there really a reason why Python needs a module type at all? I mean, you
> >> can stick arbitrary objects in sys.modules, so why not allow arbitrary
> >> types to be returned by the module creation function?
> >
> > That's exactly what I have in mind, but the way extension module imports
> > currently work means we can't easily do it just yet. Fortunately, importlib
> > means we now have some hope of fixing that :)
>
> Well, what do we need? We don't need to care about existing code, as long
> as the current scheme is only deprecated and not deleted. That won't happen
> before Py4 anyway. New code would simply export a different symbol when
> compiling for a CPython that supports it, which points to the function that
> returns the type.
>
> Then, there's already the PyType_Copy() function, which can be used to
> create a heap type from a statically defined type. So extension modules can
> simply define an (arbitrary) additional type in any way they see fit, copy
> it to the heap, and return it.
>
> Next, we need to define a signature for the type's __init__() method.

We need the "ModuleSpec" object to pass here, which is what we're currently
working on in import-sig.

We're not going to define something specifically for C extensions when
other modules suffer related problems.

Cheers,
Nick.

> This can be done in a future-proof way by allowing arbitrary keyword arguments
> to be added, i.e. such a type must have a signature like
>
> def __init__(self, currently, used, pos, args, **kwargs)
>
> and simply ignore kwargs for now.
>
> Actually, we may get away with not passing all too many arguments here if
> we allow the importer to add stuff to the type's dict in between,
> specifically __file__, __path__ and friends, so that they are available
> before the type gets instantiated. Not sure if this is a good idea, but it
> would at least relieve the user from having to copy these things over from
> some kind of context or whatever we might want to pass in.
>
> Alternatively, we could split the instantiation up between tp_new() and
> tp_init(), and let the importer set stuff on the instance dict in between
> the two. But given that this context won't actually change once the shared
> library is loaded, the only reason to prefer modifying the instance instead
> of the type would be to avoid requiring a tp_dict for the type. Open for
> discussion, I guess.
>
> Did I forget anything? Sounds simple enough to me so far.
>
> Stefan
>
>


Re: [Python-Dev] (New) PEP 446: Make newly created file descriptors non-inheritable

2013-08-11 Thread Victor Stinner
Hi,

I fixed various bugs in the implementation of the (new) PEP 446:
http://hg.python.org/features/pep-446

At revision da685bd67524, the full test suite passes on:

- Fedora 18 (Linux 3.9), x86_64
- FreeBSD 9.1, x86_64
- Windows 7 SP1, x86_64
- OpenIndiana (close to Solaris 11), x86_64

Some tests are failing, but these failures are unrelated to PEP 446
(the same tests fail in the original Python):

- Windows: test_signal, failure related to faulthandler (issue already
fixed in default)
- OpenIndiana: test_locale, test_uuid

Victor


Re: [Python-Dev] (New) PEP 446: Make newly created file descriptors non-inheritable

2013-08-11 Thread Victor Stinner
2013/8/12 Victor Stinner :
> I fixed various bugs in the implementation of the (new) PEP 446:
> http://hg.python.org/features/pep-446
>
> At revision da685bd67524, the full test suite passes on: (...)

I also checked the usage of atomic flags. There was a minor bug on
Linux, which is now fixed (I removed a useless call to fcntl that checked
whether SOCK_CLOEXEC works).


open(): On Linux, FreeBSD and Solaris 11, the O_CLOEXEC flag is used.
fcntl(F_GETFD) is called only once in total, not once per file descriptor,
to check whether O_CLOEXEC works. On Windows, O_NOINHERIT is used.

socket.socket(): On Linux, the SOCK_CLOEXEC flag is used; no extra syscall
is required.

os.pipe(): On Linux, pipe2() is used with O_CLOEXEC.

On other platforms, os.set_inheritable() must be called to make the
new file descriptors non-inheritable.
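
For illustration, a small sketch of what this looks like from Python code. It
assumes the os.get_inheritable() and os.set_inheritable() functions proposed
by the PEP (they later shipped in Python 3.4); pipe_inheritance_demo is a
hypothetical name used only for this example.

```python
import os


def pipe_inheritance_demo():
    """Create a pipe and report its inheritable flags before and after
    explicitly marking the read end as inheritable.

    Hypothetical helper for illustration only.
    """
    r, w = os.pipe()
    try:
        # Under PEP 446, newly created descriptors are non-inheritable.
        before = (os.get_inheritable(r), os.get_inheritable(w))
        # A program that wants a child process to inherit the read end
        # has to ask for it explicitly.
        os.set_inheritable(r, True)
        after = os.get_inheritable(r)
        return before, after
    finally:
        os.close(r)
        os.close(w)
```

Whether the flag is set atomically (O_CLOEXEC, pipe2()) or by a separate call
is an implementation detail of each platform; the Python-level behaviour is
meant to be the same.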


On Windows, the atomic flag WSA_FLAG_NO_HANDLE_INHERIT is not used to
create a socket. I don't know Windows well enough to make such a
change.

My OpenIndiana VM looks to be older than Solaris 11: the O_CLOEXEC flag is missing.


I regenerated the patch in the issue: http://bugs.python.org/issue18571

Victor


Re: [Python-Dev] redesigning the extension module initialisation protocol

2013-08-11 Thread Stefan Behnel
Nick Coghlan, 12.08.2013 00:41:
> On 11 Aug 2013 09:55, "Stefan Behnel" wrote:
>>>>> this already suggests a simple module initialisation interface. The
>>>>> extension module would expose a function that returns a module type, and
>>>>> the loader/importer would then simply instantiate that. Nothing else
>>>>> is needed.
>>>> Actually, strike the word "module type" and replace it with "type".
>> [...]
>> Next, we need to define a signature for the type's __init__() method.
> 
> We need the "ModuleSpec" object to pass here, which is what we're currently
> working on in import-sig.

Ok but that's just the very final step. All the rest is C-API specific.

And for clarification: you want to let the importer create the ModuleSpec
object and then pass it into the module's __init__ method?

I guess it could also be passed into the type creation function then,
right? Since it wouldn't harm to do that, I think it's a good idea to
provide as much information to the extension module as possible, as early
as we can, and that's the first time we talk to the shared library.
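
To make the idea concrete, here is a rough Python-level sketch of the flow.
It is only an illustration: the real protocol would live at the C level, and
create_module_type, ExampleModule and load are hypothetical names. It uses
importlib.machinery.ModuleSpec as it later shipped in Python 3.4, which may
differ from what is being drafted on import-sig.

```python
import importlib.machinery
import sys


class ExampleModule:
    """Hypothetical 'module' type; deliberately not a types.ModuleType subclass."""

    def __init__(self, spec, **kwargs):
        # Extra keyword arguments are accepted and ignored, so the importer
        # could pass more information in the future.
        self.__name__ = spec.name
        self.__file__ = spec.origin
        self.answer = 42


def create_module_type(spec):
    """Hypothetical stand-in for the function exported by the shared library."""
    return ExampleModule


def load(spec):
    """Hypothetical loader step: fetch the type, instantiate it, register it."""
    module_type = create_module_type(spec)
    module = module_type(spec)
    sys.modules[spec.name] = module
    return module


# The module name and path below are made up for the example.
spec = importlib.machinery.ModuleSpec(
    "example_ext", loader=None, origin="/hypothetical/example_ext.so")
load(spec)

import example_ext  # found in sys.modules, so no file is searched for
print(example_ext.answer)
```

The spec reaches both the type creation function and __init__, which matches
the "as much information as possible, as early as we can" idea.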

I've started writing up a pre-PEP that describes this protocol. I think it
makes sense to keep it separate from the ModuleSpec PEP as the latter can
easily be accepted without changing anything at the C-API level, but it
shouldn't happen the other way round.

Stefan

