[issue6721] Locks in python standard library should be sanitized on fork

Nir Aides Mon, 04 Jul 2011 12:42:35 -0700

Nir Aides <n...@winpdb.org> added the comment:

> Sorry, I fail to see how the "import graph" is related to the correct
> lock acquisition order. Some locks are created dynamically, for
> example.


The import dependency graph is a reasonable heuristic for determining the lock 
acquisition order between modules.

The rationale is explained in the pthread_atfork man page:
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html
"A higher-level package may acquire locks on its own data structures before 
invoking lower-level packages. Under this scenario, the order specified for 
fork handler calls allows a simple rule of initialization for avoiding package 
deadlock: a package initializes all packages on which it depends before it 
calls the pthread_atfork() function for itself."

(The RATIONALE section is an interpretation; it is not part of the normative 
standard.)
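
As a rough illustration of why "register after your dependencies" helps, the 
following Python sketch simulates, without actually forking, the order in 
which pthread_atfork() runs handlers. The AtForkRegistry class and the 
"lowlevel"/"highlevel" module names are hypothetical; only the ordering rule 
(prepare handlers in reverse registration order, parent and child handlers in 
registration order) comes from POSIX.

```python
"""Simulate pthread_atfork() handler ordering (no real fork is performed)."""
from typing import Callable, List, Optional, Tuple

Handler = Optional[Callable[[], None]]


class AtForkRegistry:
    """Hypothetical stand-in for pthread_atfork(): stores (prepare, parent, child)."""

    def __init__(self) -> None:
        self._handlers: List[Tuple[Handler, Handler, Handler]] = []

    def register(self, prepare: Handler = None, parent: Handler = None,
                 child: Handler = None) -> None:
        self._handlers.append((prepare, parent, child))

    def simulate_fork(self, in_child: bool) -> None:
        # POSIX: prepare handlers run in the reverse order of registration...
        for prepare, _, _ in reversed(self._handlers):
            if prepare is not None:
                prepare()
        # ...while parent/child handlers run in the order of registration.
        for _, parent, child in self._handlers:
            handler = child if in_child else parent
            if handler is not None:
                handler()


events: List[str] = []
registry = AtForkRegistry()

# "lowlevel" is initialized first because "highlevel" imports it,
# so it also registers its fork handlers first.
for module in ("lowlevel", "highlevel"):
    registry.register(
        prepare=lambda m=module: events.append(f"acquire {m}"),
        child=lambda m=module: events.append(f"reset {m}"),
    )

registry.simulate_fork(in_child=True)
print(events)
# ['acquire highlevel', 'acquire lowlevel', 'reset lowlevel', 'reset highlevel']
```

The higher-level package's lock is acquired first, which is the same order its 
own code takes the locks when it calls down into the lower-level package, so 
the prepare phase does not invert that nesting.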

A caveat is that, because Python is an object-oriented language, it is more 
common than in C for code in a lower-level module to invoke code from a 
higher-level module, for example by calling a method that the higher-level 
module has overridden. This actually happens in the logging module (the emit() 
method).

> That's why I asked for a specific API: when do you register a handler?
> When are they called? When are they reset?

Read the pthread_atfork man page.

> The whole point of atfork is to avoid breaking invariants and
> introducing invalid state in the child process. If there is one thing we
> want to avoid, it's precisely reading/writing corrupted data from/to
> files, so eluding the I/O problem seems foolish to me.

Please don't use insulting adjectives. 
If you think I am wrong, convincing me logically will do.

You can "avoid breaking invariants" using either of two strategies:
1) Acquire locks before the fork and release/reset them after it.
2) Initialize the module to some known state after the fork.

For some (most?) modules it may be quite reasonable to initialize the module to 
a known state after the fork without acquiring its locks before the fork; this 
too is explained in the pthread_atfork man page:
"Alternatively, some libraries might be able to supply just a child routine 
that reinitializes the mutexes in the library and all associated states to some 
known value (for example, what it was when the image was originally executed)."

> > A  "critical section" lock that protects in-memory data should not be held 
> > for long.
>
> Not necessarily. See for example I/O locks and logging module, which
> hold locks until I/O completion.

Oops. I have always used the term "critical section" for a lock that protects 
in-memory state as tightly as possible, ideally not even across function calls, 
but I now see that Wikipedia defines it as protecting access to any shared 
resource, including I/O.

The logging module holds the handler lock for the entire emit() call, which I 
think is wrong. It should let the derived handler take care of locking when, 
and if, it needs to.

The logging module is an example of a module that we should reinitialize after 
the fork without acquiring its locks before the fork.
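
Applied to logging, a child-only hook could give each handler a fresh lock via 
the documented Handler.createLock() method. This is a hypothetical sketch 
assuming Python 3.7+ on POSIX; enumerating loggers relies on 
logging.Logger.manager.loggerDict, an undocumented implementation detail of 
CPython's logging module.

```python
import logging
import os


def _reinit_logging_handler_locks() -> None:
    """Hypothetical after-fork hook: replace every attached handler's lock."""
    loggers = [logging.getLogger()]  # the root logger
    # loggerDict is undocumented; it also holds PlaceHolder entries, skip them.
    loggers += [
        obj for obj in logging.Logger.manager.loggerDict.values()
        if isinstance(obj, logging.Logger)
    ]
    for logger in loggers:
        for handler in logger.handlers:
            # createLock() assigns a new, unheld lock to handler.lock.
            handler.createLock()


if hasattr(os, "register_at_fork"):  # POSIX only, Python 3.7+
    os.register_at_fork(after_in_child=_reinit_logging_handler_locks)
```

This only resets per-handler locks; logging's internal module-level lock and 
any data buffered in a handler's stream are left as they were.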

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6721>
_______________________________________