Hello,
I've just recently realized the huge problems surrounding the mix of
multithreading and fork() - i.e. that only the thread calling fork()
actually survives in the child, and that process data (in particular,
synchronization primitives) can be left in a dangerously broken state
by such forks in multithreaded programs.
What bothers me most is that I've actually never seen, in the Python docs,
any mention of these problems (the Linux docs are very discreet as well).
It's as if multithreading and multiprocessing were orthogonal designs,
whereas it can quickly happen that someone has a slightly multithreaded
program, and suddenly uses the multiprocessing module to perform a
separate, performance-demanding task; with disasters in store, since
few people are fully aware of the underlying dangers...
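To make the danger concrete, here is a minimal sketch of my own (assuming
CPython on a POSIX system; the function name child_can_take_lock is made up
for the example). A worker thread holds a lock while the process forks. The
worker does not exist in the child, yet the lock is copied in its locked
state, so nothing in the child can ever release it:

```python
import os
import threading
import warnings


def child_can_take_lock(timeout=1.0):
    """Fork while a worker thread holds a lock; report whether the child can acquire it."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def worker():
        # The worker keeps the lock for as long as the parent needs to fork.
        with lock:
            holding.set()
            release.wait()

    t = threading.Thread(target=worker)
    t.start()
    holding.wait()  # make sure the lock is really held before forking

    with warnings.catch_warnings():
        # Recent Python versions warn about fork() in a multithreaded process.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: the worker thread is gone, but its lock is still marked as held.
        got_it = lock.acquire(timeout=timeout)
        os._exit(0 if got_it else 1)

    # Parent: collect the child's verdict, then let the worker finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    t.join()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

On the setups I have in mind, I would expect child_can_take_lock() to return
False. The timeout is only there so that the demo terminates; a plain
lock.acquire() in the child would simply hang.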
So here are a few proposals to improve matters:
* documenting the fork/multithreading danger in the docs for fork(),
multiprocessing and maybe subprocess (is it affected, or is fork+exec
always safe?). If it's welcome, I can of course provide documentation
patches.
* providing means of taming the fork() beast: is there a possibility
of including python-atfork and similar projects in the stdlib
(I mean their semantics, not the monkey-patching they currently use)?
It would also help a lot with the proper management of file handle
inheritance.
* maybe the most important: providing means to get rid of fork()
whenever wanted. I'm especially thinking about the multiprocessing
module: it seems it always uses forking on *nix platforms. Wouldn't it
be better to also offer spawnl() semantics, to allow safe
multiprocessing use even in applications crowded with threads? Win32
already uses something like that, so all the infrastructure for data
transfer is already there, and it would reinforce cross-platform
compatibility. Since multiprocessing theoretically means low coupling
and little sharing of data, I guess this kind of spawnl() semantics would
be quite sufficient for most situations, which don't require fork-based
multiprocessing and its huge sharing of process data (in my opinion,
inheriting file descriptors is all a child process can require from its
parent).
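As an illustration of the atfork semantics I mean, here is a rough sketch
modelled on POSIX pthread_atfork(). The names atfork() and fork_with_hooks()
are hypothetical; this is not python-atfork's actual API, only the general
idea of running callbacks before and after the fork:

```python
import os
import threading

# Callback registries (hypothetical, for illustration only).
_prepare, _parent, _child = [], [], []


def atfork(prepare=None, parent=None, child=None):
    """Register callbacks to run around a fork, in the spirit of pthread_atfork()."""
    if prepare is not None:
        _prepare.append(prepare)
    if parent is not None:
        _parent.append(parent)
    if child is not None:
        _child.append(child)


def fork_with_hooks():
    """Wrapper around os.fork() that runs the registered callbacks."""
    # As with pthread_atfork(), prepare handlers run in reverse registration order.
    for callback in reversed(_prepare):
        callback()
    pid = os.fork()
    # Parent and child handlers run in registration order, each on its own side.
    for callback in (_child if pid == 0 else _parent):
        callback()
    return pid


# Usage: take the lock just before forking and release it on both sides,
# so that the child never inherits it in an unknown state.
lock = threading.Lock()
atfork(prepare=lock.acquire, parent=lock.release, child=lock.release)
```

This only protects locks that somebody remembered to register, which is
exactly why I think such a mechanism belongs in the stdlib rather than in a
monkey-patch.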
Does this make sense to you?
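P.S. To show what I mean by a spawn-like semantics, here is a small sketch
that runs a task in a freshly started interpreter instead of a forked copy
of the current process. The helper run_in_fresh_interpreter() and the toy
task are hypothetical; the point is only that the child begins from a clean
state and receives its input explicitly (here, pickled over a pipe):

```python
import pickle
import subprocess
import sys

# Source executed by the new interpreter: read a pickled list from stdin,
# compute the sum of squares, and write the pickled result to stdout.
CHILD_SOURCE = """
import pickle, sys
numbers = pickle.load(sys.stdin.buffer)
pickle.dump(sum(n * n for n in numbers), sys.stdout.buffer)
"""


def run_in_fresh_interpreter(numbers):
    """Run the toy task in a new Python process and return its result."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=pickle.dumps(numbers),   # data is handed over explicitly
        stdout=subprocess.PIPE,
        check=True,                    # raise if the child exits with an error
    )
    return pickle.loads(proc.stdout)
```

For example, run_in_fresh_interpreter([1, 2, 3]) should give 14. The child
here never sees the parent's threads or locks; the price is that everything
it needs has to be sent over explicitly.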
Regards,
Pascal Chambon
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org