Hello,

I've only recently realized the serious problems surrounding the mix of multithreading and fork() - i.e. that only the thread calling fork() survives in the child, and that process data (in particular, synchronization primitives) can be left in a dangerously broken state by such forks in multithreaded programs.

What bothers me most is that I've never seen any mention of these problems in the Python docs (the Linux docs are rather discreet about them as well). It's as if multithreading and multiprocessing were orthogonal designs, whereas it can easily happen that someone has a slightly multithreaded program and then uses the multiprocessing module for a separate, performance-demanding task - with disaster in store, since few people are really aware of the underlying dangers...
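
For the record, here is a minimal, purely illustrative sketch of the failure mode on a POSIX system: a worker thread holds a lock when the main thread forks, and the child then deadlocks on that lock, because the copied lock is "held" by a thread that no longer exists in the child:

    import os, threading, time

    lock = threading.Lock()

    def worker():
        with lock:               # the worker holds the lock across the fork
            time.sleep(5)

    threading.Thread(target=worker).start()
    time.sleep(0.5)              # give the worker time to grab the lock

    pid = os.fork()              # POSIX only
    if pid == 0:
        # Child: the worker thread doesn't exist here, but the lock was
        # copied in its "locked" state, so this acquire() never returns.
        lock.acquire()
        os._exit(0)
    else:
        os.waitpid(pid, 0)       # the parent ends up stuck here as well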

So here are a few proposals to improve the situation:

* documenting the fork/multithreading danger in the docs for os.fork(), multiprocessing and maybe subprocess (is it affected, or is its fork+exec pattern always safe?). If that's welcome, I can of course provide documentation patches.

* providing means of taming the fork() beast: could python-atfork and similar projects be included in the stdlib (I mean their semantics, not the monkey-patching approach they currently use)? It would also help a lot with the proper management of file handle inheritance. A sketch of the semantics I mean is given below, after this list.

* maybe the most important: providing means to get rid of fork() whenever wanted. I'm thinking especially of the multiprocessing module: it seems to always use forking on *nix platforms. Wouldn't it be better to also offer spawnl()-style semantics, to allow safe use of multiprocessing even in applications crowded with threads? Win32 already does something like that, so the data-transfer infrastructure is already there, and it would improve cross-platform consistency. Since multiprocessing theoretically means low coupling and little sharing of data, I guess such spawnl()-style semantics would be sufficient for most situations, which don't require fork-based multiprocessing and its wholesale sharing of process state (in my opinion, inheriting file descriptors is all a child process should need from its parent). A rough emulation of these semantics is sketched below, after the list.
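
On the second point, here is a rough sketch of the atfork-style semantics I have in mind (the names are purely illustrative, not an existing stdlib API): handlers registered once and run around every fork(), so that locks can be re-initialized and unwanted handles closed in the child:

    import os

    _prepare, _parent, _child = [], [], []

    def atfork(prepare=None, parent=None, child=None):
        """Register handlers to run around every fork() (illustrative API)."""
        if prepare: _prepare.append(prepare)
        if parent:  _parent.append(parent)
        if child:   _child.append(child)

    def fork():
        """fork() wrapper honouring the registered handlers."""
        for f in _prepare: f()       # e.g. acquire global locks
        pid = os.fork()
        if pid == 0:
            for f in _child: f()     # e.g. re-create locks, close inherited fds
        else:
            for f in _parent: f()    # e.g. release the locks again
        return pid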
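
And on the third point, here is a rough emulation of spawnl()-style child creation using only subprocess + pickle (the spawn() helper and its signature are illustrative, not an existing multiprocessing API): the parent starts a brand-new interpreter and tells it what to run, so no thread or lock state is inherited from the parent:

    import pickle, subprocess, sys

    CHILD_CODE = (
        "import importlib, pickle, sys; "
        "mod, name, args = pickle.load(sys.stdin.buffer); "
        "getattr(importlib.import_module(mod), name)(*args)"
    )

    def spawn(module_name, func_name, *args):
        """Run module_name.func_name(*args) in a fresh interpreter process."""
        proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                                stdin=subprocess.PIPE)
        pickle.dump((module_name, func_name, args), proc.stdin)
        proc.stdin.close()
        return proc

    # e.g. spawn("time", "sleep", 2).wait()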

Does it make sense to you?

Regards,
Pascal Chambon
