Greg Brockman <g...@ksplice.com> added the comment:

> For processes disappearing (if that can at all happen), we could solve
> that by storing the jobs a process has accepted (started working on),
> so if a worker process is lost, we can mark them as failed too.
Sure, this would be reasonable behavior.  I had considered it, but decided it was 
a larger change than I wanted to make without consulting the devs.
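
Just to make concrete the kind of bookkeeping I have in mind, here's a rough 
sketch.  This is hypothetical, not part of my patch or yours: the helper names 
(accepted_by, note_accepted, fail_lost_worker) are made up, and I'm assuming the 
pool's existing cache[job]._set(i, (success, value)) result-delivery call.
"""
# Hypothetical sketch only.  The pool remembers which (job, i) each
# worker pid has accepted, so that if a worker vanishes, its outstanding
# tasks can be marked as failed instead of left pending forever.
accepted_by = {}   # worker pid -> set of (job, i) currently in flight

def note_accepted(pid, job, i):
    accepted_by.setdefault(pid, set()).add((job, i))

def note_completed(pid, job, i):
    accepted_by.get(pid, set()).discard((job, i))

def fail_lost_worker(pid, cache):
    # cache is the pool's job-id -> result-object mapping; _set is how
    # the result handler normally delivers (success, value) pairs.
    for job, i in accepted_by.pop(pid, set()):
        if job in cache:
            cache[job]._set(i, (False, RuntimeError("worker %d died" % pid)))
"""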

> I was already working on this issue last week actually, and I managed
> to do that in a way that works well enough (at least for me):
If I'm reading this right, you catch the exception upon pickling the result (at 
which point you have the job/i information already; totally reasonable).  I'm 
worried about the case of unpickling the task failing.  (Namely, the "task = 
get()" line of the "worker" method.)  Try running the following:
"""
#!/usr/bin/env python
import multiprocessing
p = multiprocessing.Pool(1)
def foo(x):
  pass
p.apply(foo, [1])
"""
And if "task = get()" fails, then the worker doesn't know what the relevant 
job/i values are.
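
For reference, the worker loop looks roughly like this (my paraphrase of 
multiprocessing/pool.py, simplified and not verbatim); the point is that job/i 
only become known after get() has successfully unpickled the task:
"""
def worker(inqueue, outqueue):
    # Simplified paraphrase of multiprocessing.pool.worker.
    put = outqueue.put
    get = inqueue.get
    while True:
        task = get()                      # unpickling the task happens here
        if task is None:                  # sentinel: time to shut down
            break
        job, i, func, args, kwds = task   # job/i known only after get() succeeds
        try:
            result = (True, func(*args, **kwds))
        except Exception as e:
            result = (False, e)
        put((job, i, result))             # nothing we could put() if get() raised
"""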

Anyway, the question forming in my mind is: what sorts of errors do we want to 
handle, and how do we want to handle them?  My answer is that I'd like to handle 
all possible errors with some behavior other than "hang forever".  This includes 
child processes killed by signals or exiting via os._exit, unpickling errors, 
and so on.
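
As a small illustration of the process-death side of that (a sketch under my own 
helper name, not taken from either patch), detecting a worker killed by a signal 
or os._exit could look something like this -- a negative exitcode means the 
process died from that signal number:
"""
def find_dead_workers(pool_processes):
    # Sketch only: scan the pool's worker Process objects for ones
    # that have died, so their outstanding jobs can be failed.
    dead = []
    for p in pool_processes:
        if not p.is_alive():
            p.join()                     # reap it so exitcode is populated
            dead.append((p.pid, p.exitcode))
    return dead
"""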

I believe my patch provides this functionality.  By adding the extra mechanism 
that you've written/proposed, we can improve the error handling in specific 
recoverable cases (which probably constitute the vast majority of real-world 
cases).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9205>
_______________________________________