Greg Brockman <g...@ksplice.com> added the comment:

Thanks much for taking a look at this!

> why are you terminating the second pass after finding a failed 
> process?
Unfortunately, if you've lost a worker, you are no longer guaranteed that the
cache will eventually be empty.  In particular, you may have lost a task, which
could leave an ApplyResult waiting forever for a _set call.
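
To make the failure mode concrete, here is a toy model of the cache.  It is not
the real multiprocessing.pool code.  ToyResult, cache_drains and simulate are
hypothetical names, and the sketch assumes only that a result registers itself
in a shared cache and leaves it when _set runs (roughly what ApplyResult does).
If one worker dies holding a task, that job's entry never leaves the cache:

```python
import threading
import time


class ToyResult:
    """Hypothetical stand-in for ApplyResult: it completes only when _set runs."""

    def __init__(self, cache, job):
        self._cache = cache
        self._job = job
        self._done = threading.Event()
        self.value = None
        cache[job] = self  # register in the shared cache

    def _set(self, value):
        self.value = value
        self._done.set()
        del self._cache[self._job]  # finished jobs leave the cache

    def ready(self):
        return self._done.is_set()


def cache_drains(cache, timeout):
    """Poll until the cache is empty; return False if jobs remain at the deadline."""
    deadline = time.monotonic() + timeout
    while cache:
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.01)
    return True


def simulate(lost_job=None):
    """Run three jobs; the worker holding `lost_job` dies before reporting back."""
    cache = {}
    results = [ToyResult(cache, job) for job in range(3)]

    def result_handler():
        for r in results:
            if r._job != lost_job:  # a dead worker's task is never delivered
                r._set(r._job * 10)

    handler = threading.Thread(target=result_handler)
    handler.start()
    handler.join()
    return cache_drains(cache, timeout=0.2), [r.ready() for r in results]
```

With no lost job, simulate() reports that the cache drained.  With lost_job=1,
job 1 stays in the cache indefinitely, so a loop that waits for the cache to
empty (or a get() without a timeout on that result) would never return.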

More generally, my chief assumption here is that the unexpected death of a
worker process is unrecoverable.  It would be nice to have a better workaround
than just aborting everything, but I couldn't see a way to do that.
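
The "abort everything" policy can be sketched as: when the supervisor notices a
dead worker, fail every outstanding result with an error so that callers get an
exception instead of hanging.  This is a hypothetical model (WorkerDied,
ToyResult and abort_pending are illustrative names, not pool internals).  As far
as I know, it is the same policy concurrent.futures.ProcessPoolExecutor applies
when it raises BrokenProcessPool.

```python
import threading


class WorkerDied(Exception):
    """Hypothetical error handed to every pending result after a worker dies."""


class ToyResult:
    """Minimal stand-in for ApplyResult storing (success, value)."""

    def __init__(self):
        self._done = threading.Event()
        self._success = None
        self._value = None

    def _set(self, success, value):
        self._success, self._value = success, value
        self._done.set()

    def get(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("result never arrived")
        if self._success:
            return self._value
        raise self._value


def abort_pending(cache, exitcode):
    """Fail every outstanding job, since we cannot tell which task the dead worker held."""
    error = WorkerDied(f"worker exited unexpectedly with code {exitcode}")
    for job, result in list(cache.items()):
        result._set(False, error)
        del cache[job]


def demo():
    finished, pending = ToyResult(), ToyResult()
    cache = {0: finished, 1: pending}
    finished._set(True, "done")  # job 0 completed before the crash
    del cache[0]
    abort_pending(cache, exitcode=-9)  # the supervisor noticed a dead worker
    try:
        pending.get(timeout=1)
    except WorkerDied as exc:
        return finished.get(timeout=1), str(exc), len(cache)
    return None
```

Jobs that already finished keep their values.  Everything still pending fails
loudly, which is crude but at least does not deadlock the caller.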

> Unpickleable errors and other errors occurring in the worker body are
> not exceptional cases, at least not now that the pool is supervised
> by _handle_workers.
I could be wrong, but that's not what my experiments were indicating.  In 
particular, if an unpickleable error occurs, then a task has been lost, which 
means that the relevant map, apply, etc. will wait forever for completion of 
the lost task.
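
A minimal sketch of why the task disappears, assuming only that a worker must
pickle its result before sending it to the parent.  A queue.Queue stands in for
the inter-process pipe, and send_result and run_worker are hypothetical names.
Pickling a threading.Lock raises TypeError on CPython 3, so nothing for that
(job, i) ever reaches the result handler:

```python
import pickle
import queue
import threading


def send_result(outqueue, job, i, value):
    """Pickle before enqueueing, as a pipe between processes requires."""
    outqueue.put(pickle.dumps((job, i, (True, value))))


def run_worker(tasks):
    """Toy worker: returns (what the result handler receives, tasks that were lost)."""
    outqueue = queue.Queue()
    lost = []
    for job, i, value in tasks:
        try:
            send_result(outqueue, job, i, value)
        except (TypeError, pickle.PicklingError):
            # In a real worker an unhandled error here ends the process; the toy
            # just records it.  Either way, nothing for (job, i) reaches outqueue.
            lost.append((job, i))
    received = []
    while not outqueue.empty():
        job, i, _ = pickle.loads(outqueue.get())
        received.append((job, i))
    return received, lost


received, lost = run_worker([(0, 0, 42), (0, 1, threading.Lock()), (0, 2, "ok")])
```

The handler only ever sees (0, 0) and (0, 2), so a map over all three items
waits forever for (0, 1).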

> I think the result should be set also in this case, so the user can
> inspect the exception after the fact.
That does sound useful.  However, how can you determine the job (and the value
of i) if it's an unpickleable error?  It would be nice to be able to retrieve
job/i without having to unpickle the rest.
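
One possible way to get job/i without unpickling the rest is to pickle the
payload separately inside an outer (job, i, payload_bytes) envelope.  This is a
hypothetical design sketch, not how multiprocessing frames its messages.
encode, decode and BadExc are illustrative names.  The sender replaces a result
it cannot pickle with a picklable error, and the receiver unpickles the header
first, so a payload that fails to load still identifies its task:

```python
import pickle
import threading


class BadExc(Exception):
    """Pickles fine but fails to unpickle: args holds one value, __init__ wants two."""

    def __init__(self, code, detail):
        super().__init__(f"{code}: {detail}")


def encode(job, i, success, value):
    """Pickle the payload on its own so the (job, i) header always survives."""
    try:
        payload = pickle.dumps((success, value))
    except Exception as exc:
        # Replace an unpickleable result with a picklable description of the failure.
        payload = pickle.dumps((False, RuntimeError(f"unpickleable result: {exc!r}")))
    return pickle.dumps((job, i, payload))


def decode(message):
    """Unpickle the header first; a payload that fails to load still names its task."""
    job, i, payload = pickle.loads(message)
    try:
        success, value = pickle.loads(payload)
    except Exception as exc:
        success, value = False, exc
    return job, i, success, value
```

For example, decode(encode(7, 4, True, threading.Lock())) returns job 7, index
4, success False and a RuntimeError, so the result handler could set that
ApplyResult to an error instead of leaving it pending.  The cost is a second
pickling pass per result.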

> For shutdown.patch, I thought this only happened in the worker 
> handler, but you've enabled this for the result handler too? I don't 
> care about the worker handler, but with the result handler I'm 
> worried that I don't know what ignoring these exceptions actually 
> means.
You have a good point.  I didn't think about the patch very hard.  I've only 
seen these exceptions from the worker handler, but AFAICT there's no guarantee 
that bad luck with the scheduler wouldn't result in the same problem in the 
result handler.  One option would be to narrow the breadth of the exceptions 
caught by _make_shutdown_safe (do we need to catch anything but TypeErrors?).  
Another option would be to enable it only for the worker handler.  I don't have a
particularly great sense of what the Right Thing to do here is.
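
The "narrow it down" option could look something like the following.  It is a
hypothetical decorator, not the code in shutdown.patch.  The names
make_shutdown_safe, ignore and is_shutting_down are illustrative.  It assumes
that the spurious errors are TypeErrors from module globals already set to None
during interpreter teardown, and it uses sys.is_finalizing() (Python 3.5+) as a
stand-in for "we are shutting down".  Any other exception, or a TypeError while
the interpreter is still running normally, still propagates:

```python
import functools
import sys


def make_shutdown_safe(func, ignore=(TypeError,), is_shutting_down=sys.is_finalizing):
    """Hypothetical: swallow only `ignore` errors, and only while shutting down."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ignore:
            if is_shutting_down():
                # Assumed cause: module globals are already None; give up quietly.
                return None
            raise  # during normal operation this is a real bug, so surface it

    return wrapper
```

Restricting both the exception type and the time window keeps a genuine bug in
the result handler from being silently ignored.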

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9205>
_______________________________________
_______________________________________________
Python-bugs-list mailing list