[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2014-09-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: You should certainly create a new issue!

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2014-09-11 Thread Dan O'Reilly
Dan O'Reilly added the comment: Thanks, Antoine. I've opened issue22393.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2014-09-10 Thread Dan O'Reilly
Dan O'Reilly added the comment: Is it possible to have this issue re-opened, so that the new patch is more likely to get attention? Or should I create a new issue for the multiprocessing patch?

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2014-08-24 Thread Dan O'Reilly
Dan O'Reilly added the comment:
> So, concurrent.futures is fixed now. Unless someone wants to patch multiprocessing.Pool, I am closing this issue.
I realize I'm 3 years late on this, but I've put together a patch for multiprocessing.Pool. Should a process in a Pool unexpectedly exit

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2014-08-24 Thread Ned Deily
Changes by Ned Deily n...@acm.org: -- nosy: +sbt

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2014-08-24 Thread Brian Curtin
Changes by Brian Curtin br...@python.org: -- nosy: -brian.curtin

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-06-10 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: test_multiprocessing crashes ~700 times on Mac OS X Tiger, regression likely introduced by this issue (6d6099f7fe89): I created issue #12310 for that.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-06-08 Thread Roundup Robot
Roundup Robot devnull@devnull added the comment: New changeset 6d6099f7fe89 by Antoine Pitrou in branch 'default': Issue #9205: concurrent.futures.ProcessPoolExecutor now detects killed http://hg.python.org/cpython/rev/6d6099f7fe89 -- nosy: +python-dev

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-06-08 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: So, concurrent.futures is fixed now. Unless someone wants to patch multiprocessing.Pool, I am closing this issue. -- resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed
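
To illustrate the behaviour this message refers to, here is a minimal sketch of what a caller sees in current Python versions when a ProcessPoolExecutor worker dies abruptly: the pending future raises BrokenProcessPool instead of blocking forever. The helper names die_abruptly and run_and_report are hypothetical, and the "fork" start method is an assumption that makes the sketch POSIX-only.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def die_abruptly():
    """Hypothetical task: simulate a worker killed from outside (no cleanup)."""
    os._exit(1)


def run_and_report():
    """Submit the crashing task and report how the parent observes it."""
    # Assumption: "fork" is available (POSIX); it avoids needing a __main__ guard.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        future = executor.submit(die_abruptly)
        try:
            future.result(timeout=30)
        except BrokenProcessPool:
            # The executor noticed the dead worker and failed the future.
            return "broken"
    return "completed"
```

The timeout on result() is only a safety net for the sketch; the point is that the parent gets an exception rather than a hang.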

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-06-07 Thread Charles-François Natali
Charles-François Natali neolo...@free.fr added the comment:
> Ok, the dependencies are now committed. Here is a new patch addressing Charles-François' comments: select() is now called before each call to read() when sentinels are given, to avoid race conditions.
The patch looks fine to me

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-06-06 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Ok, the dependencies are now committed. Here is a new patch addressing Charles-François' comments: select() is now called before each call to read() when sentinels are given, to avoid race conditions. -- stage: -> patch review Added

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-16 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Thus, Connection.recv_bytes() will be called:

    def _recv_bytes(self, maxsize=None):
        buf = self._recv(4)
        size, = struct.unpack("=i", buf.getvalue())
        if maxsize is not None and size > maxsize:
            return None

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-14 Thread Charles-François Natali
Charles-François Natali neolo...@free.fr added the comment: Indeed, it isn't, Pipe objects are not meant to be safe against multiple access. Queue objects (in multiprocessing/queues.py) use locks so they are safe. But if the write to the Pipe is not atomic, then the select isn't safe. select

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-13 Thread Charles-François Natali
Charles-François Natali neolo...@free.fr added the comment: Antoine, I've got a couple questions concerning your patch: - IIUC, the principle is to create a pipe for each worker process, so that when the child exits the read-end - sentinel - becomes readable (EOF) from the parent, so you know

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-13 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> Antoine, I've got a couple questions concerning your patch: - IIUC, the principle is to create a pipe for each worker process, so that when the child exits the read-end - sentinel - becomes readable (EOF) from the parent, so you know that a
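
The sentinel principle described in this question can be sketched with the public API that Python 3.3 and later expose: Process.sentinel and multiprocessing.connection.wait(). This is an illustration under assumptions, not the patch itself; wait_for_exit and demo are hypothetical names, and the "fork" start method makes the sketch POSIX-only.

```python
import multiprocessing
import os
from multiprocessing.connection import wait


def wait_for_exit(processes, timeout=None):
    """Return the processes whose sentinel became ready, meaning they exited."""
    by_sentinel = {p.sentinel: p for p in processes}
    ready = wait(list(by_sentinel), timeout)
    return [by_sentinel[s] for s in ready]


def demo():
    """Start a child that exits abruptly and detect it through its sentinel."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    child = ctx.Process(target=os._exit, args=(3,))
    child.start()
    dead = wait_for_exit([child], timeout=30)
    child.join()
    return [p.exitcode for p in dead]
```

Because wait() also accepts Connection objects, the same call can watch a result pipe and the worker sentinels together.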

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-13 Thread Charles-François Natali
Charles-François Natali neolo...@free.fr added the comment:
> Not exactly. The select is done on the queue's pipe and on the workers' fds *at the same time*. Thus there's no race condition.
You're right, I missed this part, it's perfectly safe. But I think there's a problem with the new

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-13 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: But Lib/multiprocessing/connection.py does:

    def _send_bytes(self, buf):
        # For wire compatibility with 3.2 and lower
        n = len(buf)
        self._send(struct.pack("=i", len(buf)))
        # The condition is necessary to
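
As a rough illustration of why the length-prefixed wire format matters here, the following sketch frames a message as a 4-byte "=i" header followed by the payload, written in two separate calls. If the writer dies between the two writes, the reader sees a header with no body. The function names send_frame and recv_frame are hypothetical and this is not the code from connection.py.

```python
import os
import struct

HEADER = struct.Struct("=i")  # same 4-byte length prefix as quoted above


def _write_all(fd, data):
    """Write every byte of data, looping over partial writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def _read_exactly(fd, count):
    """Read exactly count bytes or raise EOFError if the peer went away."""
    chunks = []
    while count > 0:
        chunk = os.read(fd, count)
        if not chunk:
            raise EOFError("peer closed the pipe mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def send_frame(fd, payload):
    """Send header and payload as two writes, so the message is not atomic."""
    _write_all(fd, HEADER.pack(len(payload)))
    _write_all(fd, payload)


def recv_frame(fd):
    """Receive one framed message."""
    (size,) = HEADER.unpack(_read_exactly(fd, HEADER.size))
    return _read_exactly(fd, size)
```

A reader that blocks in the second read after the writer has died only wakes up if the write end is actually closed, which is one reason the thread discusses calling select() before each read.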

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-13 Thread Jesús Cea Avión
Changes by Jesús Cea Avión j...@jcea.es: -- nosy: +jcea

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-09 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Part of the patch submitted standalone in issue12040. -- dependencies: +Expose a Process.sentinel property (and fix polling loop in Process.join())

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-08 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Ok, here's a patch for the new approach. CancelIoEx is loaded dynamically and, if unavailable, CancelIo is used instead. I take care to cancel or complete the I/O in the same method call where it is initiated, meaning there's no

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-07 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Here is an updated patch. Much of it consists of changes in the Windows Connection implementation, where I had to use overlapped I/O in order to use WaitForMultipleObjects on named pipes. test_concurrent_futures sometimes blocks (under

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-07 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Ok, this patch seems fully debugged (under Windows and Linux). A couple of things come in addition, such as removing repeated polling in PipeConnection.poll() and _Popen.wait(). -- Added file:

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-07 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Hum, I get a strange skip on a XP buildbot: [224/354] test_multiprocessing test_multiprocessing skipped -- DLL load failed: The specified procedure could not be found. Yet _multiprocessing was compiled fine... Does anyone know what it means?

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-03 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Actually, it came to me that if a child process exits, the queues are not guaranteed to be in a consistent state anymore (the child could have terminated in the middle of a partial read or write). So it may be better to simply declare the

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-03 Thread Brian Quinlan
Brian Quinlan br...@sweetapp.com added the comment: Under what circumstances do we expect a ProcessPoolExecutor child process to be killed outside of the control of the ProcessPoolExecutor? If the user kills a child then maybe all we want to do is raise an exception rather than deadlock as a

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-03 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> Under what circumstances do we expect a ProcessPoolExecutor child process to be killed outside of the control of the ProcessPoolExecutor?
Killed by the user, or by an automatic device (such as the Linux OOM killer), or crashed.
> If the user

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-03 Thread Brian Quinlan
Brian Quinlan br...@sweetapp.com added the comment:
> Killed by the user, or by an automatic device (such as the Linux OOM killer), or crashed.
Crashed would be bad - it would indicate a bug in the ProcessPoolExecutor code.
> If the user kills a child then maybe all we want to do is raise an

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-03 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> Killed by the user, or by an automatic device (such as the Linux OOM killer), or crashed.
> Crashed would be bad - it would indicate a bug in the ProcessPoolExecutor code.
I meant a crash in Python itself, or any third-party extension

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-05-02 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Here is a proof-of-concept patch that makes concurrent.futures able to detect killed processes. Works only under POSIX, and needs issue11743. I'm not sure it's a good idea to change the multiprocessing public API (SimpleQueue.get()) for this.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-04-27 Thread Gökçen Eraslan
Changes by Gökçen Eraslan gok...@pardus.org.tr: -- nosy: +Gökçen.Eraslan

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-04-02 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- dependencies: +Rewrite PipeConnection and Connection in pure Python

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-03-31 Thread STINNER Victor
STINNER Victor victor.stin...@haypocalc.com added the comment: Issue #11663 has been marked as a duplicate. -- nosy: +bquinlan, haypo, pitrou

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-03-31 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: The problem with this approach is that it won't help concurrent.futures. Detection of killed endpoints should ideally happen at a lower level, e.g. in Process or Queue or Connection objects. Speaking of which, I wonder why we have both

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-03-31 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment:
On Thu, Mar 31, 2011 at 8:25 AM, Antoine Pitrou rep...@bugs.python.org wrote:
> Antoine Pitrou pit...@free.fr added the comment:
> Speaking of which, I wonder why we have both multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-03-31 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Possible plan for POSIX, where a connection uses a pipe() or socketpair(): exploit the fact that an endpoint becomes ready for reading (indicating EOF) when the other endpoint is closed:

    >>> r, w = os.pipe()
    >>> select.select([r], [], [r], 0)
    ([],
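
The POSIX plan in this message can be tried out with a short sketch: the read end of a pipe is not ready while the write end is open and idle, and it becomes ready (with read() returning an empty bytes object) once the write end is closed. The helper names is_readable and demo are hypothetical, and select() on pipe descriptors is POSIX-only.

```python
import os
import select


def is_readable(fd):
    """Poll fd without blocking and report whether a read would return now."""
    readable, _, _ = select.select([fd], [], [], 0)
    return fd in readable


def demo():
    """Show readiness of a pipe's read end before and after the writer closes."""
    r, w = os.pipe()
    try:
        before = is_readable(r)  # writer open, nothing written yet
        os.close(w)
        w = None
        after = is_readable(r)   # writer closed: EOF is pending
        data = os.read(r, 1)     # EOF is reported as an empty read
        return before, after, data
    finally:
        os.close(r)
        if w is not None:
            os.close(w)
```

In a real parent/child setup the parent must also close its own copy of the write end after forking, otherwise EOF is never seen.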

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-03-31 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: (certainly not easy, sorry)

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-03-31 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment:
On Thu, Mar 31, 2011 at 9:05 AM, Antoine Pitrou rep...@bugs.python.org wrote:
> Antoine Pitrou pit...@free.fr added the comment:
> Possible plan for POSIX, where a connection uses a pipe() or socketpair(): exploit the fact that an endpoint

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-03-31 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: -- keywords: -easy

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2011-03-08 Thread Qiangning Hong
Changes by Qiangning Hong hon...@gmail.com: -- nosy: +hongqn

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-10-14 Thread Alexander Ljungberg
Changes by Alexander Ljungberg stillf...@gmail.com: -- nosy: +aljungberg

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-27 Thread Ask Solem
Ask Solem a...@opera.com added the comment:
> Does the problem make sense/do you have any ideas for an alternate solution?
Well, I still haven't given up on the trackjobs patch. I changed it to use a single queue for both the acks and the result (see new patch attached:

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-27 Thread Greg Brockman
Greg Brockman g...@mit.edu added the comment: Hmm, a few notes. I have a bunch of nitpicks, but those can wait for a later iteration. (Just one style nit: I noticed a few unneeded whitespace changes... please try not to do that, as it makes the patch harder to read.) - Am I correct that you

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-27 Thread Ask Solem
Ask Solem a...@opera.com added the comment: New patch attached (termination-trackjobs3.patch).
> Hmm, a few notes. I have a bunch of nitpicks, but those can wait for a later iteration. (Just one style nit: I noticed a few unneeded whitespace changes... please try not to do that, as it makes the

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-27 Thread Greg Brockman
Greg Brockman g...@mit.edu added the comment: Ah, you're right--sorry, I had misread your code. I hadn't noticed the usage of the worker_pids. This explains what you're doing with the ACKs. Now, the problem is, I think doing it this way introduces some races (which is why I introduced the ACK

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-27 Thread Ask Solem
Ask Solem a...@opera.com added the comment:
> - A worker removes a job from the queue and is killed before sending an ACK.
Yeah, this may be a problem. I was thinking we could make sure the task is acked before child process shutdown. Kill -9 is then not safe, but do we really want to

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-27 Thread Ask Solem
Ask Solem a...@opera.com added the comment: By the way, I'm also working on writing some simple benchmarks for the multiple queues approach, just to see if there's at all an overhead to worry about.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-26 Thread Andrey Vlasovskikh
Changes by Andrey Vlasovskikh andrey.vlasovsk...@gmail.com: -- nosy: +vlasovskikh

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-26 Thread Albert Strasheim
Changes by Albert Strasheim full...@gmail.com: -- nosy: +Albert.Strasheim

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-20 Thread Ask Solem
Ask Solem a...@opera.com added the comment: @greg Been very busy lately, just had some time now to look at your patch. I'm very ambivalent about using one SimpleQueue per process. What is the reason for doing that?

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-20 Thread Greg Brockman
Greg Brockman g...@mit.edu added the comment: Thanks for looking at it! Basically this patch requires the parent process to be able to send a message to a particular worker. As far as I can tell, the existing queues allow the children to send a message to the parent, or the parent to send a

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-08-13 Thread Greg Brockman
Greg Brockman g...@mit.edu added the comment: I'll take another stab at this. In the attachment (assign-tasks.patch), I've combined a lot of the ideas presented on this issue, so thank you both for your input. Anyway: - The basic idea of the patch is to record the mapping of tasks to

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-29 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment: (sorry, I thought I had replied to your comment when I hadn't!) I think we can get away with a new optional kwarg.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-27 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment:
> You can't have a sensible default timeout, because the worker may be processing something important...
In my case, the jobs are either functional or idempotent anyway, so aborting halfway through isn't a problem. In general though, I'm not

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-27 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment: You two are bigger users of this than I currently am (the curse/blessing of switching jobs), which is why I've let you hash it out. Let me point out: my goal is to deal with errors in a way which does not cause a total crash, a lockup, or

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-27 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: Thanks for the comment. It's good to know what constraints we have to deal with.
> we can not, however, change the API.
Does this include adding optional arguments?

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-25 Thread Ask Solem
Ask Solem a...@opera.com added the comment:
> A potential implementation is in termination.patch. Basically, try to shut down gracefully, but if you timeout, just give up and kill everything.
You can't have a sensible default timeout, because the worker may be processing something

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-21 Thread Ask Solem
Ask Solem a...@opera.com added the comment:
> At first glance, looks like there are a number of sites where you don't change the blocking calls to non-blocking calls (e.g. get()). Almost all of the get()s have the potential to be called when there is no possibility for them to terminate.
I

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-21 Thread Ask Solem
Ask Solem a...@opera.com added the comment: Btw, the current problem with termination3.patch seems to be that the MainProcess somehow appears in self._pool. I have no idea how it gets there. Maybe some unrelated issue that appears when forking that late in the tests.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-21 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment:
> I thought the EOF errors would take care of that, at least this has been running in production on many platforms without that happening.
There are a lot of corner cases here, some more pedantic than others. For example, suppose a child dies

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-20 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: At first glance, looks like there are a number of sites where you don't change the blocking calls to non-blocking calls (e.g. get()). Almost all of the get()s have the potential to be called when there is no possibility for them to terminate.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-16 Thread Ask Solem
Ask Solem a...@opera.com added the comment:
> but if you make a blocking call such as in the following program, you'll get a hang
Yeah, and for that we could use the same approach as for the maps. But, I've just implemented the accept callback approach, which should be superior. Maps/Apply

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-16 Thread Ask Solem
Changes by Ask Solem a...@opera.com: Added file: http://bugs.python.org/file18026/multiprocessing-tr...@82502-termination-trackjobs.patch

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-15 Thread Ask Solem
Ask Solem a...@opera.com added the comment: Greg, Before I forget, looks like we also need to deal with the result from a worker being un-unpickleable: This is what my patch in bug 9244 does... Yep. Again, as things stand, once you've lost a worker, you've lost a task, and you can't

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-15 Thread Ask Solem
Ask Solem a...@opera.com added the comment: Ok. I implemented my suggestions in the patch attached (multiprocessing-tr...@82502-termination2.patch) What do you think? Greg, Maybe we could keep the behavior in termination.patch as an option for map jobs? It is certainly a problem that map jobs

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-15 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment:
> Before I forget, looks like we also need to deal with the result from a worker being un-unpickleable:
> This is what my patch in bug 9244 does...
Really? I could be misremembering, but I believe you deal with the case of the result being

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-15 Thread Ask Solem
Changes by Ask Solem a...@opera.com: Removed file: http://bugs.python.org/file18013/multiprocessing-tr...@82502-termination2.patch

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-15 Thread Ask Solem
Ask Solem a...@opera.com added the comment: Just some small cosmetic changes to the patch. (added multiprocessing-tr...@82502-termination3.patch) -- Added file: http://bugs.python.org/file18015/multiprocessing-tr...@82502-termination3.patch

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-15 Thread Ask Solem
Ask Solem a...@opera.com added the comment:
> Really? I could be misremembering, but I believe you deal with the case of the result being unpickleable. I.e. you deal with the put(result) failing, but not the get() in the result handler.
Your example is demonstrating the pickle error on

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-15 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: Actually, the program you demonstrate is nonequivalent to the one I posted. The one I posted pickles just fine because 'bar' is a global name, but doesn't unpickle because it doesn't exist in the parent's namespace. (See

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-15 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: Started looking at your patch. It seems to behave reasonably, although it still doesn't catch all of the failure cases. In particular, as you note, crashed jobs won't be noticed until the pool shuts down... but if you make a blocking call

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-14 Thread Ask Solem
Ask Solem a...@opera.com added the comment: There's one more thing

    if exitcode is not None:
        cleaned = True
        if exitcode != 0 and not worker._termination_requested:
            abnormal.append((worker.pid, exitcode))

Instead of restarting crashed worker

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-14 Thread Ask Solem
Ask Solem a...@opera.com added the comment: Jesse wrote, We can work around the shutdown issue (really, bug 9207) by ignoring the exception such as shutdown.patch does, or passing in references/adding references to the functions those methods need. Or (as Brett suggested) converting them

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-14 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment: Passing the references seems to be a losing game; for _handle_workers - we only need 1 function (debug) - for others (say _join_exited_workers), we need references to reversed/range/len. A possible alternative is to make those threads

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-14 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: Before I forget, looks like we also need to deal with the result from a worker being un-unpickleable:

    #!/usr/bin/env python
    import multiprocessing

    def foo(x):
        global bar
        def bar(x):
            pass
        return bar

    p = multiprocessing.Pool(1)

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-13 Thread Ask Solem
Ask Solem a...@opera.com added the comment: I think I misunderstood the purpose of the patch. This is about handling errors on get(), not on put() like I was working on. So sorry for that confusion. What kind of errors are you having that makes the get() call fail? If the queue is not

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-13 Thread Ask Solem
Ask Solem a...@opera.com added the comment: For reference I opened up a new issue for the put() case here: http://bugs.python.org/issue9244

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-13 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment:
> What kind of errors are you having that makes the get() call fail?
Try running the script I posted. It will fail with an AttributeError (raised during unpickling) and hang. I'll note that the particular issues that I've run into in practice

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-13 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: While looking at your patch in issue 9244, I realized that my code fails to handle an unpickleable task, as in:

    #!/usr/bin/env python
    import multiprocessing
    foo = lambda x: x
    p = multiprocessing.Pool(1)
    p.apply(foo, [1])

This should be fixed
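
One way to see why the lambda task above is a problem, without starting a pool at all, is to check picklability up front. The sketch below uses only the standard pickle module; is_picklable is a hypothetical helper and not part of multiprocessing.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps(), which Pool needs for tasks."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas and other objects without an importable name end up here.
        return False
    return True
```

A call such as is_picklable(lambda x: x) returns False, while a named module-level or builtin function passes, which matches the task that fails in the script quoted in this message.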

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-12 Thread Ask Solem
Ask Solem a...@opera.com added the comment: termination.patch, in the result handler you've added:

    while cache and thread._state != TERMINATE and not failed

Why are you terminating the second pass after finding a failed process? Unpickleable errors and other errors occurring in the worker

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-12 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: Thanks much for taking a look at this!
> why are you terminating the second pass after finding a failed process?
Unfortunately, if you've lost a worker, you are no longer guaranteed that cache will eventually be empty. In particular, you may

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-12 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment: Greg - I asked Ask to take a look - his celery package is a huge consumer of multiprocessing, and so I tend to run things past him as well. That said - to both of you - the fundamental problem the shutdown patch is trying to scratch is

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-12 Thread Ask Solem
Ask Solem a...@opera.com added the comment: Unfortunately, if you've lost a worker, you are no longer guaranteed that cache will eventually be empty. In particular, you may have lost a task, which could result in an ApplyResult waiting forever for a _set call. More generally, my chief

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-12 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment:
> For processes disappearing (if that can at all happen), we could solve that by storing the jobs a process has accepted (started working on), so if a worker process is lost, we can mark them as failed too.
Sure, this would be reasonable
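
The bookkeeping proposed here, remembering which jobs each worker has accepted so they can be failed when the worker disappears, could look roughly like the sketch below. JobTracker and its method names are hypothetical and are not taken from any of the attached patches.

```python
class JobTracker:
    """Hypothetical record of which jobs each worker pid has accepted."""

    def __init__(self):
        self._jobs_by_pid = {}

    def accepted(self, pid, job_id):
        """Record that worker pid acknowledged (started) job_id."""
        self._jobs_by_pid.setdefault(pid, set()).add(job_id)

    def finished(self, pid, job_id):
        """Record that worker pid returned a result for job_id."""
        jobs = self._jobs_by_pid.get(pid)
        if jobs is not None:
            jobs.discard(job_id)

    def worker_lost(self, pid):
        """Forget the worker and return the jobs that must be marked failed."""
        return sorted(self._jobs_by_pid.pop(pid, set()))
```

A pool supervisor would call worker_lost(pid) when it notices a dead worker and set an error on each returned job, so that no result object waits forever. As the thread notes, a job taken from the queue but not yet acknowledged would still be missed by this scheme.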

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-10 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment: thanks greg; I'm going to take a look and think about this. I'd like to resolve bug 9207 first though

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-10 Thread Jesse Noller
Changes by Jesse Noller jnol...@gmail.com: -- assignee: -> jnoller

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-10 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: Cool, thanks. I'll note that with this patch applied, using the test program from 9207 I consistently get the following exception: Exception in thread Thread-1 (most likely raised during interpreter shutdown): Traceback (most recent call

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-10 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment: Ugh. I'm going to have to think about the cleanest way of handling this case of functions vanishing from us since this is going to be more widespread inside the code. Suggestions welcome.

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-10 Thread Greg Brockman
Greg Brockman g...@ksplice.com added the comment: What about just catching the exception? See e.g. the attached patch. (Disclaimer: not heavily tested). -- Added file: http://bugs.python.org/file17934/shutdown.patch

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-10 Thread Jesse Noller
Jesse Noller jnol...@gmail.com added the comment: A+ for creativity; I wouldn't have thought of that ;)

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-09 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: -- nosy: +jnoller

[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

2010-07-08 Thread Greg Brockman
New submission from Greg Brockman g...@ksplice.com: I have recently begun using multiprocessing for a variety of batch jobs. It's a great library, and it's been quite useful. However, I have been bitten several times by situations where a worker process in a Pool will unexpectedly die,