[issue26793] uuid causing thread issues when forking using os.fork py3.4+
Andre Merzky added the comment: This one might be related: https://bugs.python.org/issue27889 -- nosy: +Andre Merzky ___ Python tracker <http://bugs.python.org/issue26793> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment:

FWIW, a workaround seems to be a nested try/except clause:

    try:
        try:
            do_lots_of_work()
        except RuntimeError as e:
            print 'signal handling worked'
    except RuntimeError:
        print 'signal handling delayed'

I did a stress test over 100k runs, and got no unexpected behavior that way. I am not sure if the underlying race condition is just hidden, or if the inner except clause triggers the signal handling code internally -- either way, while cumbersome to use, it seems to work.

I'd still appreciate feedback on a cleaner solution.

Thanks, Andre.

-- ___ Python tracker <http://bugs.python.org/issue27889> ___
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment: I would appreciate any suggestion on how to avoid this problem -- otherwise it seems that I can't reliably use signals in threaded contexts at all :( Thanks, Andre. -- ___ Python tracker <http://bugs.python.org/issue27889> ___
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment: I think we are on the same page then, thanks. AFAIU, the C-level signal handler results in a flag being set, which is evaluated at some later point in time[1], after a certain number of opcodes have been executed. Could it be that those opcodes blindly continue to walk into the `else` clause despite the sleep interruption? [1] https://docs.python.org/3/library/signal.html#execution-of-python-signal-handlers -- ___ Python tracker <http://bugs.python.org/issue27889> ___
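The deferred execution described in this comment can be illustrated with a minimal sketch. It assumes Python 3 on a POSIX system (the report itself concerns Python 2.7), and the names `handler_threads`, `record_thread`, and `send_from_thread` are hypothetical. It only shows that the Python-level handler runs in the main thread some time after a worker thread has sent the signal; it does not reproduce the race discussed in this issue.

```python
import os
import signal
import threading
import time

handler_threads = []  # names of the threads in which the Python-level handler ran


def record_thread(signum, frame):
    # Runs at the Python level, after the C-level handler has flagged the signal.
    handler_threads.append(threading.current_thread().name)


def send_from_thread(timeout=5.0):
    """Send SIGUSR1 to this process from a worker thread and wait for the handler."""
    handler_threads.clear()
    previous = signal.signal(signal.SIGUSR1, record_thread)
    try:
        worker = threading.Thread(target=os.kill,
                                  args=(os.getpid(), signal.SIGUSR1))
        worker.start()
        worker.join()
        # Poll with short sleeps so the interpreter gets a chance to run the handler.
        deadline = time.monotonic() + timeout
        while not handler_threads and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        if previous is not None:
            signal.signal(signal.SIGUSR1, previous)
    return list(handler_threads)
```

Calling `send_from_thread()` from the main thread should report the main thread's name, even though `os.kill` was issued by the worker.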
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment:

Hi George,

> From these results, it appears there is no guarantee that the signal handler
> will run before the main thread continues execution at the time.sleep(500)
> line. This would explain why we advance to the else clause before the
> exception is raised.

To me it looks like the problem pops up *way* before the `sleep(100)` (or whatever) finishes; in fact, it looks consistently like the sleep is indeed interrupted after one second. I would thus interpret it differently, as the code should not be able to advance to the `else` clause at that time. Is that different for you?

-- ___ Python tracker <http://bugs.python.org/issue27889> ___
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment: Also thanks for the reply! :) Interesting that our results differ: as said, my code stumbles over it rather frequently, and I can't reproduce w/o ctypes activities. But indeed, the latter might just reinforce a race condition which is present either way... Yes, I also observed that a print statement (or other activities) makes the problem appear less frequently -- I would also guess that it influences a potential race somewhere. Thanks for looking into it! Andre. -- ___ Python tracker <http://bugs.python.org/issue27889> ___
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment:

> The repro is tied to the time.sleep call in the try block. If I do
> time.sleep(1)

Yes - if both sleeps, the one in the `try` block and the one in the thread's routine (`sub`), are equal, then you'll have the typical race, and you can well be in the `finally` clause when the exception arises.

The problem though is different: please feel free to set the sleep in the `try` block to `sleep(100)` -- unless the thread creation and startup takes 99 seconds, you should *not* run into that race. The problem however persists:

---
    merzky@thinkie:~ $ grep -C 2 sleep bug.py
    def sub(pid):
        time.sleep(1)
        os.kill(pid, signal.SIGUSR2)
    --
        t = mt.Thread(target=sub, args=[os.getpid()])
        t.start()
        time.sleep(100)
    except Exception as e:
        print 'except: %s' % e

    merzky@thinkie:~ $ while true; do i=$((i+1)); echo -n "$i: "; python bug.py || break; done
    1: except: caught sigusr2
    2: except: caught sigusr2
    3: except: caught sigusr2
    4: Traceback (most recent call last):
      File "bug.py", line 30, in <module>
        print 'unexcepted'
      File "bug.py", line 14, in sigusr2_handler
        raise RuntimeError('caught sigusr2')
    RuntimeError: caught sigusr2
--

In this case, the behaviour does depend on `ctypes` -- or at least I have not seen that problem without `ctypes` being used.

Thanks, Andre.

-- ___ Python tracker <http://bugs.python.org/issue27889> ___
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment:

I also see the problem on 2.7.11, on MacOS, but with significantly lower frequency. I can't tell if that is due to changes in Python, the different architecture, or whatever...

    $ python -V
    Python 2.7.11

    $ uname -a
    Darwin cameo.local 14.4.0 Darwin Kernel Version 14.4.0: Thu May 28 11:35:04 PDT 2015; root:xnu-2782.30.5~1/RELEASE_X86_64 x86_64

    $ while true; do i=$((i+1)); echo -n "$i: "; python t.py || break; done
    1: except: caught sigusr2
    2: except: caught sigusr2
    3: except: caught sigusr2
    ...
    134: except: caught sigusr2
    Traceback (most recent call last):
      File "t.py", line 30, in <module>
        print 'unexcepted'
      File "t.py", line 14, in sigusr2_handler
        raise RuntimeError('caught sigusr2')
    RuntimeError: caught sigusr2

-- ___ Python tracker <http://bugs.python.org/issue27889> ___
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment:

I can confirm the same unexpected behavior for

    Python 2.7.6  / Linux radical 4.2.0-25-generic #30~14.04.1-Ubuntu (2nd attempt)
    Python 2.7.9  / Linux login1.stampede.tacc.utexas.edu 2.6.32-431.17.1.el6.x86_64 (32nd attempt)
    Python 2.7.10 / Linux gordon-ln2.sdsc.edu 2.6.32-431.29.2.el6.x86_64 (5th attempt)

Any suggestion for further testing?

-- ___ Python tracker <https://bugs.python.org/issue27889> ___
[issue27889] ctypes interfers with signal handling
Andre Merzky added the comment:

Thanks for checking! I use:

    $ uname -a
    Linux thinkie 3.11-2-amd64 #1 SMP Debian 3.11.8-1 (2013-11-13) x86_64 GNU/Linux

    $ python -V
    Python 2.7.5+

Note that the problem does not occur on every run -- but in more than 50% of the cases, for me. I am on a somewhat old machine right now (M620), not sure if that matters, and will try to reproduce on some other boxes tomorrow.

-- ___ Python tracker <https://bugs.python.org/issue27889> ___
[issue27889] ctypes interfers with signal handling
New submission from Andre Merzky:

Summary: A thread uses signals to the MainThread to signal an abnormal condition. The main thread is expected to get that signal, invoke the signal handler, and raise an exception. In combination with 'ctypes' calls, that does not happen.

Consider the following code:

    #!/usr/bin/env python
    import multiprocessing as mp
    import threading as mt
    import signal
    import time
    import os

    # from uuid.py line 400
    import ctypes, ctypes.util
    lib = ctypes.CDLL(ctypes.util.find_library('uuid'))

    def sigusr2_handler(signum, frame):
        raise RuntimeError('caught sigusr2')
    signal.signal(signal.SIGUSR2, sigusr2_handler)

    def sub(pid):
        time.sleep(1)
        os.kill(pid, signal.SIGUSR2)

    try:
        # p = mp.Process(target=sub, args=[os.getpid()])
        # p.start()
        t = mt.Thread(target=sub, args=[os.getpid()])
        t.start()
        time.sleep(3)
    except Exception as e:
        print 'except: %s' % e
    else:
        print 'unexcepted'
    finally:
        # p.join()
        t.join()

With Python 2.7 I would expect the output:

    except: caught sigusr2

but I get:

    Traceback (most recent call last):
      File "./bug.py", line 29, in <module>
        print 'unexcepted'
      File "./bug.py", line 13, in sigusr2_handler
        raise RuntimeError('caught sigusr2')
      File "./bug.py", line 29, in <module>
        print 'unexcepted'
      File "./bug.py", line 13, in sigusr2_handler
        raise RuntimeError('caught sigusr2')
    RuntimeError: caught sigusr2

most of the time.

The problem only happens when the 'ctypes.CDLL' line is enabled -- commenting it out results in the expected behavior. That line is copied from uuid.py -- importing uuid.py triggers the same unexpected behavior, which is ultimately why I am stuck.

Note that the problem only occurs when a *thread* sends the signal -- it does *not* happen if the signal is sent by the main thread or by a different process (switch to the multiprocessing code path for confirmation).

Interestingly, the problem also disappears when a 'print' statement is added after the 'time.sleep(3)', which may (or may not) indicate a timing issue.

I would welcome any suggestion on how to further triage this.

--
components: ctypes
messages: 273889
nosy: Andre Merzky
priority: normal
severity: normal
status: open
title: ctypes interfers with signal handling
type: behavior
versions: Python 2.7

___ Python tracker <https://bugs.python.org/issue27889> ___
[issue23395] _thread.interrupt_main() errors if SIGINT handler in SIG_DFL, SIG_IGN
Andre Merzky added the comment: I can confirm that the patch provided by Victor addresses the problem. It seems to be against the current HEAD -- is there a chance that this will be backported to 2.7 (which is what I need to use)? Thanks! Andre. -- ___ Python tracker <http://bugs.python.org/issue23395> ___
[issue23395] _thread.interrupt_main() errors if SIGINT handler in SIG_DFL, SIG_IGN
Andre Merzky added the comment: It seems you were right, that needed a DecRef indeed: the IncRef is already called on construction. The DecRef for the result also needed fixing - an updated patch is attached. -- Added file: http://bugs.python.org/file43546/issue23395.2.patch ___ Python tracker <http://bugs.python.org/issue23395> ___
[issue23395] _thread.interrupt_main() errors if SIGINT handler in SIG_DFL, SIG_IGN
Andre Merzky added the comment: Thanks for looking into this! And also, thanks for the details in the original bug report -- I found this by chance, after unsuccessfully banging my head against this very problem for quite a while! I am not sure whether the DecRef really needs to be called if the arglist is not stored or passed on. But thanks for pointing that out, I'll check if I can find where the corresponding IncRef is called (assuming that happens somewhere). -- ___ Python tracker <http://bugs.python.org/issue23395> ___
[issue23395] _thread.interrupt_main() errors if SIGINT handler in SIG_DFL, SIG_IGN
Andre Merzky added the comment: Attached is a patch which seems to resolve the issue for me -- this now triggers the expected `KeyboardInterrupt` in the main thread. For verification one can run the unaltered code as provided by Thomas. I would very much appreciate feedback, to make sure that the semantics is actually what one would expect. The patch is against the 2.7 branch from https://github.com/python/cpython.git, and I did not test it against any other branch. I also opened a pull request (https://github.com/python/cpython/pull/39). -- keywords: +patch Added file: http://bugs.python.org/file43544/issue23395.patch ___ Python tracker <http://bugs.python.org/issue23395> ___
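For readers who want to see the expected semantics mentioned in this comment, here is a minimal sketch, assuming Python 3 (the patch itself targets 2.7, where the module is called `thread`). This is not the test code provided by Thomas; the function name `interrupted_by_worker` is hypothetical. The sketch installs `signal.default_int_handler` explicitly, so it shows only the well-behaved case, not the SIG_DFL/SIG_IGN case the issue is about.

```python
import _thread
import signal
import threading
import time


def interrupted_by_worker(timeout=5.0):
    """Return True if a worker's interrupt_main() raises KeyboardInterrupt here."""
    # Assumption: make sure Python's default SIGINT handler is installed, so the
    # outcome does not depend on how the interpreter was started.
    previous = signal.signal(signal.SIGINT, signal.default_int_handler)
    try:
        worker = threading.Thread(target=_thread.interrupt_main)
        worker.start()
        # Keep executing bytecode in the main thread until the interrupt arrives.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            time.sleep(0.01)
        return False
    except KeyboardInterrupt:
        return True
    finally:
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
```

The function has to be called from the main thread, since `signal.signal` may only be used there.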
[issue23395] _thread.interrupt_main() errors if SIGINT handler in SIG_DFL, SIG_IGN
Andre Merzky added the comment: Thanks Thomas. -- ___ Python tracker <http://bugs.python.org/issue23395> ___
[issue23395] _thread.interrupt_main() errors if SIGINT handler in SIG_DFL, SIG_IGN
Changes by Andre Merzky: -- versions: +Python 2.7 ___ Python tracker <http://bugs.python.org/issue23395> ___
[issue23395] _thread.interrupt_main() errors if SIGINT handler in SIG_DFL, SIG_IGN
Andre Merzky added the comment: Thanks for the PingBack, Thomas. I am not very familiar with the Python community approach to bug reports, so can someone comment on whether it is worth waiting for this to get fixed, or is that a rather futile hope without providing a patch myself? I don't think I currently have the resources to dig into the details... Thanks, Andre. -- ___ Python tracker <http://bugs.python.org/issue23395> ___
[issue23395] _thread.interrupt_main() errors if SIGINT handler in SIG_DFL, SIG_IGN
Andre Merzky added the comment: Did anything ever come of this? I also frequently stumble over that error -- but alas not in a debug setting, but on actual running code... :/ -- nosy: +Andre Merzky ___ Python tracker <http://bugs.python.org/issue23395> ___
[issue24862] subprocess.Popen behaves incorrect when moved in process tree
Changes by Andre Merzky: Added file: http://bugs.python.org/file40449/subprocess.py.diff ___ Python tracker <http://bugs.python.org/issue24862> ___
[issue24862] subprocess.Popen behaves incorrect when moved in process tree
Changes by Andre Merzky: Removed file: http://bugs.python.org/file40448/subprocess.py.diff ___ Python tracker <http://bugs.python.org/issue24862> ___
[issue24862] subprocess.Popen behaves incorrect when moved in process tree
Andre Merzky added the comment: This patch is meant to be illustrative rather than functional (but it works in the limited set of cases I tested). -- keywords: +patch Added file: http://bugs.python.org/file40448/subprocess.py.diff ___ Python tracker <http://bugs.python.org/issue24862> ___
[issue24862] subprocess.Popen behaves incorrect when moved in process tree
Andre Merzky added the comment:

Yes, I have a workaround (and even a clean solution) in my code. My interest in this ticket is more academic than anything else :)

Thanks for the pointer to issue1731717. While I am not sure which 'comment at the end' you exactly refer to, the whole discussion provides some more insight on why SIGCHLD is handled the way it is, so that was interesting.

I agree that changing the behavior in a way which is unexpected for existing applications is something one wants to avoid, generally. I can't judge whether it is worth breaking existing code to get more correctness in a corner case -- that depends on how much (and what kind of) code relies on it, which I have no idea about.

One option to minimize change and improve correctness might be to keep track of the parent process. So one would keep self.parent=os.getpid() along with self.pid. In the implementation of _internal_poll one can then check if self.parent==os.getpid() still holds, and raise an ECHILD or EINVAL otherwise. That would catch the pickle/unpickle across processes case (I don't know Python well enough to see if there are easier ways to check if a class instance is passed across process boundaries).

The above would still not be fully POSIX (it ignores process groups, which would allow waiting on non-direct descendants), but going down that route would probably almost result in a reimplementation of what libc does...

-- ___ Python tracker <http://bugs.python.org/issue24862> ___
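The parent-tracking idea from this comment could look roughly like the following. This is a minimal sketch, assuming Python 3 and a subclass rather than a change to `subprocess.py` itself; it is not the attached patch. The class name `ParentCheckedPopen` and the attribute `_parent_pid` are hypothetical, and the check is placed in the public `poll()` and `wait()` methods instead of `_internal_poll`.

```python
import errno
import os
import subprocess
import sys


class ParentCheckedPopen(subprocess.Popen):
    """Popen variant that remembers which process created the child."""

    def __init__(self, *args, **kwargs):
        # Record the creating process before the child is spawned.
        self._parent_pid = os.getpid()
        super().__init__(*args, **kwargs)

    def _check_parent(self):
        # Only the creating process can wait on the child; anywhere else,
        # raise ECHILD instead of silently reporting an exit code.
        if os.getpid() != self._parent_pid:
            raise OSError(errno.ECHILD,
                          "process %d is not a child of process %d"
                          % (self.pid, os.getpid()))

    def poll(self):
        self._check_parent()
        return super().poll()

    def wait(self, timeout=None):
        self._check_parent()
        return super().wait(timeout=timeout)
```

In the creating process, `ParentCheckedPopen([sys.executable, "-c", "pass"]).wait()` behaves like the regular `Popen`; an instance that ends up in a different process would raise on `poll()` rather than return 0. As the comment notes, this ignores process groups.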
[issue24862] subprocess.Popen behaves incorrect when moved in process tree
Andre Merzky added the comment: Hi again, can I do anything to help move this forward? Thanks, Andre. -- ___ Python tracker <http://bugs.python.org/issue24862> ___
[issue24862] subprocess.Popen behaves incorrect when moved in process tree
Andre Merzky added the comment:

Looking a little further, it seems indeed to be a problem with ignoring SIGCHLD. The behavior has been introduced with [1] at [2] AFAICS, which is a response to issue15756 [3]. IMHO, that issue should have been resolved by raising an exception instead of assuming that the child exited successfully (neither is true in this case, neither the 'exited' nor the 'successfully').

[1] https://hg.python.org/cpython/rev/484c50bf445c/
[2] https://github.com/python/cpython/blob/2.7/Lib/subprocess.py#L1370
[3] http://bugs.python.org/issue15756

-- ___ Python tracker <http://bugs.python.org/issue24862> ___
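To make the ECHILD condition concrete, here is a minimal sketch, assuming Python 3 on a POSIX system; the helper name `waitpid_errno` is hypothetical. It shows what the system-level call reports when asked to wait on a PID that is not a child of the caller, which is the condition this comment argues should surface as an error rather than as a successful exit.

```python
import errno
import os


def waitpid_errno(pid):
    """Return the errno raised by a non-blocking waitpid(pid), or None on success."""
    try:
        os.waitpid(pid, os.WNOHANG)
    except OSError as exc:
        return exc.errno
    return None
```

For example, `waitpid_errno(os.getpid())` yields `errno.ECHILD`, because a process is never its own child.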
[issue24862] subprocess.Popen behaves incorrect when moved in process tree
Andre Merzky added the comment: As mentioned in the PS, I understand that the approach might be questionable. But (a) the attached test shows the problem also for watcher *processes*, not threads, and (b) an error should be raised in unsupported uses, not a silent, unexpected behavior which mimics success. -- ___ Python tracker <http://bugs.python.org/issue24862> ___
[issue24862] subprocess.Popen behaves incorrect when moved in process tree
New submission from Andre Merzky:

- create a class which is a subclass of multiprocessing.Process ('A')
- in its __init__ create new thread ('B') and share a queue with it
- in A's run() method, run 'C=subprocess.Popen(args="/bin/false")'
- push 'C' through the queue to 'B'
- call 'C.poll()' --> returns 0

Apart from returning 0, the poll will also return immediately, even if the task is long running. The task does not die -- 'ps' shows it is well alive.

I assume that the underlying reason is that 'C' is moved sideways in the process tree, and the wait is happening in a thread which is not the parent of C. I assume (or rather guess, really) that the system level waitpid call raises an 'ECHILD' (see wait(2)), but maybe that is misinterpreted as 'process gone'?

I append a test script which shows different combinations of process spawner and watcher classes. All of them should report an exit code of '1' (as all run /bin/false), or should raise an error. None should report an exit code of 0 -- but some do.

PS.: I implore you not to argue if the above setup makes sense -- it probably does not. However, it took significant work to condense a real problem into that small excerpt, and it is not a full representation of our application stack. I am not interested in discussing alternative approaches: we have those, and I can live with the error not being fixed.

    #!/usr/bin/env python

    from subprocess import Popen
    from threading import Thread as T
    from multiprocessing import Process as P
    import multiprocessing as mp

    class A(P):

        def __init__(self):

            P.__init__(self)

            self.q = mp.Queue()

            def b(q):
                C = q.get()
                exit_code = C.poll()
                print "exit code: %s" % exit_code

            B = T(target = b, args=[self.q])
            B.start ()

        def run(self):
            C = Popen(args = '/bin/false')
            self.q.put(C)

    a = A()
    a.start()
    a.join()

--
components: Library (Lib)
files: test_mp.py
messages: 248553
nosy: Andre Merzky
priority: normal
severity: normal
status: open
title: subprocess.Popen behaves incorrect when moved in process tree
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file40177/test_mp.py

___ Python tracker <http://bugs.python.org/issue24862> ___