[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120575#comment-16120575 ]

ASF subversion and git services commented on QPID-7317:
---

Commit 7c968c8318f4c4a70fbe0ebbcdbe0a09d8cfbb3e in qpid-python's branch refs/heads/master from [~aconway]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-python.git;h=7c968c8 ]

QPID-7884: Python client should not raise on close() after stop.

The Python client throws exceptions out of AMQP object methods (Connection, Session and Link objects) if the selector has been stopped, to prevent hanging (see QPID-7317, Deadlock on publish). However, to be robust to shut-down order, the close() method should not throw an exception in this case; it should be a no-op.

> Deadlock on publish
> ---
>
> Key: QPID-7317
> URL: https://issues.apache.org/jira/browse/QPID-7317
> Project: Qpid
> Issue Type: Bug
> Components: Python Client
> Affects Versions: 0.32
> Environment: python-qpid-0.32-13.fc23.noarch
> Reporter: Brian Bouterse
> Assignee: Alan Conway
> Fix For: qpid-python-1.36.0
>
> Attachments: bad_child.py, bad_child.py, bt.txt, lsof.txt, pystack.17806, spout-hang.py, spout-hang-trace.txt, taabt.txt, worker-stacks
>
> When publishing a task with qpid.messaging it deadlocks and our application cannot continue. This has not been a problem for several releases, but within a few days recently, another Satellite developer and I both experienced the issue on separate machines, different distros. He is using an MRG-built package (not sure of version). I am using python-qpid-0.32-13.fc23.
> Both deadlocked machines had core dumps taken on the deadlocked processes and show only 1 Qpid thread when I expect there to be 2. There are other mongo threads, but those are idle as expected and not related. The traces show our application calling into qpid.messaging to publish a message to the message bus.
> This problem happens intermittently, and in cases where message publish is successful I've verified by core dump that there are the expected 2 threads for Qpid.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org
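The close-after-stop policy described in the commit above can be sketched in Python. This is an illustrative sketch with hypothetical names (Endpoint, SelectorStopped, _check are made up for the example), not the actual qpid-python source: ordinary operations raise once the selector thread has stopped, while close() quietly succeeds so shut-down order does not matter.

```python
import threading

class SelectorStopped(Exception):
    """Raised when qpid.messaging objects are used after the I/O thread stops."""

class Endpoint:
    """Hypothetical stand-in for Connection/Session/Link."""

    def __init__(self):
        self._stopped = False
        self._lock = threading.Lock()

    def _check(self):
        # Fail fast rather than wait forever on a selector thread that is gone.
        if self._stopped:
            raise SelectorStopped("qpid.messaging thread has been stopped")

    def send(self, message):
        with self._lock:
            self._check()  # ordinary operations raise after stop
            # ... hand the message to the selector thread here ...

    def close(self, timeout=None):
        with self._lock:
            if self._stopped:
                return  # robust to shut-down order: close() is a no-op
            self._stopped = True
            # ... release connection resources here ...
```

Under this policy, calling close() twice (or after the selector has died) is harmless, but a publish attempt surfaces an exception instead of hanging.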
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861827#comment-15861827 ]

Alan Conway commented on QPID-7317:
---

I hope this will address the pulp hang in the wild; I have been unable to reproduce it with the fix. Note you can apply this patch by replacing /usr/lib/python2.7/site-packages/qpid/selector.py with the patched file; it is the only file modified and should work with any version of python-qpid released in the last year.

If you do see this hang again, please report to this JIRA with the output of the following commands from the machine where the hung celery workers are:

{code}
> rpm -q python-qpid              # or attach a copy of /usr/lib/python2.7/site-packages/qpid/selector.py
> journalctl                      # use --since and --until to get a few minutes before/after the hang
> yum install -y gdb python-debug # needed for worker-stacks script
> worker-stacks                   # script attached to this JIRA
{code}

Here is log output showing that pulp does indeed use qpid.messaging in an illegal state that could have caused a hang prior to this fix. However, it is not an exact match for the reported stack traces, so I'm not yet 100% sure the problem is solved. I am not able to reproduce the original hang, or traces that look like it, with the fix.
{code}
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) illegal use of qpid.messaging at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.__bootstrap_inner()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.run()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 55, in run
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.monitor_events()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 82, in monitor_events
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     recv.capture(limit=None, timeout=None, wakeup=True)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 715, in __exit__
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.release()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 330, in release
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 298, in _close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._do_close_self()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 288, in _do_close_self
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.maybe_close_channel(self._default_channel)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 280, in maybe_close_channel
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     channel.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 983, in close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self._broker.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/qpidtoollibs/broker.py", line 48, in close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     self.sess.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 213, in log_raise
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     _check(exception, 1)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) qpid.messaging was previously stopped at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   File
{code}
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861790#comment-15861790 ]

ASF subversion and git services commented on QPID-7317:
---

Commit fda9594010b13d99134c10cff54b0ba9d82c0c27 in qpid-python's branch refs/heads/master from [~aconway]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-python.git;h=fda9594 ]

QPID-7317: More robust qpid.selector with better logging

This commit disables the selector and related qpid.messaging objects when the selector thread exits for any reason: process exit, fork, exception, etc. Any subsequent use will throw an exception and log the locations of the failed call and where the selector thread was stopped. This should be slightly more predictable and robust than commit 037c573, which tried to keep the selector alive in a daemon thread.

I have not been able to hang the pulp_smash test suite with this patch. The new logging shows that celery workers do sometimes use qpid.messaging in an illegal state, which could cause the reported hang. So far I have not seen a stack trace that is an exact match for the reported stacks. If this patch does not address the pulp problem, it should at least provide much better debugging information in journalctl log output after the hang.
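The fail-fast-with-diagnostics behavior described in commit fda9594 can be sketched as follows. This is a minimal illustration, not the real qpid/selector.py; the class and method names here are assumptions. The idea is to capture a stack trace when the selector is disabled, then log both that location and the current call site on any later use, instead of hanging.

```python
import logging
import traceback

log = logging.getLogger("qpid.messaging")

class Selector:
    """Sketch: a selector that remembers where it was stopped and
    refuses (loudly) to be used afterwards."""

    def __init__(self):
        self.exception = None  # set once the selector is disabled

    def stop(self):
        # Record where the selector thread was stopped so a later
        # illegal call can log both locations.
        self.exception = RuntimeError(
            "qpid.messaging thread was stopped at:\n"
            + "".join(traceback.format_stack()))

    def check(self):
        # Called at every API entry point: raise and log instead of hanging.
        if self.exception is not None:
            log.error("illegal use of qpid.messaging at:\n%s",
                      "".join(traceback.format_stack()))
            raise self.exception
```

This matches the shape of the journal output in the earlier comment: one traceback for the illegal use, one for where the selector was previously stopped.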
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826589#comment-15826589 ]

Alan Conway commented on QPID-7317:
---

I have seen a similar hang. I'm starting to think that the missing thread is a consequence of the hang, not a cause, which would explain why our attempts to prevent/log that thread's death don't solve the problem or provide any new log data.
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782996#comment-15782996 ]

Brian Bouterse commented on QPID-7317:
---

We haven't been able to reproduce this issue reliably (such as with a unit or integration test). Today I read a report from a user which suggests their environment was deadlocked. Also, the Jenkins environment Pulp uses (which uses the Qpid Python client) to run integration tests deadlocks roughly 5% of the time. With those reports in mind, I believe this issue is not resolved.
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15775154#comment-15775154 ]

Robbie Gemmell commented on QPID-7317:
---

The commit details show [~k-wall] was simply removing an exclude entry used when running the python test suite against the Java broker, to verify whether QPID-6122 was resolved by the changes made previously on this JIRA; the test was previously excluded via QPID-6122, which is the main JIRA referenced. The change had no relation to the Java client, and does not suggest this issue is fully resolved, just that the other one may have been.
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15775097#comment-15775097 ]

Brian Bouterse commented on QPID-7317:
---

This issue is about the Python client, not the Java client. Perhaps commit 1775853 resolves a similar issue in the Java client, but since that is a separate codebase it couldn't resolve this issue.
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773258#comment-15773258 ]

ASF subversion and git services commented on QPID-7317:
---

Commit 1775853 from [~k-wall] in branch 'java/trunk'
[ https://svn.apache.org/r1775853 ]

QPID-6122: [Python Test Suite] Remove test exclusion qpid.tests.messaging.endpoints.TimeoutTests from Java Broker 0-10 Python excludes file

From this description of QPID-7317, it seems possible this is fixed.
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517632#comment-15517632 ]

Alan Conway commented on QPID-7317:
---

Not marking resolved yet. The fix above has not been proven to resolve the issues in the field, but it definitely fixes several ways that an identical-looking hang can be created in the lab. Will update this when I get confirmation or denial that this resolves the real problem.
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517624#comment-15517624 ]

ASF subversion and git services commented on QPID-7317:
---

Commit 037c5738734d8fecb7b7f7e7af4e4f14f9cd3a64 in qpid-python's branch refs/heads/master from [~aconway]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-python.git;h=037c573 ]

QPID-7317: Fix hangs in qpid.messaging.

Hang is observed in processes using qpid.messaging with a thread blocked waiting for the Selector to wake it, but no Selector.run thread. This patch removes all the known ways that this hang can occur. Either we function normally, or we immediately raise an exception and log to the "qpid.messaging" logger a message starting with "qpid.messaging:".

The following issues are fixed:

1. The Selector.run() thread raises a fatal exception. Use of qpid.messaging will re-raise the exception immediately, not hang.

2. The process forks, so the child has no Selector thread. https://issues.apache.org/jira/browse/QPID-5637 resets the Selector after a fork. In addition we now:
   - Close Selector.waiter: its file descriptors are shared with the parent, which can cause havoc if they "steal" each other's wakeups.
   - Replace Endpoint._lock in related endpoints with a BrokenLock. If the parent is holding locks when it forks, they remain locked forever in the child. BrokenLock.acquire() raises instead of hanging.

3. Selector.stop() called on atexit. Selector.stop was registered via atexit, which could cause a hang if qpid.messaging was used in a later-executing atexit function. That has been removed; Selector.run() is in a daemon thread, so there is no need for stop().

4. User calls Selector.stop() directly. There is no reason to do this for the default Selector used by qpid.messaging, so for that case stop() is now ignored. It works as before for code that creates its own qpid.Selector instances.
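The BrokenLock idea from item 2 of the commit message can be sketched as a small Python class. This is an illustrative sketch of the technique, not the actual qpid-python code (the on_fork helper in particular is hypothetical): a lock the parent held at fork time stays locked forever in the child, so the child's copy is swapped for a lock that raises instead of blocking.

```python
class BrokenLock:
    """Drop-in replacement for a lock that is permanently stuck after fork:
    any attempt to use it fails fast instead of deadlocking."""

    def __init__(self, message="qpid.messaging lock unusable after fork"):
        self.message = message

    def acquire(self, blocking=True):
        raise RuntimeError(self.message)

    def release(self):
        raise RuntimeError(self.message)

    # Support the "with lock:" idiom used by endpoint code.
    __enter__ = acquire

    def __exit__(self, *exc_info):
        self.release()

def disarm_after_fork(endpoint):
    # Hypothetical helper: in the child after os.fork(), replace the
    # endpoint's lock so later use raises rather than hangs.
    endpoint._lock = BrokenLock()
```

With this in place, a child process that accidentally touches an inherited qpid.messaging object gets an immediate, loggable exception, matching the "raise instead of hanging" policy of the patch.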
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426432#comment-15426432 ]

Brian Bouterse commented on QPID-7317:
---

These issues are observed in the Pulp [0] codebase. The pulp codebase does very little directly with qpid.messaging, so I expect you'll not find an issue there. Pulp does rely heavily on celery [1], which is what does the forking, using a dependency it maintains called billiard [2]. Billiard itself is a fork of the Python multiprocessing library. Celery uses Kombu [3], which has a plugin for Qpid [4] which I maintain.

Note the master branches of [1], [2], and [3] are not the versions we use and may be significantly different; browsing on the right branch is important. A typical recent install gets these versions:

python-kombu-3.0.33-5.fc24.noarch
python2-celery-3.1.20-2.fc24.noarch
python-billiard-3.3.0.22-2.fc24.x86_64

[0]: https://github.com/pulp/pulp
[1]: https://github.com/celery/celery/tree/3.1
[2]: https://github.com/celery/billiard
[3]: https://github.com/celery/kombu/tree/3.0/kombu
[4]: https://github.com/celery/kombu/blob/3.0/kombu/transport/qpid.py
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375584#comment-15375584 ]

Dennis Kliban commented on QPID-7317:
---

I experienced this issue two times in the last 24 hours. I am attaching a core dump from my process.
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355169#comment-15355169 ]

Brian Bouterse commented on QPID-7317:
---

One other point of information: I believe my machine resumed from sleep just before I attempted to do this publish. The order of events would have been these:

1. create my vagrant VM and start the processes which use qpid.messaging
2. put my host (and thus the VM) to sleep
3. resume my host (and the VM)
4. ssh back to the VM
5. run the publish (no process restart)
6. observe the deadlock

Note that there are two other processes which did publish messages correctly, but the third process publish deadlocked. Each message is published by a different process. My point is that not all processes were affected, just that 1 process. I don't know why.
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346797#comment-15346797 ]

Michael Hrivnak commented on QPID-7317:
---------------------------------------

The process in question did not produce any log statements from or related to qpid. This is what I see from strace:

# strace -p 21739
Process 21739 attached
restart_syscall(<... resuming interrupted call ...>) = 0
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)
poll([{fd=19, events=POLLIN}], 1, 3000) = 0 (Timeout)

Note that for a healthy child worker process, I do not see that polling happen. I think thread 3, the one the attached backtrace is from, is stuck in a loop polling every 3 seconds for some condition on FD 19 that is never going to occur. FD 19 appears to be a FIFO pipe. I will attach lsof output separately.

# ls -l /proc/21739/fd/19
lr-x--. 1 apache apache 64 Jun 21 13:45 /proc/21739/fd/19 -> pipe:[152836]
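A thread repeatedly polling a FIFO pipe with a 3-second timeout is consistent with the waker-pipe pattern that selector-style I/O threads commonly use: a background thread polls the pipe's read end, and other threads write a byte to it to wake the loop. The sketch below (names and structure are illustrative, not qpid.messaging's actual code) shows the pattern and why a publisher can hang when its peer thread is gone: the caller blocks waiting for an acknowledgement that only the poll loop can deliver.

```python
# Illustrative waker-pipe selector: a background thread polls the pipe's
# read end (analogous to FD 19 in the strace above) with a 3000 ms timeout.
import os
import select
import threading

class Selector:
    def __init__(self):
        # Pipe used to wake the poll loop.
        self._rfd, self._wfd = os.pipe()
        self._stopped = False
        self._woken = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        poller = select.poll()
        poller.register(self._rfd, select.POLLIN)
        while not self._stopped:
            # Matches the poll(..., 3000) = 0 (Timeout) loop in the strace.
            for fd, _ in poller.poll(3000):
                os.read(fd, 1)      # consume the wakeup byte
                self._woken.set()

    def wakeup(self, timeout=1.0):
        """Write a wakeup byte and wait for the poll loop to consume it.

        If the selector thread has died, this wait never succeeds and the
        caller appears deadlocked, as seen on publish.
        """
        self._woken.clear()
        os.write(self._wfd, b"!")
        return self._woken.wait(timeout)

    def stop(self):
        self._stopped = True
        os.write(self._wfd, b"!")   # unblock the final poll
        self._thread.join()

s = Selector()
alive = s.wakeup()   # True while the poll loop is running
s.stop()
```

If the `_run` thread exits (for instance, from an unhandled exception), `wakeup()` times out forever from the caller's point of view, which matches the single-Qpid-thread core dumps reported above.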
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346775#comment-15346775 ]

Michael Hrivnak commented on QPID-7317:
---------------------------------------

List of installed packages that go with that backtrace:

# rpm -qa | grep qpid
qpid-cpp-client-devel-0.30-11.el7sat.x86_64
qpid-java-client-0.30-3.el7.noarch
python-qpid-qmf-0.30-5.el7.x86_64
sat6-atom.refarch.bos.redhat.com-qpid-router-server-1.0-1.noarch
sat6-atom.refarch.bos.redhat.com-qpid-broker-1.0-1.noarch
qpid-dispatch-router-0.4-11.el7.x86_64
qpid-cpp-server-linearstore-0.30-11.el7sat.x86_64
qpid-cpp-server-0.30-11.el7sat.x86_64
libqpid-dispatch-0.4-11.el7.x86_64
python-gofer-qpid-2.7.6-1.el7sat.noarch
qpid-tools-0.30-4.el7.noarch
sat6-atom.refarch.bos.redhat.com-qpid-client-cert-1.0-1.noarch
sat6-atom.refarch.bos.redhat.com-qpid-router-client-1.0-1.noarch
qpid-proton-c-0.9-16.el7.x86_64
qpid-java-common-0.30-3.el7.noarch
qpid-qmf-0.30-5.el7.x86_64
qpid-cpp-client-0.30-11.el7sat.x86_64
python-qpid-0.30-9.el7sat.noarch
tfm-rubygem-qpid_messaging-0.30.0-7.el7sat.x86_64
[jira] [Commented] (QPID-7317) Deadlock on publish
[ https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346542#comment-15346542 ]

Brian Bouterse commented on QPID-7317:
--------------------------------------

There is no logging that the background thread is dying. On the EL6 machine the following turned up nothing:

grep -r Fatal /var/log/

Also on F23 (journalctl) the following turned up nothing:

journalctl --no-pager -l | grep Fatal

We expect 'Fatal' to be in the logged message because both implementations were verified to contain the logging with this commit [0]. The line we expect to log is here [1]. Per the commit message in [0], I think the thread that is dying is called the Selector thread. I also grepped with `grep -r "thread has" journalctl_output.txt`, thinking it could be logged by this line [2].

[0]: https://github.com/apache/qpid/commit/11368ef1a01233f253eb9eadbadaa9cb9b8465f3
[1]: https://github.com/apache/qpid/blob/trunk/qpid/python/qpid/messaging/driver.py#L420
[2]: https://github.com/apache/qpid/commit/11368ef1a01233f253eb9eadbadaa9cb9b8465f3#diff-a2870153748f29e8583ccdbe0c527e8dR157
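The pattern those greps are hunting for is a wrapper around the background thread's run loop that logs fatally before the thread exits. A minimal sketch of that pattern (the function names and log text here are illustrative, not the exact qpid.messaging code the commits above add):

```python
# Sketch: log a fatal message if the selector-style background thread
# dies from an unhandled exception, so the failure is greppable in logs.
import logging
import threading

log = logging.getLogger("selector")
log.setLevel(logging.DEBUG)
captured = []

class ListHandler(logging.Handler):
    # Capture log messages in-process so the demo is self-contained.
    def emit(self, record):
        captured.append(record.getMessage())

log.addHandler(ListHandler())

def run_wrapper(body):
    """Run the thread body; if it dies, log before exiting."""
    try:
        body()
    except Exception:
        # The kind of line `grep Fatal` / `grep "thread has"` would find.
        log.exception("Fatal: selector thread has died unexpectedly")

def broken_loop():
    raise RuntimeError("I/O loop blew up")

t = threading.Thread(target=run_wrapper, args=(broken_loop,))
t.start()
t.join()
# captured now holds the fatal message
```

The absence of any such line in the logs here suggests either that the thread did not die via this code path, or that it is blocked rather than dead, which fits the poll-loop observation in the earlier comment.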