On 18/12/2019 12:35, Gordon Sim wrote:
On 18/12/2019 10:42 am, Toralf Lund wrote:
On 18/12/2019 10:40, Gordon Sim wrote:
On 18/12/2019 8:45 am, Toralf Lund wrote:
An additional, more serious issue is that the system has also
locked up a couple of times following an exception during
Sender::send(). Slightly simplified stack trace from one of the cases:
Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
#0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1 0x00007fb93f9c7677 in
qpid::sys::Poller::wait(qpid::sys::Duration) ()
from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb93f9b8e4a in ?? ()
from /usr/lib64/libqpidcommon.so.2
#4 0x00007fb941663dd5 in start_thread () from
/usr/lib64/libpthread.so.0
#5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
#0 0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /usr/lib64/libpthread.so.0
#1 0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9b8e4a in ?? ()
from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb941663dd5 in start_thread () from
/usr/lib64/libpthread.so.0
#4 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
[ ... ]
Thread 3's trace didn't make it through. Do you still have that?
Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
#0 0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1 0x00007fb93f9c7677 in
qpid::sys::Poller::wait(qpid::sys::Duration) ()
from /usr/lib64/libqpidcommon.so.2
#2 0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
from /usr/lib64/libqpidcommon.so.2
#3 0x00007fb93f9b8e4a in ?? ()
from /usr/lib64/libqpidcommon.so.2
#4 0x00007fb941663dd5 in start_thread () from
/usr/lib64/libpthread.so.0
#5 0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
#0 0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
#1 0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
#2 0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
#3 0x00007fb94039add7 in _Unwind_Resume () from
/usr/lib64/libgcc_s.so.1
#4 0x00007fb93fd8b927 in
qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody
const&, qpid::framing::MethodContent const*) ()
from /usr/lib64/libqpidclient.so.2
#5 0x00007fb93fd8b97b in
qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) ()
from /usr/lib64/libqpidclient.so.2
#6 0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
from /usr/lib64/libqpidclient.so.2
#7 0x00007fb9422403f8 in ?? ()
from /usr/lib64/libqpidmessaging.so.2
#8 0x00007fb942241e6a in ?? ()
from /usr/lib64/libqpidmessaging.so.2
#9 0x00000000004f2dc9 in ?? ()
It looks like the send is stuck trying to unwind after an exception
(possibly on the checkOpen()), due to not being able to unlock the
sendLock semaphore. Is something in thread 3 holding any locks?
It may be holding locks, but not anything related to QPid.
Specifically, it's probably waiting for a condition via
pthread_cond_wait(), which implies that it's also holding a mutex
associated with the condition variable.
Ok, thread 1 may of course not be stuck at all, and it was just
captured at that point. Getting pstack output separated by a few
seconds would shed more light.
Does the frequency of the 'locking up' match the frequency of the
session-busy exceptions?
I get the exact behaviour I'm seeing now only in a version that was
deployed about a week ago, and since then, between 60 and 70
session-busy exceptions have been registered. Most of these came from
Session::nextReceiver() or Receiver::fetch(), but exactly 2 were raised
by Sender::send(). In both these cases, the application locked up more
or less directly - the session error was at least the last item logged
when the process was in the locked state. It has not got stuck in any
other situations.
Due to the way exception handling is implemented, there is a chance that
there is another attempt at sending data after an exception from
Sender::send(), without resetting the session or anything. On the other
hand, a receive/nextReceiver() exception is in practice always followed
directly by
if(session.isValid() && session.hasError()) {
Util::debug(1, "Reset AMQP session with error...");
session.close();
session=qpid::messaging::Session();
}
The same code is also executed eventually in the send case, but there
may be other intervening Sender::send() calls, and also operations like
Session::getSender() or even (at least in theory) Session::createSender().
- Toralf
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org