On 18/12/2019 12:35, Gordon Sim wrote:
On 18/12/2019 10:42 am, Toralf Lund wrote:
On 18/12/2019 10:40, Gordon Sim wrote:
On 18/12/2019 8:45 am, Toralf Lund wrote:
An additional, more serious issue is that the system has also locked up a couple of times following an exception during Sender::send(). Slightly simplified stack trace from one of the cases:

Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
#0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1  0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
    from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
    from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb93f9b8e4a in ?? ()
    from /usr/lib64/libqpidcommon.so.2
#4  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
#0  0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    from /usr/lib64/libpthread.so.0
#1  0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
    from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9b8e4a in ?? ()
    from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#4  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
[ ... ]

Thread 3's trace didn't make it through. Do you still have that?
Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
#0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1  0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
    from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
    from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb93f9b8e4a in ?? ()
    from /usr/lib64/libqpidcommon.so.2
#4  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
#0  0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
#1  0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
#2  0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
#3  0x00007fb94039add7 in _Unwind_Resume () from /usr/lib64/libgcc_s.so.1 #4  0x00007fb93fd8b927 in qpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&, qpid::framing::MethodContent const*) ()
    from /usr/lib64/libqpidclient.so.2
#5  0x00007fb93fd8b97b in qpid::client::SessionImpl::send(qpid::framing::AMQBody const&) () from /usr/lib64/libqpidclient.so.2
#6  0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
    from /usr/lib64/libqpidclient.so.2
#7  0x00007fb9422403f8 in ?? ()
    from /usr/lib64/libqpidmessaging.so.2
#8  0x00007fb942241e6a in ?? ()
    from /usr/lib64/libqpidmessaging.so.2
#9  0x00000000004f2dc9 in ?? ()

It looks like the send is stuck trying to unwind after an exception (possibly on the checkOpen()), due to not being able to unlock the sendLock semaphore. Is something in thread 3 holding any locks?

It may be holding locks, but not anything related to QPid. Specifically, it's probably waiting for a condition via pthread_cond_wait(), which implies that it's also holding a mutex associated with the condition variable.

Ok, thread 1 may of course not be stuck at all, and it was just captured at that point. Getting pstack output separated by a few seconds would shed more light.

Does the frequency of the 'locking up' match the frequency of the session-busy exceptions?

I get the exact behaviour I'm seeing now only in a version that was deployed about a week ago, and since then, between 60 and 70 session-busy exceptions have been registered. Most of these came from Session::nextReceiver() or Receiver::fetch(), but exactly 2 were raised by Sender::send(). In both these cases, the application locked up more or less directly - the session error was at least the last item logged when the process was in the locked state. It has not got stuck in any other situations.

Due to the way exception handling is implemented, there is a chance that there is another attempt at sending data after an exception from Sender::send(), without resetting the session or anything. On the other hand, a receive/nextReceiver() exception is in practice always followed directly by

  if(session.isValid() && session.hasError()) {
    Util::debug(1, "Reset AMQP session with error...");
    session.close();
    session=qpid::messaging::Session();
  }

The same code is also executed eventually in the send case, but there may be other intervening Sender::send() calls, and also operations like Session::getSender() or even (at least in theory) Session::createSender().

- Toralf




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org

Reply via email to