Re: "Session detached by peer"; extra connections on broker, lock-up on send

Gordon Sim Wed, 18 Dec 2019 01:40:42 -0800

On 18/12/2019 8:45 am, Toralf Lund wrote:

On one of our installations of one of our AMQP 0-10 applications usingthe QPid messaging API (in C++), we keep getting exceptions with themessage
session-busy: Session detached by peer
- usually while waiting for incoming messages, but in some cases alsowhen sending them. After resolving some issues with "dangling" sessionpointers (with help from the list), I thought this wasn't really aproblem, but I just found that the broker has several extra connectionsfor the application; qpid-stat reports the following:
$ qpid-stat -c
Connections
connection cproc cpidmech auth connected idle msgIn msgOut=========================================================================================================================================== qpid.192.168.0.215:5672-192.168.0.61:58532 our-program 134233ANONYMOUS anonymous 4d 19h 42m 25s 4d 18h 37m 44s 47.3k 531k qpid.192.168.0.215:5672-192.168.0.61:58564 our-program 134233ANONYMOUS anonymous 4d 5h 27m 9s 4d 4h 32m 4s 42.3k 488k qpid.192.168.0.215:5672-192.168.0.61:58566 our-program 134233ANONYMOUS anonymous 4d 4h 32m 12s 4d 3h 33m 34s 44.6k 513k qpid.192.168.0.215:5672-192.168.0.61:58580 our-program 134233ANONYMOUS anonymous 1d 17h 37m 49s 1d 16h 41m 54s 40.6k 515k qpid.192.168.0.215:5672-192.168.0.61:58582 our-program 134233ANONYMOUS anonymous 1d 16h 42m 1s 1d 15h 46m 44s 41.3k 497k qpid.192.168.0.215:5672-192.168.0.61:58584 our-program 134233ANONYMOUS anonymous 1d 15h 46m 46s 1d 14h 43m 24s 46.8k 535k qpid.192.168.0.215:5672-192.168.0.61:58618 our-program 134233ANONYMOUS anonymous 22h 55m 50s 21h 48m 4s 44.5k 496k qpid.192.168.0.215:5672-192.168.0.61:58846 our-program 134233ANONYMOUS anonymous 18h 21m 22s 0s 0 0
(Names and addresses have been edited.)
The implied times seem to match instances of the session-busy exception,but a new connection doesn't appear every time. The exception hasoccurred 27 times during the last 4 days, and as you can see, there area not that many connections.

It is possible some connections are actually closed. The session-busycan be a race, e.g. if the client times out a connection before theserver (but the original connection and session is then eventuallyclosed, just too late for the reattach attempt to succeed with the samesession name).

Your heartbeat interval is set quite low. I don't know if that wasdeliberately so, but a slightly less sensitive setting might avoidtiming out in some cases.

That said you still clearly also still have multiple connections open,which I'm assuming is not expected? I.e. you would expect only oneconnection from that process?

I don't see any signs of connection errors or explicit reopens butautomatic reconnect with a limit has been enabled; the following connectoptions are used
   connection.setOption("reconnect", true);
   connection.setOption("reconnect_limit", 0);
(I believe this means trying reconnect once, as "reconnect_limit"specifies retries after an initial attempt.)

I believe a reconnect_limit of 0 will reconnect without any limit. Ifyou want to only reconnect once, you would specify a value of 1. Howeverthat would I believe be reset every time it does successfully reconnect.

An additional, more serious issue is that the system has also locked upa couple of times following an exception during Sender::send(). Slightlysimplified stack trace from one of the cases:


Thread 5 (Thread 0x7fb93bc7c700 (LWP 134270)):
#0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1  0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
    from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
    from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb93f9b8e4a in ?? ()
    from /usr/lib64/libqpidcommon.so.2
#4  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 4 (Thread 0x7fb93b47b700 (LWP 134271)):
#0  0x00007fb941667cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    from /usr/lib64/libpthread.so.0
#1  0x00007fb93fa35b35 in qpid::sys::Timer::run() ()
    from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9b8e4a in ?? ()
    from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#4  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 3 (Thread 0x7fb93987f700 (LWP 134344)):
[ ... ]


Thread 3's trace didn't make it through. Do you still have that?

Thread 2 (Thread 0x7fb93907e700 (LWP 395509)):
#0  0x00007fb9400bd113 in epoll_wait () from /usr/lib64/libc.so.6
#1  0x00007fb93f9c7677 in qpid::sys::Poller::wait(qpid::sys::Duration) ()
    from /usr/lib64/libqpidcommon.so.2
#2  0x00007fb93f9c9d5f in qpid::sys::Poller::run() ()
    from /usr/lib64/libqpidcommon.so.2
#3  0x00007fb93f9b8e4a in ?? ()
    from /usr/lib64/libqpidcommon.so.2
#4  0x00007fb941663dd5 in start_thread () from /usr/lib64/libpthread.so.0
#5  0x00007fb9400bcb3d in clone () from /usr/lib64/libc.so.6

Thread 1 (Thread 0x7fb942671300 (LWP 134233)):
#0  0x00007fb94039850f in ?? () from /usr/lib64/libgcc_s.so.1
#1  0x00007fb940399f5f in ?? () from /usr/lib64/libgcc_s.so.1
#2  0x00007fb94039a8ca in ?? () from /usr/lib64/libgcc_s.so.1
#3  0x00007fb94039add7 in _Unwind_Resume () from /usr/lib64/libgcc_s.so.1

#4 0x00007fb93fd8b927 inqpid::client::SessionImpl::sendCommand(qpid::framing::AMQBody const&,qpid::framing::MethodContent const*) ()

    from /usr/lib64/libqpidclient.so.2

#5 0x00007fb93fd8b97b inqpid::client::SessionImpl::send(qpid::framing::AMQBody const&) () from/usr/lib64/libqpidclient.so.2

#6  0x00007fb93fd876a6 in qpid::client::SessionBase_0_10::sync() ()
    from /usr/lib64/libqpidclient.so.2
#7  0x00007fb9422403f8 in ?? ()
    from /usr/lib64/libqpidmessaging.so.2
#8  0x00007fb942241e6a in ?? ()
    from /usr/lib64/libqpidmessaging.so.2
#9  0x00000000004f2dc9 in ?? ()

It looks like the send is stuck trying to unwind after an exception(possibly on the checkOpen()), due to not being able to unlock thesendLock semaphore. Is something in thread 3 holding any locks?

The application became responsive again after running a debug session toget the above info. There is a chance that it did another Sender::send()on the same sender (without trying to reset the session or whatever)following the exception.
Any idea what's going on here? Is there anything I can do to avoid thesession-busy errors? Can I prevent extra connections from appearing ifit should occur after all? Also, is there a way to prevent the apparentlock-ups after exception on send()? I think I've assumed that it's"safe" to try other operations as they will just fail directly, unlessthe session is actually able to recover, but perhaps that's a bad idea?
Note that besides what's been mentioned already, the followingconnection options are used:
connection.setOption("tcp_nodelay", true);
connection.setOption("heartbeat", 2);

Thanks,

- Toralf



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: "Session detached by peer"; extra connections on broker, lock-up on send

Reply via email to