Hi,
I am trying to run Qpid on Solaris using Sun Studio and have managed to get
the broker to compile with a few minor fixes. Unfortunately, many unit tests
hang.
The issue is a deadlock when SessionFixture is created. The main thread is
blocked on a mutex in DispatchHandle::rewatchWrite during the call to
newSession:
=>[5] qpid::sys::Mutex::lock(this = <value unavailable>) (optimized), at
0xfffffd7ffdb81a0e (line ~116) in "Mutex.h"
[6] qpid::sys::ScopedLock<qpid::sys::Mutex>::ScopedLock(this = <value
unavailable>, l = CLASS) (optimized), at 0xfffffd7ffdb819df (line ~33) in
"Mutex.h"
[7] qpid::sys::DispatchHandle::rewatchWrite(this = 0xb63558) (optimized),
at 0xfffffd7ffdbf4cc0 (line ~109) in "DispatchHandle.cpp"
[8] qpid::sys::posix::AsynchIO::notifyPendingWrite(this = <value
unavailable>) (optimized), at 0xfffffd7ffdb62824 (line ~389) in
"AsynchIO.cpp"
[9] qpid::client::TCPConnector::handle(this = 0xb60fe0, frame = CLASS)
(optimized), at 0xfffffd7ffdf6dc1d (line ~209) in "TCPConnector.cpp"
[... shortened output]
[22] qpid::client::Connection::newSession(this = <value unavailable>,
name = CLASS, timeout = 0) (optimized), at 0xfffffd7ffdf05b15 (line ~141)
in "Connection.cpp"
[23]
qpid::tests::SessionFixtureT<qpid::tests::LocalConnection,qpid::client::Session_0_10>::SessionFixtureT(this
= 0xfffffd7fffdfe3d0, opts = STRUCT) (optimized), at 0x5d95b5 (line ~141)
in "BrokerFixture.h"
The same lock is also held by one of the two Poller threads, which is waiting
in poll:
=>[4] qpid::sys::PollerPrivate::EventStream::getEvent(this = 0xb60ee8,
targetTimeout = CLASS) (optimized), at 0xfffffd7ffdb875cf (line ~466) in
"PosixPoller.cpp"
[5] qpid::sys::PollerPrivate::EventStream::next(this = 0xb60ee8, timeout
= CLASS) (optimized), at 0xfffffd7ffdb86127 (line ~354) in "PosixPoller.cpp"
[6] qpid::sys::Poller::wait(this = 0xb467f0, timeout = CLASS)
(optimized), at 0xfffffd7ffdb847c6 (line ~729) in "PosixPoller.cpp"
[7] qpid::sys::Poller::run(this = 0xb467f0) (optimized), at
0xfffffd7ffdb84540 (line ~690) in "PosixPoller.cpp"
I do not understand how the same lock can be held simultaneously on both
threads, but the deadlock arises because the call to poll never wakes up.
I have noticed a suspicious comment in the code path of the main thread
which may be linked to this behavior. In TCPConnector::handle, the following
comment precedes the blocking call into AsynchIO:
/*
NOTE: Moving the following line into this mutex block
is a workaround for BZ 570168, in which the test
testConcurrentSenders causes a hang about 1.5%
of the time. ( To see the hang much more frequently
leave this line out of the mutex block, and put a
small usleep just before it.)
TODO mgoulish - fix the underlying cause and then
move this call back outside the mutex.
*/
if (notifyWrite && !closed) aio->notifyPendingWrite();
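
To make the two placements concrete, here is a minimal sketch of what I understand the comment to describe: calling notifyPendingWrite while still holding the connector's mutex (the workaround) versus calling it after the mutex is released (what the TODO asks for). FakeAio, FakeConnector, their members and runBothVariants are hypothetical stand-ins that I made up for illustration, not the real Qpid classes, and the sketch does not reproduce the hang itself.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the AsynchIO object (not the real Qpid class).
struct FakeAio {
    std::mutex handleLock;   // plays the role of the DispatchHandle mutex
    int notifications = 0;

    void notifyPendingWrite() {
        // Blocks here if another thread currently holds handleLock.
        std::lock_guard<std::mutex> l(handleLock);
        ++notifications;
    }
};

// Hypothetical stand-in for the connector (not the real TCPConnector).
struct FakeConnector {
    std::mutex lock;         // plays the role of the connector's own mutex
    bool closed = false;
    std::vector<std::string> frames;
    FakeAio aio;

    // Workaround placement: notify while the connector lock is still held,
    // so the calling thread holds one lock while waiting for the other.
    void handleInsideLock(const std::string& frame) {
        std::lock_guard<std::mutex> l(lock);
        frames.push_back(frame);
        if (!closed) aio.notifyPendingWrite();
    }

    // Placement the TODO asks for: decide under the lock, notify after release.
    void handleOutsideLock(const std::string& frame) {
        bool notify = false;
        {
            std::lock_guard<std::mutex> l(lock);
            frames.push_back(frame);
            notify = !closed;
        }
        if (notify) aio.notifyPendingWrite();
    }
};

// Single-threaded walk through both variants; returns the notification count.
int runBothVariants() {
    FakeConnector c;
    c.handleInsideLock("frame-1");
    c.handleOutsideLock("frame-2");
    assert(c.frames.size() == 2);
    return c.aio.notifications;
}
```

In the first variant the caller keeps the connector lock while it waits for the handle lock, which matches the shape of the main-thread stack above. Whether that nesting is actually what causes the hang on Solaris is only my guess.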
Do you have any hints as to what the underlying issue could be?
Thanks,
Alexandre Trufanow
www.murex.com