Hi,

We have implemented an application server using TNonblockingServer in C++ (Thrift version 0.10.0). The application hangs intermittently (we suspect under high load), and once it does, the worker threads no longer pick up incoming requests.

Thread dumps taken during a hang show nearly all worker threads blocked in pthread_mutex_lock, waiting for the ThreadManager monitor lock. This looks like a deadlock, because the application never leaves this state without a restart. In every dump we also see exactly one worker thread blocked in select() inside TNonblockingIOThread::notify(), called from TNonblockingServer::expireClose(), i.e. while handling a task expiry.

Can anybody help us understand whether this is a bug in the task expiry handler?

- Thrift version: 0.10.0
- Library: C++
- Thread model: Non-blocking
- Number of worker threads: 8
- CPU cores: 4

(Thread dump attached)

Regards,
Amar Agrawal
VP of Engineering, RevX <http://revx.io/>

+ xargs -n1 sudo /usr/bin/gdb --batch -ex 'thread apply all bt' -p
[New LWP 26266]
[New LWP 26265]
[New LWP 26264]
[New LWP 26263]
[New LWP 26262]
[New LWP 26261]
[New LWP 26260]
[New LWP 26259]
[New LWP 26258]
[New LWP 26257]
[New LWP 26256]
[New LWP 26255]
[New LWP 26254]
[New LWP 26253]
[New LWP 26252]
[New LWP 26251]
[New LWP 26250]
[New LWP 26249]
[New LWP 26248]
[New LWP 26247]
[New LWP 26170]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0

Thread 22 (Thread 0x7ffb4f140700 (LWP 26170)):
#0 0x00007ffb505ba65d in nanosleep () from /lib64/libc.so.6
#1 0x00007ffb505ba4f4 in sleep () from /lib64/libc.so.6
#2 0x000000000074455f in cluster_tender_fn (gcc_is_ass=<optimized out>) at cl_cluster.c:1528
#3 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7ffb4df23700 (LWP 26247)):
#0 0x00007ffb5184771d in accept () from /lib64/libpthread.so.0
#1 0x000000000045362f in controller_listener_thread (id=<optimized out>) at src/adserver.c:982
#2 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7ffb4cc3d700 (LWP 26248)):
#0 0x00007ffb505ba65d in nanosleep () from /lib64/libc.so.6
#1 0x00007ffb505ba4f4 in sleep () from /lib64/libc.so.6
#2 0x0000000000452c4f in log_rotator_thread (id=<optimized out>) at src/adserver.c:1832
#3 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7ffb4d722700 (LWP 26249)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7ffb4d621700 (LWP 26250)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7ffb4d520700 (LWP 26251)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7ffb4d41f700 (LWP 26252)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7ffb4d31e700 (LWP 26253)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7ffb4d21d700 (LWP 26254)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7ffb4c43c700 (LWP 26255)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7ffb4c33b700 (LWP 26256)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7ffb4c23a700 (LWP 26257)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7ffb4c139700 (LWP 26258)):
#0 0x00007ffb505eab53 in select () from /lib64/libc.so.6
#1 0x00007ffb52472b4c in apache::thrift::server::TNonblockingIOThread::notify(apache::thrift::server::TNonblockingServer::TConnection*) () from /usr/lib64/libthriftnb-0.10.0.so
#2 0x00007ffb52475647 in apache::thrift::server::TNonblockingServer::expireClose(boost::shared_ptr<apache::thrift::concurrency::Runnable>) () from /usr/lib64/libthriftnb-0.10.0.so
#3 0x00007ffb5247697d in std::tr1::_Function_handler<void (boost::shared_ptr<apache::thrift::concurrency::Runnable>), std::tr1::_Bind<std::tr1::_Mem_fn<void (apache::thrift::server::TNonblockingServer::*)(boost::shared_ptr<apache::thrift::concurrency::Runnable>)> (apache::thrift::server::TNonblockingServer*, std::tr1::_Placeholder<1>)> >::_M_invoke(std::tr1::_Any_data const&, boost::shared_ptr<apache::thrift::concurrency::Runnable>) () from /usr/lib64/libthriftnb-0.10.0.so
#4 0x00007ffb52208cde in std::tr1::function<void (boost::shared_ptr<apache::thrift::concurrency::Runnable>)>::operator()(boost::shared_ptr<apache::thrift::concurrency::Runnable>) const () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5220a3ec in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#7 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7ffac7ffd700 (LWP 26259)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7ffac7efc700 (LWP 26260)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb5220a4d6 in apache::thrift::concurrency::ThreadManager::Worker::run() () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb5223f3b1 in apache::thrift::concurrency::PthreadThread::threadMain(void*) () from /usr/lib64/libthrift-0.10.0.so
#6 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7ffac7dfb700 (LWP 26261)):
#0 0x00007ffb518446d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000074a369 in cf_queue_pop (q=0x7ffaac00fdc0, buf=0x7ffac7dfac10, ms_wait=<optimized out>) at cf_queue.c:323
#2 0x000000000074677a in batch_worker_fn (dummy=<optimized out>) at cl_batch.c:554
#3 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7ffac75fa700 (LWP 26262)):
#0 0x00007ffb518446d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000074a369 in cf_queue_pop (q=0x7ffaac00fdc0, buf=0x7ffac75f9c10, ms_wait=<optimized out>) at cf_queue.c:323
#2 0x000000000074677a in batch_worker_fn (dummy=<optimized out>) at cl_batch.c:554
#3 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7ffac6df9700 (LWP 26263)):
#0 0x00007ffb518446d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000074a369 in cf_queue_pop (q=0x7ffaac00fdc0, buf=0x7ffac6df8c10, ms_wait=<optimized out>) at cf_queue.c:323
#2 0x000000000074677a in batch_worker_fn (dummy=<optimized out>) at cl_batch.c:554
#3 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7ffac65f8700 (LWP 26264)):
#0 0x00007ffb518446d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000074a369 in cf_queue_pop (q=0x7ffaac00fdc0, buf=0x7ffac65f7c10, ms_wait=<optimized out>) at cf_queue.c:323
#2 0x000000000074677a in batch_worker_fn (dummy=<optimized out>) at cl_batch.c:554
#3 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7ffac5df7700 (LWP 26265)):
#0 0x00007ffb518446d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000074a369 in cf_queue_pop (q=0x7ffaac00fdc0, buf=0x7ffac5df6c10, ms_wait=<optimized out>) at cf_queue.c:323
#2 0x000000000074677a in batch_worker_fn (dummy=<optimized out>) at cl_batch.c:554
#3 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7ffac55f6700 (LWP 26266)):
#0 0x00007ffb518446d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000074a369 in cf_queue_pop (q=0x7ffaac00fdc0, buf=0x7ffac55f5c10, ms_wait=<optimized out>) at cf_queue.c:323
#2 0x000000000074677a in batch_worker_fn (dummy=<optimized out>) at cl_batch.c:554
#3 0x00007ffb51840dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffb505f36ed in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffb533309c0 (LWP 26169)):
#0 0x00007ffb518471bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffb51842d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ffb51842c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffb5223e11e in apache::thrift::concurrency::Mutex::lock() const () from /usr/lib64/libthrift-0.10.0.so
#4 0x00007ffb52206ed6 in apache::thrift::concurrency::ThreadManager::Impl::add(boost::shared_ptr<apache::thrift::concurrency::Runnable>, long, long) () from /usr/lib64/libthrift-0.10.0.so
#5 0x00007ffb52474879 in apache::thrift::server::TNonblockingServer::TConnection::transition() () from /usr/lib64/libthriftnb-0.10.0.so
#6 0x00007ffb524750b0 in apache::thrift::server::TNonblockingServer::TConnection::workSocket() () from /usr/lib64/libthriftnb-0.10.0.so
#7 0x00007ffb513ec8c4 in event_base_loop () from /usr/lib64/libevent-2.0.so.5
#8 0x00007ffb52472d7c in apache::thrift::server::TNonblockingIOThread::run() () from /usr/lib64/libthriftnb-0.10.0.so
#9 0x00007ffb524735ad in apache::thrift::server::TNonblockingServer::serve() () from /usr/lib64/libthriftnb-0.10.0.so
#10 0x0000000000419bf0 in main () at src/processor_main.cpp:75
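
To make the question concrete, here is a minimal sketch of the pattern we suspect. It assumes (we have not confirmed this in the Thrift source) that the expiry callback runs while the manager lock is held, so a callback that blocks, or that needs another thread which is itself waiting for the same lock, can stall everyone. The sketch shows the usual way around that: collect expired tasks under the lock, release it, then invoke the callback. ExpiringQueue, removeExpired and the integer task ids are hypothetical names for illustration only; this is not Thrift's ThreadManager.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task queue with expiry; NOT Thrift code.
class ExpiringQueue {
public:
  using Clock = std::chrono::steady_clock;
  using ExpireCallback = std::function<void(int)>;

  explicit ExpiringQueue(ExpireCallback cb) : onExpire_(std::move(cb)) {}

  // Enqueue a task id with an absolute deadline.
  void add(int taskId, Clock::time_point deadline) {
    std::lock_guard<std::mutex> guard(mutex_);
    tasks_.push_back({taskId, deadline});
  }

  // Remove tasks whose deadline is at or before `now`; return how many.
  std::size_t removeExpired(Clock::time_point now) {
    std::vector<int> expired;
    {
      // Only the queue manipulation happens under the lock.
      std::lock_guard<std::mutex> guard(mutex_);
      for (auto it = tasks_.begin(); it != tasks_.end();) {
        if (it->deadline <= now) {
          expired.push_back(it->id);
          it = tasks_.erase(it);
        } else {
          ++it;
        }
      }
    }  // Lock released before any callback runs.

    // The callback may block or call add() without deadlocking on mutex_.
    for (int id : expired) {
      onExpire_(id);
    }
    return expired.size();
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return tasks_.size();
  }

private:
  struct Task {
    int id;
    Clock::time_point deadline;
  };

  mutable std::mutex mutex_;
  std::deque<Task> tasks_;
  ExpireCallback onExpire_;
};
```

With this shape, a callback that re-enters the queue (for example by calling add()) completes normally. If the callback were invoked inside the locked block instead, the same re-entrant call would wait on mutex_ forever, which is the kind of stall the dump above appears to show.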
