GitHub user zwoop opened an issue:
https://github.com/apache/trafficserver/issues/1335
Deadlock in HostDB
We have some 7.0.0 boxes that end up completely wedged, with all ET_NET
threads stuck on the same lock (so, a deadlock):
```
#6 HostDBProcessor::getbyname_imm (this=,
cont=cont@entry=0x2ab037b1d420, process_hostdb_info=,
hostname=, len=, opt=...) at HostDB.cc:816
#6 HostDBProcessor::getbyname_imm (this=,
cont=cont@entry=0x2aabc1e66a00, process_hostdb_info=,
hostname=, len=, opt=...) at HostDB.cc:816
...
```
The trace is always the same in every thread:
```
#0 __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x2d73e5d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x2d73e4a7 in __pthread_mutex_lock (mutex=0x2aaab098a290) at
pthread_mutex_lock.c:61
#3 0x2adca986 in ink_mutex_acquire (m=0x2aaab098a290) at
../../lib/ts/ink_mutex.h:90
#4 Mutex_lock (t=0x2aaab160db40, m=0x2aaab098a280) at
../../iocore/eventsystem/I_Lock.h:410
#5 MutexLock::MutexLock (t=0x2aaab160db40, am=0x2aaab098a280,
this=0x2aaab470a890) at ../../iocore/eventsystem/I_Lock.h:497
#6 HostDBProcessor::getbyname_imm (this=,
cont=cont@entry=0x2aab91432580, process_hostdb_info=,
hostname=, len=, opt=...) at HostDB.cc:816
#7 0x2acae21c in HttpSM::do_hostdb_lookup
(this=this@entry=0x2aab91432580) at HttpSM.cc:4133
#8 0x2acc0093 in HttpSM::set_next_state (this=0x2aab91432580) at
HttpSM.cc:7248
#9 0x2acad47a in HttpSM::call_transact_and_set_next_state
(this=this@entry=0x2aab91432580, f=f@entry=0x0) at HttpSM.cc:7111
#10 0x2acb7baf in HttpSM::handle_api_return (this=0x2aab91432580)
at HttpSM.cc:1604
#11 0x2acba5eb in HttpSM::state_api_callout (this=0x2aab91432580,
event=0, data=0x0) at HttpSM.cc:1542
#12 0x2acbf62b in HttpSM::set_next_state (this=0x2aab91432580) at
HttpSM.cc:7144
#13 0x2acad47a in HttpSM::call_transact_and_set_next_state
(this=this@entry=0x2aab91432580, f=f@entry=0x0) at HttpSM.cc:7111
#14 0x2acb9910 in HttpSM::state_hostdb_lookup (this=0x2aab91432580,
event=500, data=0x2aebe3144800) at HttpSM.cc:2217
#15 0x2acc165d in HttpSM::main_handler (this=0x2aab91432580,
event=500, data=0x2aebe3144800) at HttpSM.cc:2661
#16 0x2adc7f37 in Continuation::handleEvent (data=0x2aebe3144800,
event=500, this=0x2aab91432580) at ../../iocore/eventsystem/I_Continuation.h:153
#17 reply_to_cont (cont=0x2aab91432580, r=0x2aebe3144800, is_srv=) at HostDB.cc:474
#18 0x2adcc79d in HostDBContinuation::dnsEvent (this=, event=, e=) at HostDB.cc:1450
#19 0x2ade3821 in Continuation::handleEvent (data=,
event=600, this=) at
../../iocore/eventsystem/I_Continuation.h:153
#20 DNSEntry::postEvent (this=this@entry=0x2aaab76b4e00) at DNS.cc:1269
#21 0x2ade880b in dns_result (h=h@entry=0x2aaabafc9ec0,
e=e@entry=0x2aaab76b4e00, ent=, ent@entry=0x2aaaee3aa440,
retry=retry@entry=false) at DNS.cc:1221
#22 0x2adeb189 in dns_process (len=,
buf=0x2aaaee3aa440, handler=0x2aaabafc9ec0) at DNS.cc:1587
#23 DNSHandler::recv_dns (this=this@entry=0x2aaabafc9ec0) at DNS.cc:782
#24 0x2adebac9 in DNSHandler::mainEvent (this=0x2aaabafc9ec0,
event=, e=) at DNS.cc:794
#25 0x2af0758e in Continuation::handleEvent (data=0x2aaab1788980,
event=5, this=) at I_Continuation.h:153
#26 EThread::process_event (calling_code=5, e=0x2aaab1788980,
this=0x2aaab160db40) at UnixEThread.cc:143
#27 EThread::execute (this=0x2aaab160db40) at UnixEThread.cc:270
#28 0x2af06136 in spawn_thread_internal (a=0x2aaab09981f0) at
Thread.cc:84
#29 0x2d73caa1 in start_thread (arg=0x2aaab470c700) at
pthread_create.c:301
#30 0x2e5f393d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
```
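One way to double-check that every thread really is waiting on the same mutex is to tally the `mutex=` address from the `__pthread_mutex_lock` frame across a saved dump of all thread backtraces (for example from gdb's `thread apply all bt`). The following is only a minimal sketch: the helper name `count_waiters` and the sample text are illustrative assumptions, not part of Traffic Server, and it assumes each blocked thread shows exactly one such frame on a single line.

```python
import re
from collections import Counter

# Matches the pthread frame that carries the contended mutex address, e.g.
# "#2 0x2d73e4a7 in __pthread_mutex_lock (mutex=0x2aaab098a290) at"
MUTEX_FRAME = re.compile(r"__pthread_mutex_lock \(mutex=(0x[0-9a-fA-F]+)\)")


def count_waiters(backtrace_text):
    """Map each mutex address to the number of frames blocked on it.

    Hypothetical helper: assumes one __pthread_mutex_lock frame per
    blocked thread in the saved backtrace text.
    """
    return Counter(MUTEX_FRAME.findall(backtrace_text))


if __name__ == "__main__":
    # Illustrative input: the same frame as in the trace above, repeated
    # as it would appear once per stuck thread in a full dump.
    sample = (
        "#2 0x2d73e4a7 in __pthread_mutex_lock (mutex=0x2aaab098a290) at\n"
        "#2 0x2d73e4a7 in __pthread_mutex_lock (mutex=0x2aaab098a290) at\n"
    )
    for address, waiters in count_waiters(sample).most_common():
        print(f"{address}: {waiters} waiting thread(s)")
```

If a single address accounts for all ET_NET threads, that supports the "same lock" observation; several distinct addresses would instead point at a lock-ordering problem.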
We're not sure whether this relates to HostDB sync, but the boxes we
encountered this on did have syncing turned on.