Andrew Wong created KUDU-2244:
---------------------------------

             Summary: spinlock contention in raft_consensus
                 Key: KUDU-2244
                 URL: https://issues.apache.org/jira/browse/KUDU-2244
             Project: Kudu
          Issue Type: Bug
          Components: consensus
            Reporter: Andrew Wong


I was going through the logs of a cluster that was seeing a bunch of 
kernel_stack_watchdog traces, and the slowness seemed to be caused by a lot of 
activity in consensus requests. E.g.

W1214 18:57:29.514219 36138 kernel_stack_watchdog.cc:145] Thread 36317 stuck at /data/jenkins/workspace/generic-package-centos64-7-0-impala/topdir/BUILD/kudu-1.3.0-cdh5.11.0/src/kudu/rpc/outbound_call.cc:192 for 123ms:
Kernel stack:
[<ffffffff810b7ef5>] sys_sched_yield+0x65/0xd0
[<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

User stack:
    @     0x7f72fab92057  __GI___sched_yield
    @          0x19498bf  kudu::Thread::StartThread()
    @          0x1952e7d  kudu::ThreadPool::CreateThreadUnlocked()
    @          0x19534d3  kudu::ThreadPool::Submit()
    @          0x1953a27  kudu::ThreadPool::SubmitFunc()
    @          0x1953ecb  kudu::ThreadPool::SubmitClosure()
    @           0x9c94ec  kudu::consensus::RaftConsensus::ElectionCallback()
    @           0x9e6032  kudu::consensus::LeaderElection::CheckForDecision()
    @           0x9e78c3  kudu::consensus::LeaderElection::VoteResponseRpcCallback()
    @           0xa8b137  kudu::rpc::OutboundCall::CallCallback()
    @           0xa8c2bc  kudu::rpc::OutboundCall::SetResponse()
    @           0xa822c0  kudu::rpc::Connection::HandleCallResponse()
    @           0xa83ffc  ev::base<>::method_thunk<>()
    @          0x198a07f  ev_invoke_pending
    @          0x198af71  ev_run
    @           0xa5e049  kudu::rpc::ReactorThread::RunThread()

So the stalls seemed to be caused by slowness in spinning up threads. Upon perusing the logs a bit more, I found a sizable number of spinlock profiling traces:

W1214 18:54:27.897955 36379 rpcz_store.cc:238] Trace:
1214 18:54:26.766922 (+     0us) service_pool.cc:143] Inserting onto call queue
1214 18:54:26.771135 (+  4213us) service_pool.cc:202] Handling call
1214 18:54:26.771138 (+     3us) raft_consensus.cc:1126] Updating replica for 0 ops
1214 18:54:27.897699 (+1126561us) raft_consensus.cc:1165] Early marking committed up to index 0
1214 18:54:27.897700 (+     1us) raft_consensus.cc:1170] Triggering prepare for 0 ops
1214 18:54:27.897701 (+     1us) raft_consensus.cc:1282] Marking committed up to 1766
1214 18:54:27.897702 (+     1us) raft_consensus.cc:1332] Filling consensus response to leader.
1214 18:54:27.897736 (+    34us) spinlock_profiling.cc:255] Waited 991 ms on lock 0x120b3540. stack: 00000000019406c5 00000000009c60d7 00000000009c75f7 00000000007dc628 0000000000a7adfc 0000000000a7b9cd 000000000194d059 00007f72fbcc2dc4 00007f72fabad1cc ffffffffffffffff
1214 18:54:27.897737 (+     1us) raft_consensus.cc:1327] UpdateReplicas() finished
1214 18:54:27.897741 (+     4us) inbound_call.cc:130] Queueing success response
Metrics: {"spinlock_wait_cycles":2478395136}

Each of the traces noted on the order of 500-1000 ms of waiting on spinlocks. Looking at raft_consensus.cc, it seems we're holding a spinlock (update_lock_) while we call RaftConsensus::UpdateReplica(), which, according to its header, "won't return until all operations have been stored in the log and all Prepares() have been completed". While some locking may be necessary here, it's worth considering a different kind of lock.
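
For illustration, here's a minimal, self-contained C++ sketch of the pattern in question: a busy-waiting lock held across a long call, versus a blocking mutex. This is not Kudu code; SimpleSpinLock, the simulated UpdateReplica(), and the 500 ms sleep are hypothetical stand-ins for update_lock_ and the real UpdateReplica().

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical busy-waiting spinlock, standing in for update_lock_.
class SimpleSpinLock {
 public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Every waiter spins/yields for as long as the holder stays in its
      // critical section -- this is what the spinlock profiling traces
      // and sched_yield stacks above are measuring.
      std::this_thread::yield();
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Stand-in for RaftConsensus::UpdateReplica(): per its header it won't
// return until log writes and Prepare()s finish, simulated here by a sleep.
void UpdateReplica() {
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
}

SimpleSpinLock update_spinlock;  // the problematic pattern
std::mutex update_mutex;         // a blocking alternative

void HandleRequestSpin() {
  std::lock_guard<SimpleSpinLock> l(update_spinlock);
  UpdateReplica();  // other requests burn CPU waiting ~500 ms each
}

void HandleRequestBlocking() {
  std::lock_guard<std::mutex> l(update_mutex);
  UpdateReplica();  // waiters are descheduled instead of spinning
}

int main() {
  // A handful of concurrent "consensus requests" contending on the lock.
  std::vector<std::thread> threads;
  for (int i = 0; i < 4; ++i) {
    threads.emplace_back(HandleRequestSpin);  // swap in HandleRequestBlocking to compare
  }
  for (auto& t : threads) t.join();
  return 0;
}

The point is just that with a blocking lock the waiters sleep rather than accumulating spinlock_wait_cycles, so the trade-off is wasted CPU under a long critical section versus the cost of a context switch when the lock is only briefly held.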



