[jira] [Assigned] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows

2020-06-18 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin reassigned KUDU-2727:
---

Assignee: Alexey Serbin

> Contention on the Raft consensus lock can cause tablet service queue overflows
> --
>
> Key: KUDU-2727
> URL: https://issues.apache.org/jira/browse/KUDU-2727
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf
>Reporter: William Berkeley
>Assignee: Alexey Serbin
>Priority: Major
>  Labels: performance, scalability
>
> Here are stacks illustrating the phenomenon:
> {noformat}
>   tids=[2201]
> 0x379ba0f710 
>0x1fb951a base::internal::SpinLockDelay()
>0x1fb93b7 base::SpinLock::SlowLock()
> 0xb4e68e kudu::consensus::Peer::SignalRequest()
> 0xb9c0df kudu::consensus::PeerManager::SignalRequest()
> 0xb8c178 kudu::consensus::RaftConsensus::Replicate()
> 0xaab816 kudu::tablet::TransactionDriver::Prepare()
> 0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>0x1fa37ed kudu::ThreadPool::DispatchThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   tids=[4515]
> 0x379ba0f710 
>0x1fb951a base::internal::SpinLockDelay()
>0x1fb93b7 base::SpinLock::SlowLock()
> 0xb74c60 kudu::consensus::RaftConsensus::NotifyCommitIndex()
> 0xb59307 kudu::consensus::PeerMessageQueue::NotifyObserversTask()
> 0xb54058 _ZN4kudu8internal7InvokerILi2ENS0_9BindStateINS0_15RunnableAdapterIMNS_9consensus16PeerMessageQueueEFvRKSt8functionIFvPNS4_24PeerMessageQueueObserverEEFvPS5_SC_EFvNS0_17UnretainedWrapperIS5_EEZNS5_34NotifyObserversOfCommitIndexChangeElEUlS8_E_EEESH_E3RunEPNS0_13BindStateBaseE
>0x1fa37ed kudu::ThreadPool::DispatchThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   tids=[22185,22194,22193,22188,22187,22186]
> 0x379ba0f710 
>0x1fb951a base::internal::SpinLockDelay()
>0x1fb93b7 base::SpinLock::SlowLock()
> 0xb8bff8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
> 0xaaaef9 kudu::tablet::TransactionDriver::ExecuteAsync()
> 0xaa3742 kudu::tablet::TabletReplica::SubmitWrite()
> 0x92812d kudu::tserver::TabletServiceImpl::Write()
>0x1e28f3c kudu::rpc::GeneratedServiceIf::Handle()
>0x1e2986a kudu::rpc::ServicePool::RunThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   tids=[22192,22191]
> 0x379ba0f710 
>0x1fb951a base::internal::SpinLockDelay()
>0x1fb93b7 base::SpinLock::SlowLock()
>0x1e13dec kudu::rpc::ResultTracker::TrackRpc()
>0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle()
>0x1e2986a kudu::rpc::ServicePool::RunThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   tids=[4426]
> 0x379ba0f710 
>0x206d3d0 
>0x212fd25 google::protobuf::Message::SpaceUsedLong()
>0x211dee4 google::protobuf::internal::GeneratedMessageReflection::SpaceUsedLong()
> 0xb6658e kudu::consensus::LogCache::AppendOperations()
> 0xb5c539 kudu::consensus::PeerMessageQueue::AppendOperations()
> 0xb5c7c7 kudu::consensus::PeerMessageQueue::AppendOperation()
> 0xb7c675 kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked()
> 0xb8c147 kudu::consensus::RaftConsensus::Replicate()
> 0xaab816 kudu::tablet::TransactionDriver::Prepare()
> 0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>0x1fa37ed kudu::ThreadPool::DispatchThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
> {noformat}
> {{kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()}} needs to 
> take the lock to check the term and the Raft role. When many RPCs come in for 
> the same tablet, the contention can hog service threads and cause queue 
> overflows on busy systems.
> Yugabyte switched their equivalent lock to be an atomic that allows them to 
> read the term and role wait-free.
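
For illustration only, here is a minimal sketch of what a wait-free read of the term and role could look like. It is not Kudu's or Yugabyte's actual code. The names TermAndRole, Role, Store() and TermIfLeader() are hypothetical, and the sketch assumes that writers keep updating the value under the existing consensus lock, that the role fits in 2 bits, and that std::atomic<uint64_t> is lock-free on the target platform.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical role enum for this sketch; not Kudu's real role type.
enum class Role : uint64_t {
  kFollower = 0,
  kLeader = 1,
  kLearner = 2,
  kNonParticipant = 3,
};

// Layout assumption: low 2 bits hold the role, the remaining 62 bits the term.
constexpr int kRoleBits = 2;
constexpr uint64_t kRoleMask = (1ULL << kRoleBits) - 1;
constexpr uint64_t kMaxTerm = ~0ULL >> kRoleBits;

// Packs the term and role into one 64-bit word so that readers always see a
// consistent (term, role) pair from a single atomic load.
class TermAndRole {
 public:
  // Publishes a new (term, role) pair. Assumed to be called by the writer
  // while it still holds the consensus lock, so writers never race each other.
  void Store(int64_t term, Role role) {
    assert(term >= 0 && static_cast<uint64_t>(term) <= kMaxTerm);
    const uint64_t packed =
        (static_cast<uint64_t>(term) << kRoleBits) | static_cast<uint64_t>(role);
    packed_.store(packed, std::memory_order_release);
  }

  // Reader path: one atomic load, no lock taken. Returns true and sets *term
  // only if the published role is leader; otherwise leaves *term untouched.
  bool TermIfLeader(int64_t* term) const {
    const uint64_t packed = packed_.load(std::memory_order_acquire);
    if (static_cast<Role>(packed & kRoleMask) != Role::kLeader) {
      return false;
    }
    *term = static_cast<int64_t>(packed >> kRoleBits);
    return true;
  }

 private:
  std::atomic<uint64_t> packed_{0};  // Starts as term 0, follower.
};
```

With something like this, the RPC service threads in the CheckLeadershipAndBindTerm() path would call TermIfLeader() instead of spinning on the lock. Any later step that depends on the bound term would still have to re-validate it, since the role can change right after the load.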



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows

2020-06-03 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin reassigned KUDU-2727:
---

Assignee: (was: Mike Percy)

> Contention on the Raft consensus lock can cause tablet service queue overflows
> --
>
> Key: KUDU-2727
> URL: https://issues.apache.org/jira/browse/KUDU-2727
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf
>Reporter: William Berkeley
>Priority: Major
>





[jira] [Assigned] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows

2019-03-26 Thread Mike Percy (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Percy reassigned KUDU-2727:


Assignee: Mike Percy

> Contention on the Raft consensus lock can cause tablet service queue overflows
> --
>
> Key: KUDU-2727
> URL: https://issues.apache.org/jira/browse/KUDU-2727
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Will Berkeley
>Assignee: Mike Percy
>Priority: Major
>


