Alexey Serbin has submitted this change and it was merged.
Change subject: [consensus_queue] fix race in UpdateLagMetrics()
......................................................................
[consensus_queue] fix race in UpdateLagMetrics()
TSAN reports warnings on races on writing/reading the
QueueState::last_idx_appended_to_leader field:
CatalogManagerTskITest.LeadershipChangeOnTskGeneration: WARNING:
ThreadSanitizer: data race (pid=2710) Write of size 8 at 0x7d500003e480 by
thread T33 (mutexes: write M1863, write M1821):
#0 kudu::consensus::PeerMessageQueue::UpdateLastIndexAppendedToLeader(long)
consensus/consensus_queue.cc:607:44
#1
kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB
const*, kudu::consensus::ConsensusResponsePB*)
consensus/raft_consensus.cc:1155:13
#2
kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB
const*, kudu::consensus::ConsensusResponsePB*)
consensus/raft_consensus.cc:752:14
#3
kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB
const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*)
tserver/tablet_service.cc:861:25
#4
kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity>
const&, scoped_refptr<kudu::rpc::ResultTracker>
const&)::$_1::operator()(google::protobuf::Message const*,
google::protobuf::Message*, kudu::rpc::RpcContext*) const
consensus/consensus.service.cc:100:13
... skipped ...
Previous read of size 8 at 0x7d500003e480 by thread T79 (mutexes: write
M1822):
#0 kudu::consensus::PeerMessageQueue::UpdateLagMetrics()
consensus/consensus_queue.cc:602:20
#1 kudu::consensus::PeerMessageQueue::UpdateMetrics()
consensus/consensus_queue.cc:875:3
#2
kudu::consensus::PeerMessageQueue::ResponseFromPeer(std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> > const&,
kudu::consensus::ConsensusResponsePB const&, bool*)
consensus/consensus_queue.cc:828:5
... skipped ...
For more details, see
http://dist-test.cloudera.org:8080/diagnose?key=0784c33a-41b5-11e7-9f6b-0242ac11000f
This patch fixes the above race. Also, it contains some extras:
* The PeerMessageQueue::UpdateLagMetrics() method has been renamed
into PeerMessageQueue::UpdateLagMetricsUnlocked() and made private.
* The PeerMessageQueue::UpdateMetrics() method has been renamed
into PeerMessageQueue::UpdateMetricsUnlocked().
* Added DCHECK(queue_lock_.is_locked()) into all
PeerMessageQueue::XxxUnlocked() methods.
Change-Id: I25feb676619cc1f3a94fb8e631bffd8ca02ead49
Reviewed-on: http://gerrit.cloudera.org:8080/7032
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <[email protected]>
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
2 files changed, 24 insertions(+), 16 deletions(-)
Approvals:
Todd Lipcon: Looks good to me, approved
Kudu Jenkins: Verified
--
To view, visit http://gerrit.cloudera.org:8080/7032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I25feb676619cc1f3a94fb8e631bffd8ca02ead49
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>