Alexey Serbin has submitted this change and it was merged.

Change subject: [consensus_queue] fix race in UpdateLagMetrics()
......................................................................


[consensus_queue] fix race in UpdateLagMetrics()

TSAN reports warnings on races on writing/reading the
QueueState::last_idx_appended_to_leader field:

CatalogManagerTskITest.LeadershipChangeOnTskGeneration: WARNING: 
ThreadSanitizer: data race (pid=2710)  Write of size 8 at 0x7d500003e480 by 
thread T33 (mutexes: write M1863, write M1821):
    #0 kudu::consensus::PeerMessageQueue::UpdateLastIndexAppendedToLeader(long) 
consensus/consensus_queue.cc:607:44
    #1 
kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB
 const*, kudu::consensus::ConsensusResponsePB*) 
consensus/raft_consensus.cc:1155:13
    #2 
kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB 
const*, kudu::consensus::ConsensusResponsePB*) 
consensus/raft_consensus.cc:752:14
    #3 
kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB
 const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*) 
tserver/tablet_service.cc:861:25
    #4 
kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity>
 const&, scoped_refptr<kudu::rpc::ResultTracker> 
const&)::$_1::operator()(google::protobuf::Message const*, 
google::protobuf::Message*, kudu::rpc::RpcContext*) const 
consensus/consensus.service.cc:100:13
    ... skipped ...

  Previous read of size 8 at 0x7d500003e480 by thread T79 (mutexes: write 
M1822):
    #0 kudu::consensus::PeerMessageQueue::UpdateLagMetrics() 
consensus/consensus_queue.cc:602:20
    #1 kudu::consensus::PeerMessageQueue::UpdateMetrics() 
consensus/consensus_queue.cc:875:3
    #2 
kudu::consensus::PeerMessageQueue::ResponseFromPeer(std::__1::basic_string<char,
 std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
kudu::consensus::ConsensusResponsePB const&, bool*) 
consensus/consensus_queue.cc:828:5
    ... skipped ...

For more details, see
  
http://dist-test.cloudera.org:8080/diagnose?key=0784c33a-41b5-11e7-9f6b-0242ac11000f

This patch fixes the above race.  Also, it contains some extras:
  * The PeerMessageQueue::UpdateLagMetrics() method has been renamed
    into PeerMessageQueue::UpdateLagMetricsUnlocked() and made private.
  * The PeerMessageQueue::UpdateMetrics() method has been renamed
    into PeerMessageQueue::UpdateMetricsUnlocked().
  * Added DCHECK(queue_lock_.is_locked()) into all
    PeerMessageQueue::XxxUnlocked() methods.

Change-Id: I25feb676619cc1f3a94fb8e631bffd8ca02ead49
Reviewed-on: http://gerrit.cloudera.org:8080/7032
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <[email protected]>
---
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
2 files changed, 24 insertions(+), 16 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I25feb676619cc1f3a94fb8e631bffd8ca02ead49
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>

Reply via email to