Hello Alexey Serbin,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/11918

to review the following change.


Change subject: raft_consensus_nonvoter-itest: deflake a bit
......................................................................

raft_consensus_nonvoter-itest: deflake a bit

I saw a failure in ReplicaBehindWalGcThresholdITest.ReplicaReplacement
(GetParam() was (1, false)) just after the master was restarted:

  raft_consensus_nonvoter-itest.cc:2070: Failure
  Failed
  Bad status: Service unavailable: Leader not yet ready to serve requests

This is odd as there's a WaitForCatalogManager() call in there, so why would
a subsequent GetTabletLocations RPC return this ServiceUnavailable? As best
I can tell, the only way for this to happen is if the attempt to grab the
leadership lock from within the ListTables RPC (sent from
WaitForCatalogManager()) returns IllegalState, which it'll do if the
UUID in the master's cstate doesn't match the UUID on disk. Perhaps this can
happen during a leader master election; maybe the cstate's UUID becomes
empty for a little while?

If that's true, this should fix the problem by considering IllegalState to
be a non-final state and continuing the loop.

Change-Id: I8192bd669e7e309943ea82718dd715238d520bbd
---
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
3 files changed, 21 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/11918/1
--
To view, visit http://gerrit.cloudera.org:8080/11918
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8192bd669e7e309943ea82718dd715238d520bbd
Gerrit-Change-Number: 11918
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>

Reply via email to