Hello Dan Burkert, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3550

to look at the new patch set (#8).

Change subject: master: add read-write lock to serialize operations around 
elections
......................................................................

master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

I was hoping this lock could replace the fencing done by leader_ready_term_,
but eventually reasoned that we need both; without leader_ready_term_
fencing, the master's consensus state machine could fool an operation into
thinking the master became the leader before the metadata was reloaded.

Three other things of note here:
- The new lock is acquired via TryLock() so that, if the lock could not be
  acquired, the RPC will fail rather than block. A future patch modifies
  TSHeartbeat() to partially accept heartbeats even if the master is a
  follower; TryLock() means that a transitioning leader that is pelted with
  RPCs won't fill up its service queue and can still process heartbeats.
- TableInfo's AddTask() and RemoveTask() methods now don't hold the table's
  lock when adding and removing refs from the task respectively. This is
  the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. 
http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 334 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/3550/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to