Hello Dan Burkert, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/3550

to look at the new patch set (#8).

Change subject: master: add read-write lock to serialize operations around elections
......................................................................

master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

I was hoping this lock could replace the fencing done by
leader_ready_term_, but eventually reasoned that we need both; without
leader_ready_term_ fencing, the master's consensus state machine could
fool an operation into thinking the master became the leader before the
metadata was reloaded.

Three other things of note here:
- The new lock is acquired via TryLock() so that, if the lock cannot be
  acquired, the RPC fails rather than blocks. A future patch modifies
  TSHeartbeat() to partially accept heartbeats even if the master is a
  follower; TryLock() means that a transitioning leader that is pelted
  with RPCs won't fill up its service queue and can still process
  heartbeats.
- TableInfo's AddTask() and RemoveTask() methods no longer hold the
  table's lock when adding refs to and removing refs from the task,
  respectively. This is the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 334 insertions(+), 144 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/3550/8
--
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
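
For readers who have not opened the patch, the TryLock() behavior described in the commit message can be sketched roughly as follows. This is only an illustration built on std::shared_mutex from the C++17 standard library, not the code in catalog_manager.cc: ToyMaster, HandleRpc(), ReloadMetadata() and RpcAcceptedDuringReload() are hypothetical names, and the real patch uses Kudu's own lock types and also keeps the leader_ready_term_ check, which is omitted here.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for the master; not Kudu's CatalogManager.
class ToyMaster {
 public:
  // RPC path: try to take the lock in shared mode and fail fast on contention.
  bool HandleRpc() {
    std::shared_lock<std::shared_mutex> l(leader_lock_, std::try_to_lock);
    if (!l.owns_lock()) {
      return false;  // A reload is in progress; reject rather than block.
    }
    handled_.fetch_add(1);
    return true;
  }

  // Election path: runs `reload` while holding the lock exclusively, so no
  // RPC can overlap with the metadata reload.
  template <typename F>
  void ReloadMetadata(F reload) {
    std::unique_lock<std::shared_mutex> l(leader_lock_);
    reload();
  }

  int handled() const { return handled_.load(); }

 private:
  std::shared_mutex leader_lock_;
  std::atomic<int> handled_{0};
};

// Issues one RPC from a second thread while a reload holds the lock
// exclusively, and reports whether that RPC was accepted.
bool RpcAcceptedDuringReload(ToyMaster& m) {
  bool accepted = true;
  m.ReloadMetadata([&] {
    std::thread rpc([&] { accepted = m.HandleRpc(); });
    rpc.join();
  });
  return accepted;
}
```

In this sketch an RPC that arrives during a reload returns false immediately instead of queueing behind the exclusive holder, which is the property the first bullet relies on to keep the service queue from filling up. Note that the standard permits try_lock_shared() to fail spuriously, so a caller should treat a rejection as "retry later" rather than as proof that a reload is running.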