[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-14 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged.

Change subject: master: add read-write lock to serialize operations around 
elections
..


master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

I was hoping this lock could replace the fencing done by leader_ready_term_,
but eventually reasoned that we need both; without leader_ready_term_
fencing, the master's consensus state machine could fool an operation into
thinking the master became the leader before the metadata was reloaded.

Three other things of note here:
- The new lock is acquired via TryLock() so that, if the lock could not be
  acquired, the RPC will fail rather than block. A future patch modifies
  TSHeartbeat() to partially accept heartbeats even if the master is a
  follower; TryLock() means that a transitioning leader that is pelted with
  RPCs won't fill up its service queue and can still process heartbeats.
- TableInfo's AddTask() and RemoveTask() methods now don't hold the table's
  lock when adding and removing refs from the task respectively. This is
  the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. 
http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Reviewed-on: http://gerrit.cloudera.org:8080/3550
Reviewed-by: Dan Burkert 
Tested-by: Adar Dembo 
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 334 insertions(+), 144 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Adar Dembo: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-14 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 10: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-14 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 10:

Build Started http://104.196.14.100/job/kudu-gerrit/2464/

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-14 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-13 Thread Adar Dembo (Code Review)
Hello Dan Burkert, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3550

to look at the new patch set (#8).

Change subject: master: add read-write lock to serialize operations around 
elections
..

master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

I was hoping this lock could replace the fencing done by leader_ready_term_,
but eventually reasoned that we need both; without leader_ready_term_
fencing, the master's consensus state machine could fool an operation into
thinking the master became the leader before the metadata was reloaded.

Three other things of note here:
- The new lock is acquired via TryLock() so that, if the lock could not be
  acquired, the RPC will fail rather than block. A future patch modifies
  TSHeartbeat() to partially accept heartbeats even if the master is a
  follower; TryLock() means that a transitioning leader that is pelted with
  RPCs won't fill up its service queue and can still process heartbeats.
- TableInfo's AddTask() and RemoveTask() methods now don't hold the table's
  lock when adding and removing refs from the task respectively. This is
  the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. 
http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 334 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/3550/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-13 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 8:

Build Started http://104.196.14.100/job/kudu-gerrit/2424/

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-13 Thread Adar Dembo (Code Review)
Hello Dan Burkert, Todd Lipcon,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3550

to look at the new patch set (#7).

Change subject: master: add read-write lock to serialize operations around 
elections
..

master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

I was hoping this lock could replace the fencing done by leader_ready_term_,
but eventually reasoned that we need both; without leader_ready_term_
fencing, the master's consensus state machine could fool an operation into
thinking the master became the leader before the metadata was reloaded.

Three other things of note here:
- The new lock is acquired via TryLock() so that, if the lock could not be
  acquired, the RPC will fail rather than block. A future patch modifies
  TSHeartbeat() to partially accept heartbeats even if the master is a
  follower; TryLock() means that a transitioning leader that is pelted with
  RPCs won't fill up its service queue and can still process heartbeats.
- TableInfo's AddTask() and RemoveTask() methods now don't hold the table's
  lock when adding and removing refs from the task respectively. This is
  the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. 
http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 331 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/3550/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-12 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3550/6//COMMIT_MSG
Commit Message:

Line 30:   RPCs won't fill up its service queue and can still process 
heartbeats.
ah yes, we had basically this problem with the original HDFS HA implementation!


-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-12 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 6: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-11 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 6:

Build Started http://104.196.14.100/job/kudu-gerrit/2317/

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-11 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 5: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-10 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/2296/

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-10 Thread Adar Dembo (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3550

to look at the new patch set (#5).

Change subject: master: add read-write lock to serialize operations around 
elections
..

master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

I was hoping this lock could replace the fencing done by leader_ready_term_,
but eventually reasoned that we need both; without leader_ready_term_
fencing, the master's consensus state machine could fool an operation into
thinking the master became the leader before the metadata was reloaded.

Three other things of note here:
- The new lock is acquired via TryLock() so that, if the lock could not be
  acquired, the RPC will fail rather than block. A future patch modifies
  TSHeartbeat() to partially accept heartbeats even if the master is a
  follower; TryLock() means that a transitioning leader that is pelted with
  RPCs won't fill up its service queue and can still process heartbeats.
- TableInfo's AddTask() and RemoveTask() methods now don't hold the table's
  lock when adding and removing refs from the task respectively. This is
  the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. 
http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 331 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/3550/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-10 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3550/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 358: // If there is an error (e.g., we are not the leader) 
abort this task
> Should this comment be update?
Not necessarily; this master could lose its leadership after taking the lock. 
It'll find out when trying to replicate to the followers, and that error will 
propagate up here.


Line 3147:   std::lock_guard l(catalog_->state_lock_);
> I expected these checks to be made 'on demand' in CheckIsInitializedOrRespo
For one, some users don't use the CheckIs...() methods; they call the status() 
accessors only.

For two, I modeled this class after lock_guard and shared_lock; I want its 
users to think that the resource is taken when the class is declared, and 
released when the instance goes out of scope.


-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-09 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3550/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

Line 358: // If there is an error (e.g., we are not the leader) 
abort this task
Should this comment be update?


Line 3147:   std::lock_guard l(catalog_->state_lock_);
I expected these checks to be made 'on demand' in 
CheckIsInitializedOrRespond/CheckIsInitializedAndIsLeaderOrRespond.  Any reason 
not to do it there?


-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-08 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 4:

Build Started http://104.196.14.100/job/kudu-gerrit/2262/

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-08 Thread Adar Dembo (Code Review)
Hello Dan Burkert, David Ribeiro Alves, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3550

to look at the new patch set (#4).

Change subject: master: add read-write lock to serialize operations around 
elections
..

master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

I was hoping this lock could replace the fencing done by leader_ready_term_,
but eventually reasoned that we need both; without leader_ready_term_
fencing, the master's consensus state machine could fool an operation into
thinking the master became the leader before the metadata was reloaded.

Three other things of note here:
- The new lock is acquired via TryLock() so that, if the lock could not be
  acquired, the RPC will fail rather than block. A future patch modifies
  TSHeartbeat() to partially accept heartbeats even if the master is a
  follower; TryLock() means that a transitioning leader that is pelted with
  RPCs won't fill up its service queue and can still process heartbeats.
- TableInfo's AddTask() and RemoveTask() methods now don't hold the table's
  lock when adding and removing refs from the task respectively. This is
  the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. 
http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 331 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/3550/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-08 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/3550/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS3, Line 345:  l.c
> could be more descriptive here?
Done


PS3, Line 367: aborting the current task...
> hmm, this (old) log message is nonsense, right?
Yeah. I'll just remove the whole thing. Not useful and adds noise to the 
control flow.


Line 657: e.second->WaitTasksCompletion();
> does this put a restriction on the tasks themselves that they may not block
Correct. As it turns out, there was a separate deadlock issue which I've fixed: 
the task callbacks ran on the same singleton thread pool as this function. Now 
none of the tasks touch the leader lock.


http://gerrit.cloudera.org:8080/#/c/3550/3/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

PS3, Line 304: General status 
> a specific example of what types of "bad status" this might represent would
Done


PS3, Line 305: leadership_status
> you mean leader_status()?
Yes to both.


PS3, Line 308: Leadership status
> would a simple bool suffice?
CheckIsInitializedAndIsLeaderOrRespond() needs access to the status itself 
because it writes the message into the response, but that doesn't preclude the 
leader_status() accessor from being a bool.

But, this way it's more consistent with catalog_status(), which I like. Do you 
feel strongly?


PS3, Line 316: '.
> and returns false
Done


PS3, Line 691: etc
> what about read operations? should you specifically say write operations? a
As you observed, reads should also be guarded by this lock. It may be possible 
for reads to get by with the existing locks, but that means blocking for a long 
period of time on a spinlock, which isn't ideal. I went back and forth on this, 
but in the end reasoned that it's far simpler for _every_ operation to acquire 
it.


-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-01 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/3550/3/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

PS3, Line 345:  l.c
could be more descriptive here?


PS3, Line 367: aborting the current task...
hmm, this (old) log message is nonsense, right?


Line 657: e.second->WaitTasksCompletion();
does this put a restriction on the tasks themselves that they may not block on 
the leader lock? otherwise is there a deadlock case?


http://gerrit.cloudera.org:8080/#/c/3550/3/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

PS3, Line 304: General status 
a specific example of what types of "bad status" this might represent would be 
useful


PS3, Line 305: leadership_status
you mean leader_status()?
should we DCHECK(catalog_status.ok()) within leader_status() in this case?


PS3, Line 308: Leadership status
would a simple bool suffice?


PS3, Line 316: '.
and returns false


PS3, Line 691: etc
what about read operations? should you specifically say write operations? are 
read operations already protected by the tableinfo locks, etc? or should we 
acquire in them too to prevent a race where the Visit*() functions clear the 
maps just as a read request arrives (but after the read request checks 
leadership)


-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-01 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/2151/

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-01 Thread Adar Dembo (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3550

to look at the new patch set (#3).

Change subject: master: add read-write lock to serialize operations around 
elections
..

master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

I was hoping this lock could replace the fencing done by leader_ready_term_,
but eventually reasoned that we need both; without leader_ready_term_
fencing, the master's consensus state machine could fool an operation into
thinking the master became the leader before the metadata was reloaded.

Two other things of note here:
- TableInfo's AddTask() and RemoveTask() methods now don't hold the table's
  lock when adding and removing refs from the task respectively. This is
  the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. 
http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 294 insertions(+), 136 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/3550/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-07-01 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/2150/

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-06-30 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 1:

I've been reviewing the test failure and reproducing it locally. I think I need 
to continue incorporating leader_term_ into the check in some way; at least 
checking that it's not -1, I believe.

Otherwise it's possible for a caller to grab a ScopedLeaderSharedLock just as 
the master is starting up, determine that the master is the leader, and let go 
of the lock, after which VisitTablesAndTablets grabs the lock and begins 
loading metadata.

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-06-30 Thread Adar Dembo (Code Review)
Hello Dan Burkert, Todd Lipcon,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/3550

to review the following change.

Change subject: master: add read-write lock to serialize operations around 
elections
..

master: add read-write lock to serialize operations around elections

This rigmarole began with an investigation into a test failure [1], which
led to a new integration test that hammers VisitTablesAndTablets() while
creating tables. That test revealed other locking issues, which brings us
to where we are now.

This patch introduces a read-write lock to serialize all master operations
so that they fall on one side or the other of a leader election. The idea
is to avoid performing operations concurrently with a reload of the master
metadata; doing so can lead to problems in Shutdown() and (very rarely,
perhaps only conceptually) to inconsistent on-disk state.

Two other things of note here:
- TableInfo's AddTask() and RemoveTask() methods now don't hold the table's
  lock when adding and removing refs from the task respectively. This is
  the fix for the original test failure.
- When reloading metadata, we now abort all outstanding table tasks to
  avoid orphaning them.

1. 
http://dist-test.cloudera.org:8080/diagnose?key=224b3aa2-3c87-11e6-9a09-0242ac110001

Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
---
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/mini_cluster.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master.cc
M src/kudu/master/master_service.cc
6 files changed, 292 insertions(+), 146 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/3550/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] master: add read-write lock to serialize operations around elections

2016-06-30 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: master: add read-write lock to serialize operations around 
elections
..


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/2143/

-- 
To view, visit http://gerrit.cloudera.org:8080/3550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5084c09f1a77ccf620fb6cd621094c4778d636f8
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No