Hello Dan Burkert, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7957

to review the following change.

Change subject: KUDU-1807: improve IsCreateInProgress performance
......................................................................

KUDU-1807: improve IsCreateInProgress performance

For some inexplicable reason, IsCreateInProgress is invoked when servicing a
GetTableSchema RPC. While it's attractive to change that, it's also
untenable without affecting backwards compatibility. So instead, here's a
patch that adds in-memory caching to the master so that IsCreateInProgress
needn't acquire N tablet locks.

Just like schema_version_counts_, the cached state is a map of all tablet
states to the number of tablets currently in that state. For this particular
problem a single count of "number of creating tablets" might suffice, but I
liked the consistency with schema_version_counts_ and I found it easier to
handle the various corner cases when accounting for all states.

Because the map contents reflect persistent state, the locking semantics
differ somewhat from schema_version_counts_:
1. Tablet state changes are usually wrapped in a tablet metadata lock, so
   care must be taken to only update the state if the lock commits.
2. Further, the count must be updated with the tablet metadata lock held.
   Why? So that count updates are serialized in the same order as state
   changes themselves. Consider an example where two threads are trying to
   update a tablet's state (currently X) to Y and Z respectively. The count
   of state X begins at 1 (C_X=1) and the counts of states Y and Z are 0
   (C_Y=C_Z=0). There's no defined order so whichever state change happens
   second "wins". However, if the count updates were allowed outside the
   metadata locks, it would be possible for the order of operations to be
   X->Y, Y->Z, (C_Y-- C_Z++), (C_X-- C_Y++), causing C_Y to momentarily go
   negative. If count updates are restricted to inside the metadata locks, a
   state change order of X->Y, Y->Z forces the count update order to be
   (C_X-- C_Y++), (C_Y--, C_Z++).

To accommodate these semantics, the ScopedTabletInfoCommitter was modified
to support tracking state changes. There are two notable (and confusing)
exceptions which are documented in the code.

Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
2 files changed, 224 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/7957/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7957
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>

Reply via email to