Andrew Wong has uploaded this change for review.
( http://gerrit.cloudera.org:8080/16510 )


Change subject: KUDU-2612 p12: have MRS iteration account for txn ID
......................................................................

KUDU-2612 p12: have MRS iteration account for txn ID

This is somewhat WIP. See below for an alternate approach.

This patch introduces the ability to iterate through the rows of an MRS
as of a transaction's commit timestamp, rather than the apply timestamps
of the individual mutations therein. It does so by adding an interface
to the MvccManager that tracks the commit timestamp of each transaction
and, when generating MvccSnapshots for scans, returns only the commit
timestamps that are relevant to the iteration timestamp.
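
As a rough illustration of the idea above (not the code in this patch),
the tracking could look something like the sketch below. Every name in
it (CommitTimestampTracker, ScanSnapshot, RecordCommit, SnapshotAt) is
hypothetical, timestamps are plain integers standing in for
kudu::Timestamp, and the strict "less than" visibility rule is an
assumption.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-ins for the real Kudu types; both aliases are assumptions.
using Timestamp = uint64_t;
using TxnId = int64_t;

// Hypothetical snapshot handed to a scan at a single iteration timestamp.
struct ScanSnapshot {
  Timestamp iter_ts = 0;
  // Commit timestamps of the transactions visible at iter_ts.
  std::unordered_map<TxnId, Timestamp> committed_txns;

  // Non-transactional mutation: relevant if applied before iter_ts.
  bool IsApplied(Timestamp apply_ts) const { return apply_ts < iter_ts; }

  // Transactional mutation: relevant only if its transaction is in the
  // snapshot. This is the per-mutation map lookup.
  bool IsTxnCommitted(TxnId txn_id) const {
    return committed_txns.count(txn_id) > 0;
  }
};

// Hypothetical tracker playing the role of the new MvccManager interface.
class CommitTimestampTracker {
 public:
  void RecordCommit(TxnId txn_id, Timestamp commit_ts) {
    commit_ts_by_txn_[txn_id] = commit_ts;
  }

  // Copies only the commit timestamps that precede the iteration timestamp.
  ScanSnapshot SnapshotAt(Timestamp iter_ts) const {
    ScanSnapshot snap;
    snap.iter_ts = iter_ts;
    for (const auto& entry : commit_ts_by_txn_) {
      if (entry.second < iter_ts) {
        snap.committed_txns.insert(entry);
      }
    }
    return snap;
  }

 private:
  std::unordered_map<TxnId, Timestamp> commit_ts_by_txn_;
};
```

In this sketch both the snapshot copy and the per-mutation lookup grow
with the number of tracked transactions.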

This only adds the APIs to the MemRowSet and MvccManager; there is still
no way to exercise these APIs using a real tablet. Additionally, this
does not cover the iteration required for flushes and compactions; that
will come in a later patch.

Why WIP?

I don't love this approach for a couple of reasons:
- Commit timestamps are already tracked in memory by the TxnMetadata,
  so it seems wasteful to keep track of commit timestamps again in the
  MvccManager.
- The tracking of commit timestamps in the MvccManager currently means
  that iterating through transactional mutations entails a look-up
  into a potentially huge map of commit timestamps. There are some
  ways to improve this by limiting the number of commit timestamps
  tracked (e.g. forgetting commit timestamps as transactional state
  gets merged with the rest of the tablet) or by using some clever
  short-circuiting to avoid map lookups, but it would be nice if we
  didn't have to do these lookups at all.

An alternate approach is to keep a reference to the appropriate
TxnMetadata in each mem-store or mutation. Upon iterating, rather than
relying on the MvccSnapshot(s) to tell us whether a mutation is relevant
to a scan or not, we could then dereference the commit timestamp, if
any, and determine the relevancy status then and there.
- For scans, MvccSnapshots are "clean", i.e. they can be represented by
  a single timestamp each, meaning relevancy may be a matter of simple
  timestamp comparison to the commit timestamp.
- For flushes, MvccSnapshots are not as straightforward; if we ever want
  to support flushing uncommitted mutations, it becomes quite difficult
  to think through races with transaction commit. For instance, the
  flush path is roughly the following:

  1. Take an MvccSnapshot of the MRS
  2. Create DRS, using mutations up to that snapshot
  3. Start duplicating new updates to both the MRS and to the DRS
  4. Duplicate all mutations in between the first snapshot and the time
     we started duplicating to the DRS
  5. Stop duplicating new updates to both rowsets

  Longer term, concurrent commit would introduce a new layer of
  complexity here, e.g. what happens if the insertions are a part of an
  uncommitted transaction, and the transaction is committed part-way
  through the flush?
  - Assuming there were some kind of atomic delta store for
    transactions, perhaps that would result in a mix of atomic UNDOs
    (for the mutations we iterated through while the transaction wasn't
    committed) and regular UNDOs (for the mutations we iterated through
    after the transaction was committed).
- Questions also arise when we begin to think about other forms of
  iteration. What happens if we have a diff scan between two snapshots,
  and while evaluating the relevancy of our mutations, a transaction
  commits?
- With some thought, there are solutions here, but it does add to the
  complexity of some already complex processes.
- The benefit, in exchange for this complexity, is what I would expect
  to be more performant iteration through transactional mutations, as
  well as reduced memory overhead for commit timestamps.
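
For the scan case of the alternate approach, a minimal sketch might
look like the following. It is illustrative only: TxnMetaSketch,
Mutation, IsRelevantToScan, and the kNoCommitTimestamp sentinel are all
hypothetical names, not the real TxnMetadata API, and it assumes a
"clean" snapshot that is fully described by one timestamp.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>

using Timestamp = uint64_t;  // stand-in for kudu::Timestamp (assumption)

// Hypothetical sentinel meaning "transaction has not committed yet".
constexpr Timestamp kNoCommitTimestamp = 0;

// Hypothetical stand-in for the per-transaction metadata.
class TxnMetaSketch {
 public:
  void set_commit_timestamp(Timestamp ts) {
    commit_ts_.store(ts, std::memory_order_release);
  }
  Timestamp commit_timestamp() const {
    return commit_ts_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<Timestamp> commit_ts_{kNoCommitTimestamp};
};

// A mutation that keeps a reference to its transaction's metadata.
struct Mutation {
  Timestamp apply_ts;
  std::shared_ptr<const TxnMetaSketch> txn;  // null if non-transactional
};

// Relevance against a clean snapshot is a plain timestamp comparison,
// with no lookup into a map of commit timestamps.
inline bool IsRelevantToScan(const Mutation& m, Timestamp snapshot_ts) {
  if (!m.txn) {
    return m.apply_ts < snapshot_ts;
  }
  // Read the commit timestamp once so the decision is self-consistent
  // even if the transaction commits concurrently.
  const Timestamp commit_ts = m.txn->commit_timestamp();
  return commit_ts != kNoCommitTimestamp && commit_ts < snapshot_ts;
}
```

This covers only the scan case. It does nothing about the flush and
diff-scan races described above, where a transaction may commit
part-way through iteration.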

Change-Id: I6bb02c6025eea1a327cf9d9ee1f14a38d63ae4ad
---
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
6 files changed, 339 insertions(+), 87 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/16510/1
--
To view, visit http://gerrit.cloudera.org:8080/16510

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6bb02c6025eea1a327cf9d9ee1f14a38d63ae4ad
Gerrit-Change-Number: 16510
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <[email protected]>
