Andrew Wong has uploaded this change for review. (http://gerrit.cloudera.org:8080/16510)
Change subject: KUDU-2612 p12: have MRS iteration account for txn ID
......................................................................
KUDU-2612 p12: have MRS iteration account for txn ID
Somewhat WIP; see below for an alternate approach.

This patch introduces the ability to iterate through the rows of an MRS
as of each transaction's commit timestamp, rather than the apply
timestamps of the individual mutations within the transaction. It does
so by adding an interface to the MvccManager that tracks the commit
timestamp of each transaction and, when generating MvccSnapshots for
scans, includes only the commit timestamps relevant to the iteration
timestamp.
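A minimal C++ sketch of the MvccManager-based idea, under simplified assumptions: `TxnCommitTracker`, `TxnAwareSnapshot`, `CommitTxn()`, `TakeSnapshot()` and `IsTxnCommitted()` are hypothetical names rather than the APIs this patch adds, timestamps and transaction IDs are plain integers, and a snapshot at timestamp T is assumed to see only commits strictly before T.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Simplified, hypothetical stand-ins for Kudu's Timestamp and txn ID types.
using Timestamp = uint64_t;
using TxnId = int64_t;

// A scan snapshot carrying only the commit timestamps relevant to it.
class TxnAwareSnapshot {
 public:
  explicit TxnAwareSnapshot(Timestamp snap_ts) : snap_ts_(snap_ts) {}

  // Records a transaction that committed before the snapshot timestamp.
  void AddCommittedTxn(TxnId txn_id, Timestamp commit_ts) {
    assert(commit_ts < snap_ts_);
    committed_txns_.emplace(txn_id, commit_ts);
  }

  // Visibility of a transactional mutation depends only on whether its
  // transaction committed before the snapshot, not on its apply timestamp.
  bool IsTxnCommitted(TxnId txn_id) const {
    return committed_txns_.count(txn_id) > 0;
  }

 private:
  Timestamp snap_ts_;
  std::unordered_map<TxnId, Timestamp> committed_txns_;
};

// Tracks the commit timestamp of every committed transaction.
class TxnCommitTracker {
 public:
  void CommitTxn(TxnId txn_id, Timestamp commit_ts) {
    commit_ts_by_txn_[txn_id] = commit_ts;
  }

  // Builds a snapshot for a scan at 'snap_ts', keeping only the commit
  // timestamps that fall strictly before it.
  TxnAwareSnapshot TakeSnapshot(Timestamp snap_ts) const {
    TxnAwareSnapshot snap(snap_ts);
    for (const auto& [txn_id, commit_ts] : commit_ts_by_txn_) {
      if (commit_ts < snap_ts) snap.AddCommittedTxn(txn_id, commit_ts);
    }
    return snap;
  }

 private:
  std::unordered_map<TxnId, Timestamp> commit_ts_by_txn_;
};
```

In this sketch, `TakeSnapshot()` walks every tracked commit timestamp and each visibility check is a hash lookup, which is the kind of per-mutation cost that motivates the reservations about this approach.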

This only adds the APIs to the MemRowSet and MvccManager; there is still
no way to exercise them through a real tablet. Additionally, this patch
does not implement the iteration required for flushes and compactions;
that will come in a later patch.

Why WIP?
I don't love this approach for a couple of reasons:
- Commit timestamps are already tracked in memory by the TxnMetadata,
so it seems wasteful to keep track of commit timestamps again in the
MvccManager.
- Tracking commit timestamps in the MvccManager currently means that
iterating through transactional mutations entails a lookup into a
potentially huge map of commit timestamps. There are ways to mitigate
this, e.g. by limiting the number of commit timestamps tracked
(forgetting a transaction's commit timestamp once its state has been
merged with the rest of the tablet) or by short-circuiting to avoid
some map lookups, but it would be nice not to need these lookups at all.
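The mitigations mentioned above could look roughly like the following hypothetical sketch. `BoundedCommitTracker`, `Mutation`, `ForgetTxn()` and `EffectiveTimestamp()` are illustrative names, and the sketch assumes that once a transaction's state is merged with the rest of the tablet, the merged mutations carry the commit timestamp directly, so the tracker can drop that transaction's entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

using Timestamp = uint64_t;
using TxnId = int64_t;

// Hypothetical mutation record; non-transactional mutations have no txn ID.
struct Mutation {
  Timestamp apply_ts;
  std::optional<TxnId> txn_id;
};

class BoundedCommitTracker {
 public:
  void CommitTxn(TxnId txn_id, Timestamp commit_ts) {
    // A transaction commits at most once.
    assert(commit_ts_by_txn_.count(txn_id) == 0);
    commit_ts_by_txn_.emplace(txn_id, commit_ts);
  }

  // Called once a transaction's state has been merged with the rest of the
  // tablet. ASSUMPTION: after the merge, the merged mutations carry the
  // commit timestamp themselves and no longer reference the txn ID.
  void ForgetTxn(TxnId txn_id) { commit_ts_by_txn_.erase(txn_id); }

  std::size_t num_tracked() const { return commit_ts_by_txn_.size(); }

  // Returns the timestamp that determines the mutation's visibility, or
  // std::nullopt if its transaction has not committed.
  std::optional<Timestamp> EffectiveTimestamp(const Mutation& m) const {
    // Short circuit: non-transactional mutations skip the map lookup.
    if (!m.txn_id.has_value()) return m.apply_ts;
    auto it = commit_ts_by_txn_.find(*m.txn_id);
    if (it == commit_ts_by_txn_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::unordered_map<TxnId, Timestamp> commit_ts_by_txn_;
};
```

This bounds the map to transactions that are committed but not yet merged, though every transactional mutation still pays for one lookup.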

An alternate approach is to keep a reference to the appropriate
TxnMetadata in each mem-store or mutation. When iterating, rather than
relying on the MvccSnapshot(s) to tell us whether a mutation is relevant
to a scan, we could read the commit timestamp (if any) through that
reference and determine relevancy then and there.
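For comparison, a hypothetical sketch of the alternate approach: each transactional mutation holds a shared reference to a simplified stand-in for TxnMetadata (`TxnMetadataSketch`, not Kudu's actual class), and a scan at a clean snapshot decides relevancy with a single comparison against the commit timestamp, with no map lookup. The locking scheme and the strict `<` comparison are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>

using Timestamp = uint64_t;

// Hypothetical, simplified stand-in for TxnMetadata: only the commit
// timestamp matters here.
class TxnMetadataSketch {
 public:
  void SetCommitTimestamp(Timestamp ts) {
    std::lock_guard<std::mutex> l(lock_);
    assert(!commit_ts_);  // A transaction commits at most once.
    commit_ts_ = ts;
  }

  std::optional<Timestamp> commit_timestamp() const {
    std::lock_guard<std::mutex> l(lock_);
    return commit_ts_;
  }

 private:
  mutable std::mutex lock_;
  std::optional<Timestamp> commit_ts_;
};

// Each transactional mutation references its transaction's metadata instead
// of relying on the MvccManager to track commit timestamps.
struct TxnMutation {
  Timestamp apply_ts;  // Unused for scan relevancy in this sketch.
  std::shared_ptr<const TxnMetadataSketch> txn;
};

// For a "clean" scan snapshot at 'snap_ts', relevancy is one comparison
// against the commit timestamp, if the transaction has committed at all.
bool IsRelevantToScan(const TxnMutation& m, Timestamp snap_ts) {
  std::optional<Timestamp> commit_ts = m.txn->commit_timestamp();
  return commit_ts.has_value() && *commit_ts < snap_ts;
}
```

Here the mutation's own `apply_ts` plays no role in scan relevancy; only the transaction's commit timestamp does.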
- For scans, MvccSnapshots are "clean", i.e. each can be represented by
a single timestamp, so determining relevancy could be a simple
comparison against the commit timestamp.
- For flushes, MvccSnapshots are not as straightforward; if we ever want
to support flushing uncommitted mutations, it becomes quite difficult
to reason about races with transaction commit. For instance, the
flush path is roughly the following:
1. Take an MvccSnapshot of the MRS
2. Create DRS, using mutations up to that snapshot
3. Start duplicating new updates to both the MRS and to the DRS
4. Duplicate all mutations in between the first snapshot and the time
we started duplicating to the DRS
5. Stop duplicating new updates to both rowsets
Longer term, concurrent commit would introduce a new layer of
complexity here: e.g. what happens if the insertions are part of an
uncommitted transaction, and the transaction is committed part-way
through the flush?
- Assuming there were some kind of atomic delta store for
transactions, perhaps that would result in a mix of atomic UNDOs
(for the mutations we iterated through while the transaction wasn't
committed) and regular UNDOs (for the mutations we iterated through
after the transaction was committed).
- Questions also arise when we consider other forms of iteration. For
instance, what happens if a transaction commits during a diff scan
between two snapshots, while we are evaluating the relevancy of our
mutations?
- With some thought, there are solutions here, but they add to the
complexity of some already complex processes.
- In exchange for this complexity, I would expect more performant
iteration through transactional mutations, as well as reduced memory
overhead for tracking commit timestamps.
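The flush race can be illustrated with a toy C++ simulation of step 2 alone; it does not model the duplication steps. Everything here is hypothetical (`TxnState`, `Row`, `UndoKind`, `FlushRows()`): each row's commit status is evaluated when the flush reaches it, and a hook lets a caller commit the transaction part-way through, producing the mix of transactional and regular UNDOs described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using Timestamp = uint64_t;

// Hypothetical commit state shared by all rows of one transaction.
struct TxnState {
  std::optional<Timestamp> commit_ts;
};

struct Row {
  std::string key;
  TxnState* txn;  // Not owned.
};

// How the flush would record each row's insertion in the new DRS.
enum class UndoKind { kRegularUndo, kTxnUndo };

// Iterates rows the way a flush might, checking each row's commit status only
// when the row is reached. 'between_rows' is invoked before each row so a
// caller can simulate a concurrent commit part-way through.
std::vector<UndoKind> FlushRows(
    const std::vector<Row>& rows,
    const std::function<void(std::size_t)>& between_rows) {
  std::vector<UndoKind> out;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    between_rows(i);
    assert(rows[i].txn != nullptr);
    const bool committed = rows[i].txn->commit_ts.has_value();
    out.push_back(committed ? UndoKind::kRegularUndo : UndoKind::kTxnUndo);
  }
  return out;
}
```

For example, three rows of one transaction that commits just before the third row is reached yield `kTxnUndo`, `kTxnUndo`, `kRegularUndo`: the same transaction ends up split across two kinds of UNDO.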
Change-Id: I6bb02c6025eea1a327cf9d9ee1f14a38d63ae4ad
---
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
6 files changed, 339 insertions(+), 87 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/16510/1
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6bb02c6025eea1a327cf9d9ee1f14a38d63ae4ad
Gerrit-Change-Number: 16510
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong