Hello Tidy Bot, Mike Percy, Alexey Serbin, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7221

to look at the new patch set (#25).

Change subject: Simplify OpId/Timestamp assignment and make it atomic
......................................................................

Simplify OpId/Timestamp assignment and make it atomic

There are a couple of constraints that are tricky to enforce and were
requiring us to have convoluted interactions on OpId/Timestamp assignment
and registration between the TimeManager, MvccManager and ConsensusQueue.

These convoluted interactions are currently preventing a clean and simple
leader leases implementation, particularly the single timestamp outstanding
in TimeManager.

The "tricky" constraints are:

i) Before clean time is advanced in mvcc, all transactions with a
timestamp lower than the current time must be registered with mvcc.
This allows to wait for a certain timestamp t to be "safe" with TimeManager
and then wait for all transactions with a timestamp t1 < t to commit, thus
making sure that a scan at t is repeatable.

ii) The leader can't advance follower safe timestamp to t, before
all the operations with a timestamp lower than t have been sent.
This is hard since timestamps are assigned outside of consensus.

In general it's problematic to generate OpIds non-atomically with
timestamps, since these are both supposed to be monotonically increasing
in unison. To address this we had some complex state and interactions between
the components. For instance this was requiring us to have TimeManager
stop safe time advancement until a transaction being prepared was
submitted to the queue, after assigning a timestamp in the
TransactionDriver. This was problematic because with it would only allow at
most one assigned timestamp in-flight, meaning we were not able to move safe
time at will.

This patch completely addresses ii) Timestamps and OpIds are now assigned
by PeerMessageQueue, atomically. This makes it trivial to make sure that the
safe time sent to replicas is inline with the sent messages. This also allows
some simplifications and refactorings of the PeerMessageQueue and TimeManager.

As for i) the way we enforce this constraint changed. Instead of registering
a transaction before submitting it to consensus (which we now can't since
consensus submission is also where we assign the timestamp) we register
another, dummy transaction on the MvccManager before submission and abort it
afterwards. This allows to "pin" safe time advancement down to a timestamp
that is mandatorily smaller than then one that will be assigned thus enforcing
the constraint with a simpler implementation at the cost of an additional
(short-lived) transaction in mvcc. Note in this regard that there will be
at most one aditional transaction in mvcc.

One side effect of this change is that, for new config changes, we won't
know the OpId until after the queue submission, which needs to happen
after actually setting the pending config. This is largely inconsequential
since, by design, we can have at most one pending config, but did require
some refactoring.

Change-Id: I2bedcc5c3708d06f48131f34ea14c835867b4800
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_meta-test.cc
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log_cache-test.cc
M src/kudu/consensus/pending_rounds.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/consensus/quorum_util.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/time_manager-test.cc
M src/kudu/consensus/time_manager.cc
M src/kudu/consensus/time_manager.h
M src/kudu/integration-tests/alter_table-test.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/ts_tablet_manager.cc
24 files changed, 357 insertions(+), 284 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7221/25
--
To view, visit http://gerrit.cloudera.org:8080/7221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2bedcc5c3708d06f48131f34ea14c835867b4800
Gerrit-Change-Number: 7221
Gerrit-PatchSet: 25
Gerrit-Owner: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>

Reply via email to