Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11142

to look at the new patch set (#2).

Change subject: tablet_bootstrap: adjust mvcc safetime with no-ops
......................................................................

tablet_bootstrap: adjust mvcc safetime with no-ops

Previously, during tablet bootstrap, a tablet would only update its MVCC
safetime based on write messages, as the timestamps in the write messages
are guaranteed to be serialized with respect to one another, by virtue
of being assigned in a single thread (the prepare thread) on the leader
replica.

From this, we conclude that timestamps for write operations are
monotonically increasing in unison with opid. The same cannot
necessarily be said for timestamps of no-ops and change configs.

This is a conservative conclusion about assigned timestamps, and this
patch hinges on the fact that our Raft implementation ensures the
following sequence of events:

1. replica A becomes leader of Term N
2. leader A assigns a timestamp t1 to its no-op
3. leader A replicates the no-op to replicas B and C, asserting its
   leadership for Term N
4. leader A prepares a write and assigns it a timestamp t2. A assigns a
   higher timestamp than t1, as this step happens after Step 2
5. leader A replicates the write to replicas B and C, checking that it
   is leader for the current term

Given the above series of operations, within the same term, the no-op
used to assert leadership is always assigned a timestamp that must be
lower than any writes in that term. As such, the timestamps assigned to
no-ops can and should be used to bump safetime.
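
The ordering argument above can be illustrated with a small model. This
is only a sketch under stated assumptions: LeaderSketch, LoggedOp, and
OpKind are hypothetical names that do not appear in Kudu, and a single
incrementing counter stands in for timestamp assignment on the leader's
prepare thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model; none of these names come from the Kudu code base.
enum class OpKind { kNoOp, kWrite };

struct LoggedOp {
  OpKind kind;
  int64_t term;
  uint64_t timestamp;
};

// A single-threaded leader that draws every timestamp from one strictly
// increasing counter, standing in for the prepare thread.
class LeaderSketch {
 public:
  LeaderSketch(int64_t term, uint64_t clock_start)
      : term_(term), next_timestamp_(clock_start) {
    // Steps 1-2: on becoming leader, the no-op gets its timestamp (t1) first.
    log_.push_back({OpKind::kNoOp, term_, next_timestamp_++});
  }

  // Step 4: a write is prepared after the no-op, so t2 is higher than t1.
  uint64_t PrepareWrite() {
    const uint64_t t2 = next_timestamp_++;
    assert(t2 > log_.front().timestamp);
    log_.push_back({OpKind::kWrite, term_, t2});
    return t2;
  }

  const std::vector<LoggedOp>& log() const { return log_; }

 private:
  int64_t term_;
  uint64_t next_timestamp_;
  std::vector<LoggedOp> log_;
};
```

In this model the assertion in PrepareWrite() can never fire, which is
the property the patch relies on within a single term.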

This patch updates tablet bootstrap to adjust MVCC safetime based on
no-ops seen in the WALs. A test is added asserting that this ordering
holds for no-ops with respect to writes, i.e. that all replicate
messages have monotonically increasing OpIds and monotonically
increasing timestamps.
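
As a rough illustration of the property being tested (not the code
added to log_verifier.cc), a check over a sequence of replicate
messages might look like the sketch below. ReplicateEntry and both
function names are hypothetical, and the OpId rule used here (term
never decreases, index strictly increases) is an assumption about what
"monotonically increasing" means for OpIds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flattened view of a replicate message: its OpId
// (term, index) and its assigned timestamp.
struct ReplicateEntry {
  int64_t term;
  int64_t index;
  uint64_t timestamp;
};

// Returns the position of the first entry that breaks the ordering,
// or -1 if OpIds and timestamps both increase across the whole log.
int64_t FirstOrderingViolation(const std::vector<ReplicateEntry>& entries) {
  for (std::size_t i = 1; i < entries.size(); ++i) {
    const ReplicateEntry& prev = entries[i - 1];
    const ReplicateEntry& cur = entries[i];
    // Assumed OpId rule: term never decreases, index strictly increases.
    const bool opid_ok = cur.term >= prev.term && cur.index > prev.index;
    const bool timestamp_ok = cur.timestamp > prev.timestamp;
    if (!opid_ok || !timestamp_ok) return static_cast<int64_t>(i);
  }
  return -1;
}

// Test-style helper: aborts (in debug builds) if the log is out of order.
void AssertLogIsOrdered(const std::vector<ReplicateEntry>& entries) {
  assert(FirstOrderingViolation(entries) == -1);
}
```

A no-op written with a timestamp lower than a preceding write would be
reported at its position in the log.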

A case in tablet_bootstrap-test depends on the ability for no-ops to be
written out of timestamp order. To maintain this, and to keep this
functionality around (which may be useful for general timestamp
assignment testing), a flag has been added to the NoOpRequestPB
indicating whether or not its timestamp should be trusted to advance
safetime.
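
A minimal sketch of how such a flag could gate safetime advancement
during replay follows. It is not the tablet_bootstrap.cc change itself:
ReplayedOp, ReplayedOpKind, ReplaySafeTime, and the trust_timestamp
field are hypothetical names, and taking the maximum of the current
safetime and the op's timestamp is an assumption about how safetime is
bumped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; trust_timestamp is an illustrative name, not
// the actual NoOpRequestPB field.
enum class ReplayedOpKind { kWrite, kNoOp, kChangeConfig };

struct ReplayedOp {
  ReplayedOpKind kind;
  uint64_t timestamp;
  bool trust_timestamp;  // only consulted for no-ops
};

// Replays `ops` in log order and returns the resulting safetime.
// Writes always advance safetime, no-ops only when flagged as
// trustworthy, and change configs never do.
uint64_t ReplaySafeTime(const std::vector<ReplayedOp>& ops,
                        uint64_t initial_safe_time) {
  uint64_t safe_time = initial_safe_time;
  for (const ReplayedOp& op : ops) {
    const bool trusted =
        op.kind == ReplayedOpKind::kWrite ||
        (op.kind == ReplayedOpKind::kNoOp && op.trust_timestamp);
    if (!trusted) continue;
    const uint64_t before = safe_time;
    safe_time = std::max(safe_time, op.timestamp);
    assert(safe_time >= before);  // safetime never moves backward
  }
  return safe_time;
}
```

With this shape, a test can still write no-ops with arbitrary
timestamps by clearing the flag, and those no-ops leave safetime
untouched.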

Additionally, I tweaked the artificial timestamps used in
raft_consensus-itest. These timestamps were previously very low and
would overlap with the real timestamp used by the leadership no-op.

Change-Id: I26deff32da8c990cb8a2ba220bb81858ddd6d73f
---
M src/kudu/consensus/consensus.proto
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/log_verifier.cc
M src/kudu/integration-tests/log_verifier.h
M src/kudu/integration-tests/raft_consensus-itest.cc
A src/kudu/integration-tests/timestamp_serialization-itest.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
8 files changed, 241 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/42/11142/2
--
To view, visit http://gerrit.cloudera.org:8080/11142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I26deff32da8c990cb8a2ba220bb81858ddd6d73f
Gerrit-Change-Number: 11142
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
