David Ribeiro Alves has submitted this change and it was merged.

Change subject: KUDU-798 (part 5) Correct safe time advancement
......................................................................


KUDU-798 (part 5) Correct safe time advancement

This patch fixes safe time advancement in general and allows for
safe time advancement in the absence of writes in particular.
The core of the patch is to plug in the new TimeManager class wherever
needed. But there is also a major cleanup of the waiting story in
TabletService (KUDU-1127) and a few new integration test features.

There is an instance of a TimeManager per tablet. It's used for:

- When replicating messages to other replicas a leader uses the
  TimeManager to assign timestamps and to obtain a safe time to send,
  even when there are no writes.

- When receiving messages from a leader consensus uses the TimeManager
  to update the clock and to unblock any waiters that might waiting
  for a particular timestamp to be safe.

- Before a snapshot scan proceeds to scan, it must first wait for the
  TimeManager to deem whatever timestamp it has safe. Then it proceeds
  to wait for the snapshot at timestamp to have all its transactions
  committed and, finally, proceeds with the scan.

Put together, these changes allow to make sure that snapshot scans are
repeatable in the large majority of cases. The one "hole" in safe time
is solved by leader leases. Until we have those this patch takes a
conservative approach to safe time progress.

Fixing safe time broke a bunch of our tests that were expecting broken
snapshot scans. In particular we would return broken snapshots all the
time instead of waiting for the snapshot to be correct. Of course
when these errors were fixed the tests started failing.

In order to address these test failures I cleaned up our snapshot scan
waiting story in TabletServer::HandleScanAtSnapshot(). In particular:

- The client's deadline in no longer taken into account when deciding
  how long to wait for a snapshot to be repeatable. There is a hard
  (configurable) max that the server will wait for, "clamping" the
  client's deadline. The key here is that, when the client deadline
  is clamped, we return Status::ServiceUnavailable on time out
  instead of Status::TimeOut(). Since HandleScanAtSnapshot() is called
  on KuduScanner::Open() and ServiceUnavailable is a retryable status
  this enables the client to try somewhere else, perhaps where it won't
  have to wait as long.

- TimeManager now does a pre-flight check before waiting on safe time.
  In particular it checks that: i) it has heard from the leader within
  a configurable amount of time (that safe time _can_ make progress).
  ii) it checks that the safe time is not more that a configurable
  amount of time in the past, 30 seconds by default (that safe time
  is likely to make progress to the required timestamp).

Finally, this patch adds two new integration test workloads that
prove that it works. It adds a new read workload to TestWorkload
that performs snapshot scans in the present, while writes are
happening. This can be enabled anywhere but this patch enables
it for a bunch of tests in RaftConsensusItest, in particular the
*Churny* and *Crashy* ones with unique keys. This patch also enables
linked_list-test to perform snapshot scans in the present after
the verification.

Results:

I ran raft_consensus-itest with the new snapshot read workload
on dist-test, asan, with the following config:

KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \
--disable-sharding loop -n 1000 bin/raft_consensus-itest \
--gtest_filter=*Churny*:*Crashy*-*Duplicate*

I pulled the test to before this patch. It failed 1000/1000 on
master. With this patch it passed 1000/1000:

http://dist-test.cloudera.org//job?job_id=david.alves.1481097865.18287

I ran linked_list-test with the new snapshot scans in dist-test,
asan, with the following config:

KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \
loop -n 1000 bin/linked_list-test --stress_cpu_threads=2 \
--gtest_filter=*TestLoadAndVerify*

The test passed 1000/1000 whereas before it would fail 427/1000.
Results:

http://dist-test.cloudera.org//job?job_id=david.alves.1481104106.24665

I also ran the test in client-test that tests fault tolerance.
Run config:

KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \
--disable-sharding loop -n 1000 -- bin/client-test \
--gtest_filter=*ScanFaultTolerance* --stress_cpu_threads=2

The test passed 1000/1000 times. Results:

http://dist-test.cloudera.org//job?job_id=david.alves.1481171410.4460

Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e
Reviewed-on: http://gerrit.cloudera.org:8080/5240
Reviewed-by: Todd Lipcon <t...@apache.org>
Tested-by: David Ribeiro Alves <dral...@apache.org>
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java
M src/kudu/client/client-test-util.cc
M src/kudu/client/client-test.cc
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/consensus/time_manager.cc
M src/kudu/consensus/time_manager.h
M src/kudu/integration-tests/alter_table-test.cc
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/integration-tests/linked_list-test-util.h
M src/kudu/integration-tests/linked_list-test.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
M src/kudu/tablet/transactions/alter_schema_transaction.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/transaction_tracker-test.cc
M src/kudu/tablet/transactions/transaction_tracker.cc
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tablet/transactions/write_transaction.h
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
41 files changed, 591 insertions(+), 274 deletions(-)

Approvals:
  David Ribeiro Alves: Verified
  Todd Lipcon: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e
Gerrit-PatchSet: 35
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to