David Ribeiro Alves has posted comments on this change.

Change subject: Add integration tests for duplicate keys
......................................................................


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/5349/7/src/kudu/integration-tests/cluster_verifier.cc
File src/kudu/integration-tests/cluster_verifier.cc:

PS7, Line 126:   
CHECK_OK(scanner.SetReadMode(client::KuduScanner::READ_AT_SNAPSHOT));
             :   CHECK_OK(scanner.SetFaultTolerant());
             :   CHECK_OK(scanner.SetTimeoutMillis(5000));
             :   CHECK_OK(scanner.SetProjectedColumns(vector<string>()));
> Is it so crucial to abort if something goes non-OK among those?  Looking at
We were already using this pattern. We RETURN_NOT_OK() on things that may fail 
in the test or the things we're testing and we CHECK_OK() on the invariants. 
You should RETURN_NOT_OK() when there is a chance that the caller can do 
something about it (e.g. as in this case in CheckRowCount vs 
CheckRowCountWithRetries) otherwise ASSERT_OK/CHECK_OK is better. We can't use 
ASSERT here, so CHECK will have to do


PS7, Line 129: vector<string>()
> nit: consider using '{}' instead of 'vector<string>()' for brevity.
Done


http://gerrit.cloudera.org:8080/#/c/5349/7/src/kudu/integration-tests/raft_consensus-itest.cc
File src/kudu/integration-tests/raft_consensus-itest.cc:

PS7, Line 359:   static const bool WITH_NOTIFICATION_LATENCY = true;
             :   static const bool WITHOUT_NOTIFICATION_LATENCY = false;
> Are these needed in the new code?
Done


PS7, Line 870: kCrashesToCause
> nit: since it's not static or global, consider renaming into 'crashes_to_ca
Done


PS7, Line 907: CHECK_OK
> nit: ASSERT_OK() ?
Done


PS7, Line 947: 3)
> Is batch of size 3 is enough to get higher chance of collision?  I wonder w
were only inserting 300 rows so this would be finished super fast. IMO 3 is a 
good tradeoff between speed and coverage. There are only 20 dup keys. Note that 
even with 1 there is a chance of collision.


PS7, Line 954: master_flags
> It seems this stays empty, consider removing this variable and using '{}' f
Done


PS7, Line 959:   ts_flags.push_back("--raft_heartbeat_interval_ms=5");
             : #else
             :   ts_flags.push_back("--raft_heartbeat_interval_ms=1");
             : #endif
             :   ts_flags.push_back("--leader_failure_monitor_check_mean_ms=1");
             :   
ts_flags.push_back("--leader_failure_monitor_check_stddev_ms=1");
             :   ts_flags.push_back("--never_fsync");
> nit: consider replacing the construction of the ts_flags via push_back() wi
this was not changed in the patch. would rather not stray too much into random 
backlog fixes. Besides even though I do prefer initialized lists, we'de have to 
have an ifdef in the middle (or resort to push_back again) which I think is 
worse


PS7, Line 968: master_flags
> As described above: replace with {}.
Done


PS7, Line 1009: TEST_F(RaftConsensusITest, TestChurnyElections) {
              :   const int kNumWrites = AllowSlowTests() ? 10000 : 1000;
              :   CreateClusterForChurnyElectionsTests({});
              :   TestWorkload workload(cluster_.get());
              :   workload.set_write_batch_size(1);
              :   DoTestChurnyElections(&workload, kNumWrites);
              : }
              : 
              : // The same test, except inject artificial latency when 
propagating notifications
              : // from the queue back to consensus. This previously reproduced 
bugs like KUDU-1078 which
              : // normally only appear under high load.
              : TEST_F(RaftConsensusITest, 
TestChurnyElections_WithNotificationLatency) {
              :   
CreateClusterForChurnyElectionsTests({"--consensus_inject_latency_ms_in_notifications=50"});
              :   const int kNumWrites = AllowSlowTests() ? 10000 : 1000;
              :   TestWorkload workload(cluster_.get());
              :   workload.set_write_batch_size(1);
              :   DoTestChurnyElections(&workload, kNumWrites);
              : }
              : 
              : // The same as TestChurnyElections except insert many 
duplicated rows.
              : // This emulates cases where there are many duplicate keys 
which, due to two phase
              : // locking, may cause deadlocks and other anomalies that cannot 
be observed when
              : // keys are unique.
              : TEST_F(RaftConsensusITest, 
TestChurnyElections_WithDuplicateKeys) {
              :   CreateClusterForChurnyElectionsTests({});
              :   const int kNumWrites = AllowSlowTests() ? 10000 : 1000;
              :   TestWorkload workload(cluster_.get());
              :   
workload.set_write_pattern(TestWorkload::INSERT_WITH_MANY_DUP_KEYS);
              :   // Increase the number of rows per batch to get a higher 
chance of key collision.
              :   workload.set_write_batch_size(3);
              :   DoTestChurnyElections(&workload, kNumWrites);
> These look like candidates for extracting a common function and then re-usi
that's what this patch does :) we need common code to create the cluster, then 
setup a custom workload then common code to run it. all common workload params 
are set on the common methods
I also like that we can tweak the number of writes and workload params per test 
to be able to address flakyness individually


-- 
To view, visit http://gerrit.cloudera.org:8080/5349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d018281d412ae034bd7b70c8311077a52b2795d
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to