Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19445 )

Change subject: KUDU-1945 Update auto incrementing counter during bootstrap
......................................................................


Patch Set 6:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@21
PS6, Line 21: the
this?


http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@23
PS6, Line 23: in memory
nit: in-memory


http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@35
PS6, Line 35:
I guess there is also a case when those non-commited operations are DELETEs.

I guess reading those ops to retrieve highest available value for the 
auto-incrementing column happens after reconciliation of Raft logs, so all the 
replicas has common history in the result WAL.


http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@38
PS6, Line 38: Columns - A Non Unique Primary Key column of type INT32
What if we have just one replica that lags behind two others, where the missing 
operations are DELETEs?

Say, what about the following scenario:

1.  All three tablet replicas are up and running.
2.  5 INSERTS arrive.
3.  5 DELETES arrive (so, the tablet has no rows).
4.  Tablet server TS2 is shut down.
5.  The tablet server TS2 is started back, and after tablet bootstratpping it 
hosts a follower replica, not a leader replica.
6.  5 more INSERTS arrive.

Will the replica at TS2 have the proper value for the auto-incrementing column? 
 What if re-election happens and the replica at TS2 becomes a new leader when 
new INSERTs arrive?

Thanks!


http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@49
PS6, Line 49: Change-Id: I61b305efd7d5a065a2976327567163956c0c2184
nit: keep an empty line between the commit description and 'Change-Id' line for 
better readbility


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc
File src/kudu/integration-tests/auto_incrementing-itest.cc:

http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@71
PS6, Line 71: using kudu::tablet::TabletReplica;
> warning: using decl 'TabletReplica' is unused [misc-unused-using-decls]
consider remove this 'using'?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@132
PS6, Line 132: void DeleteRow
Why not to make this method returning Status and use RETURN_NOT_OK() instead of 
CHECK_OK()?  If necessary, you could wrap this DeleteRow() into CHECK_OK() at 
the call site.


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@161
PS6, Line 161:     // Stringify Rows from the response.
Is it possible to re-use  TabletServerTestBase::StringifyRowsFromResponse()?  
You could make it static and try to re-use here?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@168
PS6, Line 168:
nit: fix the indent?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@172
PS6, Line 172:                
nit: fix the indent?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@206
PS6, Line 206: DCHECK_EQ
ASSERT_EQ?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@219
PS6, Line 219:   
opts.extra_tserver_flags.emplace_back("--log_segment_size_bytes_for_tests=1024");
             :   
opts.extra_tserver_flags.emplace_back("--log_max_segments_to_retain=2");
             :   
opts.extra_tserver_flags.emplace_back("--maintenance_manager_polling_interval_ms=10");
             :   
opts.extra_tserver_flags.emplace_back("--maintenance_manager_num_threads=4");
             :   
opts.extra_tserver_flags.emplace_back("--flush_threshold_secs=1");
             :   
opts.extra_tserver_flags.emplace_back("--log_compression_codec=no_compression");
nit: use initializers list for opts.extra_tserver_flags instead?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@238
PS6, Line 238: cluster_->tserver_proxy(0)->ListTablets(req, &resp, &rpc);
Wrap this into ASSERT_OK()?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@239
PS6, Line 239: DCHECK_EQ(1, resp.status_and_schema_size());
Why not to use ASSERT_EQ() here instead?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@240
PS6, Line 240: string tablet_uuid
nit: could be 'const string& tablet_uuid'?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@245
PS6, Line 245: +
nit: separate '+' from the arguments with spaces


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@267
PS6, Line 267: cluster_->Restart()
Wrap Restart() into ASSERT_OK()?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@284
PS6, Line 284: TEST_F(AutoIncrementingItest, BootstrapNoWalsNoData) {
Is it possible to separate the majority of the code in this 
BootstrapNoWalsNoData and prior BootstrapWithNoWals scenario into a 
method/function to avoid duplicating the code?

Alternatively, you could consider adding a new parameterized test based on 
AutoIncrementingItest.


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@287
PS6, Line 287:   
opts.extra_tserver_flags.emplace_back("--log_segment_size_bytes_for_tests=1024");
             :   
opts.extra_tserver_flags.emplace_back("--log_max_segments_to_retain=2");
             :   
opts.extra_tserver_flags.emplace_back("--maintenance_manager_polling_interval_ms=10");
             :   
opts.extra_tserver_flags.emplace_back("--maintenance_manager_num_threads=4");
             :   
opts.extra_tserver_flags.emplace_back("--flush_threshold_secs=1");
             :   
opts.extra_tserver_flags.emplace_back("--log_compression_codec=no_compression");
nit: use initializers list for opts.extra_tserver_flags instead?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@306
PS6, Line 306: cluster_->tserver_proxy(0)->ListTablets(req, &resp, &rpc);
Wrap this into ASSERT_OK()?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@307
PS6, Line 307: DCHECK_EQ
Why not to use ASSERT_EQ() instead?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@334
PS6, Line 334: cluster_->Restart();
Wrap this into ASSERT_OK()?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@380
PS6, Line 380: raft
nit: Raft


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@381
PS6, Line 381: InsertData
Wrap into CHECK_OK()?  Alternatively, store the status per thread and check for 
failures upon threads' completion.


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@381
PS6, Line 381:  5);
Why to sleep between each row, at all?  Would be great to comment on the 
reasoning behind.


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@383
PS6, Line 383:   // Ensure at least a couple of rows are attempted to be
             :   // written before shutting down.
             :   SleepFor(MonoDelta::FromMilliseconds(10));
             :   // Shutdown the tablet servers hosting followers while the 
data is still being written.
             :   for (int i = 0; i < kNumTabletServers; i++) {
             :     if (i != leader_ts_index) {
             :       cluster_->tablet_server(i)->Shutdown();
             :     }
             :   }
Would it be easier to run these in their own threads, but run the writer in the 
main test thread?  Shutdown() doesn't return a status, so need to check out on 
the results at least given the signature of the method?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@394
PS6, Line 394: leader replica
nit: the leader replica


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@394
PS6, Line 394:   // At this point tablet server hosting leader replica would be 
having a replicate op without
             :   // a corresponding COMMIT message
Is there a way to ensure that's exactly the case?  Maybe, use smallest possible 
WAL segments and count their number at different tablet servers?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@400
PS6, Line 400: CHECK_OK
Why not to use ASSERT_OK() here?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@405
PS6, Line 405: ASSERT_OK(InsertData(kNumRows, kNumRows * 2));
Is there a way to check for


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@410
PS6, Line 410: CHECK_OK
Why not to use ASSERT_OK() here?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@411
PS6, Line 411: leader_ts_index + kNumTabletServers + 1
Could this part be shorten to leader_ts_index + 1?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@421
PS6, Line 421:     ASSERT_EQ(results[i].size(), results[i+1].size());
             :     for (int k = 0; k < results[i].size(); k++) {
             :       ASSERT_EQ(results[i][k], results[i+1][k]);
             :     }
Does it make sense to verify this across all the tablet replicas, not just the 
first two?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/tablet/tablet.cc@437
PS6, Line 437:     if (schema()->has_auto_incrementing()) {
Consider moving this code out into a separate method.


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/tablet/tablet.cc@442
PS6, Line 442: TODO(achennaka): Materialize only
That's good point!  Is it possible to do so in this patch or you are planning 
to post a follow-up one?


http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/tablet/tablet.cc@466
PS6, Line 466:     LOG_WITH_PREFIX(INFO) << "Fetching the auto incrementing 
counter took "
             :                           << delta.ToMilliseconds() << "ms";
It's better to use Stopwatch(Stopwatch::THIS_THREAD)  for this purpose -- it 
provides more information on the system, user, and real time.



--
To view, visit http://gerrit.cloudera.org:8080/19445
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61b305efd7d5a065a2976327567163956c0c2184
Gerrit-Change-Number: 19445
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 28 Mar 2023 01:45:23 +0000
Gerrit-HasComments: Yes

Reply via email to