Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19445 )
Change subject: KUDU-1945 Update auto incrementing counter during bootstrap ...................................................................... Patch Set 6: (36 comments) http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@21 PS6, Line 21: the this? http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@23 PS6, Line 23: in memory nit: in-memory http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@35 PS6, Line 35: I guess there is also a case when those non-commited operations are DELETEs. I guess reading those ops to retrieve highest available value for the auto-incrementing column happens after reconciliation of Raft logs, so all the replicas has common history in the result WAL. http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@38 PS6, Line 38: Columns - A Non Unique Primary Key column of type INT32 What if we have just one replica that lags behind two others, where the missing operations are DELETEs? Say, what about the following scenario: 1. All three tablet replicas are up and running. 2. 5 INSERTS arrive. 3. 5 DELETES arrive (so, the tablet has no rows). 4. Tablet server TS2 is shut down. 5. The tablet server TS2 is started back, and after tablet bootstratpping it hosts a follower replica, not a leader replica. 6. 5 more INSERTS arrive. Will the replica at TS2 have the proper value for the auto-incrementing column? What if re-election happens and the replica at TS2 becomes a new leader when new INSERTs arrive? Thanks! http://gerrit.cloudera.org:8080/#/c/19445/6//COMMIT_MSG@49 PS6, Line 49: Change-Id: I61b305efd7d5a065a2976327567163956c0c2184 nit: keep an empty line between the commit description and 'Change-Id' line for better readbility http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc File src/kudu/integration-tests/auto_incrementing-itest.cc: http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@71 PS6, Line 71: using kudu::tablet::TabletReplica; > warning: using decl 'TabletReplica' is unused [misc-unused-using-decls] consider remove this 'using'? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@132 PS6, Line 132: void DeleteRow Why not to make this method returning Status and use RETURN_NOT_OK() instead of CHECK_OK()? If necessary, you could wrap this DeleteRow() into CHECK_OK() at the call site. http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@161 PS6, Line 161: // Stringify Rows from the response. Is it possible to re-use TabletServerTestBase::StringifyRowsFromResponse()? You could make it static and try to re-use here? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@168 PS6, Line 168: nit: fix the indent? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@172 PS6, Line 172: nit: fix the indent? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@206 PS6, Line 206: DCHECK_EQ ASSERT_EQ? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@219 PS6, Line 219: opts.extra_tserver_flags.emplace_back("--log_segment_size_bytes_for_tests=1024"); : opts.extra_tserver_flags.emplace_back("--log_max_segments_to_retain=2"); : opts.extra_tserver_flags.emplace_back("--maintenance_manager_polling_interval_ms=10"); : opts.extra_tserver_flags.emplace_back("--maintenance_manager_num_threads=4"); : opts.extra_tserver_flags.emplace_back("--flush_threshold_secs=1"); : opts.extra_tserver_flags.emplace_back("--log_compression_codec=no_compression"); nit: use initializers list for opts.extra_tserver_flags instead? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@238 PS6, Line 238: cluster_->tserver_proxy(0)->ListTablets(req, &resp, &rpc); Wrap this into ASSERT_OK()? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@239 PS6, Line 239: DCHECK_EQ(1, resp.status_and_schema_size()); Why not to use ASSERT_EQ() here instead? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@240 PS6, Line 240: string tablet_uuid nit: could be 'const string& tablet_uuid'? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@245 PS6, Line 245: + nit: separate '+' from the arguments with spaces http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@267 PS6, Line 267: cluster_->Restart() Wrap Restart() into ASSERT_OK()? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@284 PS6, Line 284: TEST_F(AutoIncrementingItest, BootstrapNoWalsNoData) { Is it possible to separate the majority of the code in this BootstrapNoWalsNoData and prior BootstrapWithNoWals scenario into a method/function to avoid duplicating the code? Alternatively, you could consider adding a new parameterized test based on AutoIncrementingItest. http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@287 PS6, Line 287: opts.extra_tserver_flags.emplace_back("--log_segment_size_bytes_for_tests=1024"); : opts.extra_tserver_flags.emplace_back("--log_max_segments_to_retain=2"); : opts.extra_tserver_flags.emplace_back("--maintenance_manager_polling_interval_ms=10"); : opts.extra_tserver_flags.emplace_back("--maintenance_manager_num_threads=4"); : opts.extra_tserver_flags.emplace_back("--flush_threshold_secs=1"); : opts.extra_tserver_flags.emplace_back("--log_compression_codec=no_compression"); nit: use initializers list for opts.extra_tserver_flags instead? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@306 PS6, Line 306: cluster_->tserver_proxy(0)->ListTablets(req, &resp, &rpc); Wrap this into ASSERT_OK()? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@307 PS6, Line 307: DCHECK_EQ Why not to use ASSERT_EQ() instead? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@334 PS6, Line 334: cluster_->Restart(); Wrap this into ASSERT_OK()? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@380 PS6, Line 380: raft nit: Raft http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@381 PS6, Line 381: InsertData Wrap into CHECK_OK()? Alternatively, store the status per thread and check for failures upon threads' completion. http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@381 PS6, Line 381: 5); Why to sleep between each row, at all? Would be great to comment on the reasoning behind. http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@383 PS6, Line 383: // Ensure at least a couple of rows are attempted to be : // written before shutting down. : SleepFor(MonoDelta::FromMilliseconds(10)); : // Shutdown the tablet servers hosting followers while the data is still being written. : for (int i = 0; i < kNumTabletServers; i++) { : if (i != leader_ts_index) { : cluster_->tablet_server(i)->Shutdown(); : } : } Would it be easier to run these in their own threads, but run the writer in the main test thread? Shutdown() doesn't return a status, so need to check out on the results at least given the signature of the method? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@394 PS6, Line 394: leader replica nit: the leader replica http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@394 PS6, Line 394: // At this point tablet server hosting leader replica would be having a replicate op without : // a corresponding COMMIT message Is there a way to ensure that's exactly the case? Maybe, use smallest possible WAL segments and count their number at different tablet servers? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@400 PS6, Line 400: CHECK_OK Why not to use ASSERT_OK() here? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@405 PS6, Line 405: ASSERT_OK(InsertData(kNumRows, kNumRows * 2)); Is there a way to check for http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@410 PS6, Line 410: CHECK_OK Why not to use ASSERT_OK() here? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@411 PS6, Line 411: leader_ts_index + kNumTabletServers + 1 Could this part be shorten to leader_ts_index + 1? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/integration-tests/auto_incrementing-itest.cc@421 PS6, Line 421: ASSERT_EQ(results[i].size(), results[i+1].size()); : for (int k = 0; k < results[i].size(); k++) { : ASSERT_EQ(results[i][k], results[i+1][k]); : } Does it make sense to verify this across all the tablet replicas, not just the first two? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/tablet/tablet.cc@437 PS6, Line 437: if (schema()->has_auto_incrementing()) { Consider moving this code out into a separate method. http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/tablet/tablet.cc@442 PS6, Line 442: TODO(achennaka): Materialize only That's good point! Is it possible to do so in this patch or you are planning to post a follow-up one? http://gerrit.cloudera.org:8080/#/c/19445/6/src/kudu/tablet/tablet.cc@466 PS6, Line 466: LOG_WITH_PREFIX(INFO) << "Fetching the auto incrementing counter took " : << delta.ToMilliseconds() << "ms"; It's better to use Stopwatch(Stopwatch::THIS_THREAD) for this purpose -- it provides more information on the system, user, and real time. -- To view, visit http://gerrit.cloudera.org:8080/19445 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61b305efd7d5a065a2976327567163956c0c2184 Gerrit-Change-Number: 19445 Gerrit-PatchSet: 6 Gerrit-Owner: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Tue, 28 Mar 2023 01:45:23 +0000 Gerrit-HasComments: Yes
