Andrew Wong has posted comments on this change. Change subject: disk failure: tests for disk failure recovery ......................................................................
Patch Set 7: (34 comments) http://gerrit.cloudera.org:8080/#/c/7031/3//COMMIT_MSG Commit Message: PS3, Line 16: exercises > typo Done http://gerrit.cloudera.org:8080/#/c/7031/3/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: Line 26: #include "kudu/gutil/map-util.h" > warning: #includes are not sorted properly [llvm-include-order] Done Line 38: > warning: using decl 'map' is unused [misc-unused-using-decls] Done PS3, Line 61: > nit: indent These have been removed to use TestWorkload instead PS3, Line 62: > nit: using... same Line 81: NO_FATALS(CreateTable(table_id)); > warning: redundant 'test_info_' declaration [readability-redundant-declarat seems harmless? PS3, Line 82: WaitForTSAndReplicas(table_id); : } : } : : // Sets up a cluster with three servers with two disks each. : vector<ExternalTabletServer*> SetupDefaultCluster() { : FLAGS_num_tablet_servers = 3; : CreateCluster("survivable_cluster", kDefaultFlags, {}, /* num_data_dirs */ 2); : CreateClient(&client_); : vector<TServerDetails*> tservers; : AppendValuesFromMap(tablet_servers_, &tservers); : CHECK_EQ(3, tservers.size()); : vector<ExternalTabletServer*> ext_tservers; : for (auto* details : tservers) { : ext_tservers.push_back(cluster_->tablet_server_by_uuid(details->uuid())); : } : return ext_tservers; > nit move this comment to before the test declaration, or at least give an i Done PS3, Line 150: if (counts_after - counts_before > 0) { : dirs_written->push_back(e.first); : } else { : dirs_not_written->push_back(e.first); : } : } : } > this is a bit funky (depending on the number of files) any way we can be mo Done. I've changed it to count the files before and compare it with the counts after running whatever function. PS3, Line 214: d on tw > tablet servers Done http://gerrit.cloudera.org:8080/#/c/7031/6/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: Line 28: #include "kudu/integration-tests/test_workload.h" > warning: #includes are not sorted properly [llvm-include-order] Done Line 66: string GlobForBlockFileInDataDir(const string& data_dir) { > warning: invalid case style for global constant 'ts_flags' [readability-ide Done Line 101: void SetServerSurvivalFlags(vector<ExternalTabletServer*>& ext_tservers) { > warning: non-const reference parameter 'ext_tservers', make it const or use Done Line 126: const std::function<void(void)>& f, > warning: the parameter 'f' is copied for each invocation but only used as a Done Line 165: void GetDataDirsWrittenToByWorkload(const vector<ExternalTabletServer*> ext_tservers, > warning: the const qualified parameter 'ext_tservers' is copied for each in Done Line 166: TestWorkload* workload, > warning: non-const reference parameter 'workload', make it const or use a p Done http://gerrit.cloudera.org:8080/#/c/7031/7/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: Line 30: #include "kudu/util/path_util.h" > warning: #includes are not sorted properly [llvm-include-order] Done PS7, Line 51: uint32_t > Simple int would be fine here, I think. Done Line 67: if (FLAGS_block_manager == "file") { > Would it be useful to parameterize these tests and run them on both block m Done PS7, Line 68: ** > Isn't this middle entry also '??'? I thought block paths were data/ab/cd/ab Done Line 91: vector<TServerDetails*> tservers; > Maybe omit this and iterate on the tablet_servers_ map directly? Done Line 101: void SetServerSurvivalFlags(vector<ExternalTabletServer*>& ext_tservers) { > warning: non-const reference parameter 'ext_tservers', make it const or use Done Line 150: if (counts_after - counts_before > 0) { > Would if counts_after > counts_before be simpler? Done Line 165: void GetDataDirsWrittenToByWorkload(const vector<ExternalTabletServer*> ext_tservers, > warning: the const qualified parameter 'ext_tservers' is copied for each in Done Line 176: workload->StopAndJoin(); > IIUC, we're writing just enough to figure out who is writing where, and the Yes, exactly Line 184: ext_tserver->GetInt64Metric(&METRIC_ENTITY_server, nullptr, &METRIC_data_dirs_failed, > Need ASSERT_OK here too? Done Line 200: ASSERT_OK(row->SetInt32(0, i + start_row)); > Got some tabs here. Done Line 204: ignore_result(session->Flush()); > Flushing one row at a time? You sure you don't want to flush the whole thin Done Line 229: vector<ExternalTabletServer*> ext_tservers = SetupDefaultCluster(); > Need NO_FATALS here? Done in SetupDefaultCluster() Line 233: workload_a.set_write_pattern(TestWorkload::WritePattern::INSERT_SEQUENTIAL_ROWS); > Why is sequential important? Applying sequentially means there are only MRS flushes. PS7, Line 238: this avoid > this to avoid Done Line 243: GetDataDirsWrittenToByWorkload(ext_tservers, &workload_a, 3, &dirs_with_A_data, > Need NO_FATALS on these calls, right? Done PS7, Line 256: MonoDelta::FromSeconds(120) > Maybe define this delta once and reuse it? Done Line 259: SetServerSurvivalFlags(ext_tservers); > NO_FATALS here. Done Line 271: WaitForDiskFailures(ext_tservers[0]); > NO_FATALS on these too. Done -- To view, visit http://gerrit.cloudera.org:8080/7031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes