Andrew Wong has posted comments on this change. Change subject: disk failure: don't open tablets on failed disks ......................................................................
Patch Set 3: (25 comments) This patch is dependent on the disk state patch; currently working out the status of that first and then will push these changes. http://gerrit.cloudera.org:8080/#/c/7766/3//COMMIT_MSG Commit Message: PS3, Line 21: Testing is done by loading data into a cluster with multi-disk : servers, failing a single directory of one of the servers, and ensuring : that the tablets spread across the failed disk get replicated upon the : next startup. > how about: Testing is done by loading data into a cluster configured to use Done http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: Line 986: if (PREDICT_FALSE(!failed_dirs.empty())) { > Why do we need this condition? Won't the for loop no-op if failed_dirs is e Done http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: PS3, Line 581: // Don't report failed directories. > shoudln't we _report_ failed directories? ok if you want to do it in a foll Right, the intent here was to not report a failed directory as healthy. I agree it should somehow show itself in the FsReport. For another patch though, I think. PS3, Line 581: // Don't report failed directories. > There's some confusion about this, I think. The FsReport is intended for th Thanks for explaining. I'll leave this for now and leave any additional reporting out of scope. PS3, Line 693: VLOG(3) << Substitute("Block $0 is in a failed directory; not deleting", : block_id.ToString()); > how about using LOG_EVERY_N at INFO level, that way we'd get something in t Done http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1694: // Ignore attempts to delete blocks in failed directories. > Shouldn't this precede the RemoveLogBlockUnlocked? This is deleting the blo Done PS3, Line 1700: VLOG(3) << Substitute("Block $0 is in a failed directory; not deleting", : block_id.ToString()); > same It's unique to this patch in that we'll only ever try to delete a block in a failed disk. This snippet here has the property that we won't even try to write to the failed directory, and in that sense, disk failures on DeleteBlock() are non-fatal. Line 1702: return Status::OK(); > Agreed with Mike: if return an error, don't callers handle it appropriately Done Line 1702: return Status::OK(); > I'm not sure why we are returning OK here. Also, new API semantics should b Yeah, you make a good point. Failing to delete blocks is non-fatal anyway, so we should be able to just return a disk error here. Line 1702: return Status::OK(); > also: if these are the semantics we want (not sure), shouldn't we have a un You're right, I've changed the semantics to return an error. Added a test too. http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: PS3, Line 43: TabletServerIntegrationTestBase > Would you mind inheriting from ExternalMiniClusterITestBase instead in this Ah, it's all the rage, huh. Done PS3, Line 56: if (GetParam() == "file") { : flags.emplace_back("--block_manager=file"); : } else { : flags.emplace_back("--block_manager=log"); : } > maybe: flags.emplace_back(Substitute("--block_manager=$0", GetParam())); Done PS3, Line 65: void SetupDefaultCluster(vector<ExternalTabletServer*>* ext_tservers) { : FLAGS_num_tablet_servers = 3; : vector<string> flags = GetClusterFlags(); : NO_FATALS(CreateCluster("survivable_cluster", flags, {}, /* num_data_dirs */ 2)); : NO_FATALS(CreateClient(&client_)); : CHECK_EQ(FLAGS_num_tablet_servers, tablet_servers_.size()); : ext_tservers->clear(); : for (const auto& e : tablet_servers_) { : ext_tservers->push_back(cluster_->tablet_server_by_uuid(e.second->uuid())); : } : } > can you add more disks? 2 is kind of special-casy because of tablet overlap Yep, and that's not really necessary in this case because we're spreading across all directories. PS3, Line 96: server > a tablet server Done PS3, Line 97: server > tablet server Done PS3, Line 98: . > while it is shut down. Done Line 109: write_workload.Setup(); > This creates a table. Why are you creating the table yourself above? Good point, removed. Line 110: write_workload.Start(); > You should call workload.stopAndJoin() at some point during the test to shu Done PS3, Line 114: WaitForTSAndReplicas > what is the purpose of calling this function? Wanted to ensure that the tablets were visible ready, but I think WaitForTabletsRunning() should take care of that. PS3, Line 124: NO_FATALS(SetServerSurvivalFlags(ext_tservers)); > why is this not set on boot? Ah, good point. This gets set below for the relevant server. PS3, Line 124: NO_FATALS(SetServerSurvivalFlags(ext_tservers)); > agree Done PS3, Line 129: Substitute("--block_manager=$0", GetParam() > why are you re-settting the block manager? can it change? It will use the default block_manager flag otherwise. This is needed so a server created with "file" will be opened again with "file". PS3, Line 138: ASSERT_OK(cluster_->WaitForTabletsRunning(cluster_->tablet_server(0), : kNumTablets, kAgreementTimeout)); > so remind me again what happened to the failed tablets? were they replicate Yes, the tablets should be FAILED, evicted, and reassigned to the same server (since there are only three servers). PS3, Line 137: // Ensure again that the tablets get to a running, consistent state. : ASSERT_OK(cluster_->WaitForTabletsRunning(cluster_->tablet_server(0), : kNumTablets, kAgreementTimeout)); : ASSERT_OK(WaitForServersToAgree(kAgreementTimeout, tablet_servers_, tablet_id, 1)); > not sure I get this. you inject a failure, wait for it to happen and then m Done. I think it's sufficient to just ensure the tablets are running. Not expecting weird consensus issues here. http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 765: LOG(ERROR) << "Exiting bootstrapping early; tablet is in a failed directory"; > how about: LOG(ERROR) << LogPrefix(tablet_id) << "aborting tablet bootstrap Done -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes