Andrew Wong has posted comments on this change.

Change subject: disk failure: don't open tablets on failed disks
......................................................................


Patch Set 3:

(25 comments)

This patch is dependent on the disk state patch; currently working out the 
status of that first and then will push these changes.

http://gerrit.cloudera.org:8080/#/c/7766/3//COMMIT_MSG
Commit Message:

PS3, Line 21: Testing is done by loading data into a cluster with multi-disk
            : servers, failing a single directory of one of the servers, and 
ensuring
            : that the tablets spread across the failed disk get replicated 
upon the
            : next startup.
> how about: Testing is done by loading data into a cluster configured to use
Done


http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 986:   if (PREDICT_FALSE(!failed_dirs.empty())) {
> Why do we need this condition? Won't the for loop no-op if failed_dirs is e
Done


http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/file_block_manager.cc
File src/kudu/fs/file_block_manager.cc:

PS3, Line 581: // Don't report failed directories.
> shoudln't we _report_ failed directories? ok if you want to do it in a foll
Right, the intent here was to not report a failed directory as healthy. I agree 
it should somehow show itself in the FsReport. For another patch though, I 
think.


PS3, Line 581: // Don't report failed directories.
> There's some confusion about this, I think. The FsReport is intended for th
Thanks for explaining. I'll leave this for now and leave any additional 
reporting out of scope.


PS3, Line 693: VLOG(3) << Substitute("Block $0 is in a failed directory; not 
deleting",
             :                             block_id.ToString());
> how about using LOG_EVERY_N at INFO level, that way we'd get something in t
Done


http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 1694:   // Ignore attempts to delete blocks in failed directories.
> Shouldn't this precede the RemoveLogBlockUnlocked? This is deleting the blo
Done


PS3, Line 1700:       VLOG(3) << Substitute("Block $0 is in a failed directory; 
not deleting",
              :                             block_id.ToString());
> same
It's unique to this patch in that we'll only ever try to delete a block in a 
failed disk. This snippet here has the property that we won't even try to write 
to the failed directory, and in that sense, disk failures on DeleteBlock() are 
non-fatal.


Line 1702:       return Status::OK();
> Agreed with Mike: if return an error, don't callers handle it appropriately
Done


Line 1702:       return Status::OK();
> I'm not sure why we are returning OK here. Also, new API semantics should b
Yeah, you make a good point. Failing to delete blocks is non-fatal anyway, so 
we should be able to just return a disk error here.


Line 1702:       return Status::OK();
> also: if these are the semantics we want (not sure), shouldn't we have a un
You're right, I've changed the semantics to return an error. Added a test too.


http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

PS3, Line 43: TabletServerIntegrationTestBase
> Would you mind inheriting from ExternalMiniClusterITestBase instead in this
Ah, it's all the rage, huh. Done


PS3, Line 56:     if (GetParam() == "file") {
            :       flags.emplace_back("--block_manager=file");
            :     } else {
            :       flags.emplace_back("--block_manager=log");
            :     }
> maybe: flags.emplace_back(Substitute("--block_manager=$0", GetParam()));
Done


PS3, Line 65:   void SetupDefaultCluster(vector<ExternalTabletServer*>* 
ext_tservers) {
            :     FLAGS_num_tablet_servers = 3;
            :     vector<string> flags = GetClusterFlags();
            :     NO_FATALS(CreateCluster("survivable_cluster", flags, {}, /* 
num_data_dirs */ 2));
            :     NO_FATALS(CreateClient(&client_));
            :     CHECK_EQ(FLAGS_num_tablet_servers, tablet_servers_.size());
            :     ext_tservers->clear();
            :     for (const auto& e : tablet_servers_) {
            :       
ext_tservers->push_back(cluster_->tablet_server_by_uuid(e.second->uuid()));
            :     }
            :   }
> can you add more disks? 2 is kind of special-casy because of tablet overlap
Yep, and that's not really necessary in this case because we're spreading 
across all directories.


PS3, Line 96: server
> a tablet server
Done


PS3, Line 97: server
> tablet server
Done


PS3, Line 98: .
> while it is shut down.
Done


Line 109:   write_workload.Setup();
> This creates a table. Why are you creating the table yourself above?
Good point, removed.


Line 110:   write_workload.Start();
> You should call workload.stopAndJoin() at some point during the test to shu
Done


PS3, Line 114: WaitForTSAndReplicas
> what is the purpose of calling this function?
Wanted to ensure that the tablets were visible ready, but I think 
WaitForTabletsRunning() should take care of that.


PS3, Line 124:   NO_FATALS(SetServerSurvivalFlags(ext_tservers));
> why is this not set on boot?
Ah, good point. This gets set below for the relevant server.


PS3, Line 124:   NO_FATALS(SetServerSurvivalFlags(ext_tservers));
> agree
Done


PS3, Line 129: Substitute("--block_manager=$0", GetParam()
> why are you re-settting the block manager? can it change?
It will use the default block_manager flag otherwise. This is needed so a 
server created with "file" will be opened again with "file".


PS3, Line 138:   
ASSERT_OK(cluster_->WaitForTabletsRunning(cluster_->tablet_server(0),
             :                                             kNumTablets, 
kAgreementTimeout));
> so remind me again what happened to the failed tablets? were they replicate
Yes, the tablets should be FAILED, evicted, and reassigned to the same server 
(since there are only three servers).


PS3, Line 137:   // Ensure again that the tablets get to a running, consistent 
state.
             :   
ASSERT_OK(cluster_->WaitForTabletsRunning(cluster_->tablet_server(0),
             :                                             kNumTablets, 
kAgreementTimeout));
             :   ASSERT_OK(WaitForServersToAgree(kAgreementTimeout, 
tablet_servers_, tablet_id, 1));
> not sure I get this. you inject a failure, wait for it to happen and then m
Done. I think it's sufficient to just ensure the tablets are running. Not 
expecting weird consensus issues here.


http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

Line 765:     LOG(ERROR) << "Exiting bootstrapping early; tablet is in a failed 
directory";
> how about: LOG(ERROR) << LogPrefix(tablet_id) << "aborting tablet bootstrap
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7766
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to