Adar Dembo has posted comments on this change.

Change subject: disk failure: add persistent disk states
......................................................................


Patch Set 6:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/7270/6/src/kudu/fs/block_manager_util-test.cc
File src/kudu/fs/block_manager_util-test.cc:

PS6, Line 65:     ASSERT_EQ(kUuid, path_set.all_paths(0).uuid());
            :     ASSERT_EQ(PathHealthStatePB::HEALTHY, 
path_set.all_paths(0).health_state());
            :     ASSERT_EQ(kPath, path_set.all_paths(0).path());
Nit: maybe extract a "const PathPB& path" so you're not repeating 
path_set.all_paths(0) over and over?
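Something like:

  const PathPB& path = path_set.all_paths(0);
  ASSERT_EQ(kUuid, path.uuid());
  ASSERT_EQ(PathHealthStatePB::HEALTHY, path.health_state());
  ASSERT_EQ(kPath, path.path());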


http://gerrit.cloudera.org:8080/#/c/7270/6/src/kudu/fs/block_manager_util.cc
File src/kudu/fs/block_manager_util.cc:

Line 53:       healthy_instance_ = false; \
Maybe we should store the Status that made us unhealthy instead? Then we could 
return that in various places where we need to create a bad Status to reflect 
!healthy_instance().
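Rough sketch of what I mean (names hypothetical, untested):

  // Instead of: bool healthy_instance_;
  Status health_status_;  // OK while healthy; the failure Status otherwise.

  // Then callers can just do:
  if (!health_status_.ok()) return health_status_;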


Line 88:   // Set up the path set.
Should we still populate all_uuids, to support downgrades? Or does that not 
work?


Line 91:   new_path_set->set_timestamp_us(1);
Shouldn't this be a value of now?


PS6, Line 97:     new_path_set->add_all_paths();
            :     *new_path_set->mutable_all_paths(i) = new_path;
Can you combine:

  new_path_set->add_all_paths(std::move(new_path))

Or some such?
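If the generated add_all_paths() doesn't take an argument directly, the usual protobuf idiom would be something like:

  *new_path_set->add_all_paths() = std::move(new_path);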


Line 149: Status PathInstanceMetadataFile::Lock() {
Do we expect disk failure errors in LockFile or UnlockFile? Should we use 
RETURN_NOT_OK_FAIL_INSTANCE_PREPEND there too?


http://gerrit.cloudera.org:8080/#/c/7270/6/src/kudu/fs/block_manager_util.h
File src/kudu/fs/block_manager_util.h:

PS6, Line 109: PathInstanceMetadataPB*
Pass by cref?


Line 116:   bool healthy_instance_;
Nit: how about just healthy_ (and healthy() for the accessor)?


http://gerrit.cloudera.org:8080/#/c/7270/6/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

Line 212:   if (instance()->healthy_instance()) {
A silent no-op if the instance isn't healthy seems error-prone. Can we make 
this crash instead?


Line 256:           metadata_file_->SetInstanceFailed();
This is weird; it's the DataDir that failed, not the metadata file. Why not 
just call DataDirManager::MarkDataDirFailed()?


Line 359:   uuid_by_path_.swap(uuid_by_path);
Why do this before SyncDir()? Wouldn't it be safer to do it after, once we're 
guaranteed there can be no more failures?


PS6, Line 406:           if (!new_status.IsDiskFailure()) {
             :             return new_status;
             :           }
Oh, so I think I get this.

1. For PathInstanceMetadataFile::Lock() and Unlock(), the disk failure is 
externalized (i.e. callers should check the returned status for disk failure 
and call SetInstanceFailed() if it's detected).
2. For PathInstanceMetadataFile::Create() and LoadFromDisk(), the disk failure 
is internalized: those functions will SetInstanceFailed() themselves and return 
no error.

This inconsistency is confusing; is there a particular reason for it?


Line 419:     if (PREDICT_TRUE(instance->healthy_instance())) {
Why not create the pool either way? It's very low overhead and will mean you 
don't need if statements around the existence of the pool.


Line 450:       dd->mutable_instance()->SetInstanceFailed();
Here's another case where the PathInstanceMetadataFile disk failure is detected 
and effected externally. I think we should treat this consistently: either 
PathInstanceMetadataFile healthy is totally internal and not set by outside 
callers, or totally external and PathInstanceMetadataFile is "dumber".


PS6, Line 460: &path_set_
Let's not make changes to member state until we can't fail anymore.


PS6, Line 472:       return Status::IOError(Substitute("Need to write $0 new 
instance files but filesystem opened "
             :                                         "in read-only mode", 
updated_indices.size()));
Why is this fatal? If we're in read-only mode, can't we move forward despite 
this inconsistency?


PS6, Line 476: ;
Nit: extra semicolon


Line 487:       dd->WaitOnClosures();
This should be safe regardless of dd health.


Line 677:     Status s = 
candidate->RefreshIsFull(DataDir::RefreshMode::EXPIRED_ONLY);
Likewise.


Line 724:     Status s = e.second->RefreshIsFull(DataDir::RefreshMode::ALWAYS);
Why aren't we returning RefreshIsFull errors anymore?


Line 771:   CHECK(AccessMode::READ_WRITE == mode_) << "Cannot handle disk 
failures; filesystem is "
Use CHECK_EQ.
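i.e. something like:

  CHECK_EQ(AccessMode::READ_WRITE, mode_) << "Cannot handle disk failures; "
                                             "filesystem is ...";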


Line 796: void DataDirManager::WritePathHealthStatesUnlocked() {
Should DCHECK on the lock that needs to be held.

Also, this function does IO, so we shouldn't hold any spinlocks during 
UpdateOnDisk().


http://gerrit.cloudera.org:8080/#/c/7270/6/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

PS6, Line 339: dirty_idx_list
you mean 'instances'?


PS6, Line 343:   // This only affects the DataDirManager state, so it shouldn't 
be used if
             :   // state out of its jurisdiction (e.g. tablets) needs to be 
updated.
Not exactly sure what guidance this provides.


Line 352:   const AccessMode mode_;
Nit: group with the other const members (i.e. remove the blank line separating 
them).


http://gerrit.cloudera.org:8080/#/c/7270/6/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

Line 70:   repeated bytes all_uuids = 2; // Deprecated field used before Kudu 
1.5.0.
Another convention is to rename deprecated proto fields by prepending 
"DEPRECATED_" to them. It's OK to rename fields; it's only their type and order 
that matter for backwards compatibility.


PS6, Line 75: There is no PathSetPB with a UUID outside the set of UUIDs in
            :   //    'all_paths' of the PathSetPB with the highest 
'timestamp_us'
This is a pretty dense sentence and is difficult to understand. Reword?


http://gerrit.cloudera.org:8080/#/c/7270/6/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

Line 1493:       continue;
Won't this lead to a crash when we look up this dd's limit in 
block_limits_by_data_dir_ (see LogBlockContainer's constructor)?


Line 1538:       continue;
This won't increment 'i'; are we missing some test coverage here? Seems like 
that should have been caught by something.


-- 
To view, visit http://gerrit.cloudera.org:8080/7270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
