Adar Dembo has posted comments on this change.

Change subject: WIP disk failure: add persistent disk states
......................................................................


Patch Set 2:

(38 comments)

I didn't finish reviewing this, but hopefully these comments will still be 
useful.

http://gerrit.cloudera.org:8080/#/c/7270/2/src/kudu/fs/block_manager_util.h
File src/kudu/fs/block_manager_util.h:

PS2, Line 62:  Must be called while the file is
            :   // unlocked.
This conflates file locking with our unlocked suffix convention for in-memory 
locking. Besides, I don't think you need to worry about removing a POSIX 
advisory lock in order to modify the file.


Line 64:   Status WriteToDiskUnlocked(pb_util::CreateMode create_mode,
Nit: maybe rename to Flush() or FlushToDisk() to emphasize that the contents of 
in-memory state (i.e. metadata_) are synchronized with the disk.


Line 65:                              PathInstanceMetadataPB* pb = nullptr);
Why? Why not just always write out what's in metadata_?

OK, I see now that this function plays double duty:
1. For external callers, create_mode is always OVERWRITE and pb  is always 
nullptr.
2. When called from inside Create(), create_mode is NO_OVERWRITE and pb is not 
nullptr.

So, consider making this private, and providing a public no-arg function that 
chains into it with the right arguments.

Also, since the number of calls to the (now private) function is so low, 
consider not using a default argument value for pb.


http://gerrit.cloudera.org:8080/#/c/7270/2/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

I don't understand the need to store lock_mode_ or to unlock an instance file 
while updating it. These locks are advisory (look for "Advisory Record Locking" 
in man fcntl(2)), so it should be perfectly legal for any process to write to 
the file while it is locked in this way. The only purpose of the lock is to 
allow cooperating Kudu processes to not stomp on one another by checking for 
the lock before doing a destructive operation (i.e. before opening a file in 
read-write mode).


Line 179: #define WARN_NOT_OK_CONTINUE(status_expr, msg) { \
Belongs in status.h, I think.


Line 295:       read_only_(mode == AccessMode::READ_ONLY),
May as well store the AccessMode directly instead of adapting 
FsManager.read_only_ into this enum and then back into a boolean.


Line 299:   path_set_.set_uuid("");
Why bother initializing path_set_ in this way? You need to load them all from 
disk anyway.


Line 347:       LOG(ERROR) << Substitute("Could not create directory $0", p);
If this fails, how can we proceed with CheckHolePunch and metadata.Create?


PS2, Line 363:     if (PREDICT_FALSE(!metadata.valid_instance())) {
             :       idx++;
             :       continue;
             :     }
             :     delete_on_failure.push_front(new ScopedFileDeleter(env_, 
instance_filename));
             :     idx++;
Consolidate this.


Line 375:       Status s = env_->SyncDir(dir);
If this fails, shouldn't it affect the validity of the metadata file on 'dir'?


Line 377:         LOG(ERROR) << s.ToString() << Substitute("Unable to 
synchronize directory $0", dir); 
Trailing whitespace.

Also, our convention is to prepend text to s.Tostring() rather than append.


PS2, Line 390:   // The issue is that if we don't Create() the DataDirManager, 
we'll need some
             :   // way of determining which UUIDs are on each disk.
I don't understand this.


Line 411:           DCHECK(read_only_);
Why enforce this tight coupling?


Line 436:     unique_ptr<DataDir> dd(new DataDir(
Why bother creating a DataDir for a failed disk?


Line 460:     uuid_by_path_.swap(uuid_by_path);
There's no chance of failure here, so why not just update uuid_by_path_ 
directly?


Line 462:   // Update any instances that need to be synced with the main path 
set.
Nit: add empty line before.


Line 485:   // Build uuid index and data directory maps.
Update (and perhaps generalize, to avoid updating again).


PS2, Line 510:       LOG(INFO) << "idx" << idx;
             :       LOG(INFO) << "SIZE: " << path_set_.all_uuids_size();
             :       LOG(INFO) << "DS: " << path_set_.all_disk_states_size();
             :       LOG(INFO) << "PATHS: " << path_set_.all_paths_size();
Remove? Got some other log statements like this elsewhere.


Line 766:   LOG(INFO) << "ASDJASLDJASDJA";
?


http://gerrit.cloudera.org:8080/#/c/7270/2/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

PS2, Line 41: parent disk.
What's a "parent disk"? I think you're actually trying to get at the underlying 
filesystem health, no? Which may include the filesystem's disk's health too, 
though it's not exactly 1-1 (i.e. FS corruption due to a software bug is still 
something we'd care about even if the disk itself is healthy).


PS2, Line 60: a UUID
UUIDs (to agree with the plural PathSetPBs)


PS2, Line 60: unknown to the
Be more specific here, like "inside the set" or whatever.


Line 64:   // Paths that correspond to the UUIDs.
How about "All paths...", and also mention that the order matches that of 
all_uuids (or something along those lines).


PS2, Line 67: Timestamp
This is microseconds since the epoch? Can you clarify in the comment?


PS2, Line 67: the
omit


Line 70:   // PathDiskStatePBs corresponding to UUIDs in 'all_uuids'.
There are a lot of repeated fields here whose orders all match. How about 
creating a new record for the path itself (i.e. PathPB), putting the uuid, 
path, and state inside it, and using just one repeated here? You can deprecate 
all_uuids.


http://gerrit.cloudera.org:8080/#/c/7270/2/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

Line 244: Status FsManager::Open(FsReport* report) {
What about some of these read calls? And the 
CleanTmpFiles()/CheckAndFixPermissions() calls? Should we check for disk 
failures there too?


Line 281: Status 
FsManager::CreateInitialFileSystemLayout(boost::optional<string> uuid) {
So what happens if we see disk failures across every single disk? Shouldn't 
that be fatal?


Line 314:       LOG(ERROR) << "Unable to create FSManager root";
Should include s.ToString(). Below too.


Line 356:         if (!IsDiskFailure(s) || DirName(GetWalsRootDir()) == dir ||
Nit: move the second condition to a new line to better emphasize its similarity 
to the third condition.


http://gerrit.cloudera.org:8080/#/c/7270/2/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

Line 78:   void SetupDefaultTables(const vector<string>& table_ids = { kTableId 
}) {
I don't see this called with anything but the default argument.

Besides, can't we let the TestWorkloads create the tables?


Line 362: TEST_F(DiskFailureITest, TestFailDuringServerStartup) {
Probably need some NO_FATALS in here.


Line 374:  
Got some whitespace here.


Line 376:   SetServerSurvivalFlags(ext_tservers);
If we're only injecting failures to one tserver,
1. Why bother setting the survival flags on all of them?
2. Why bother restarting the entire cluster and not just that tserver?


Line 383:   cluster_->Restart();
ASSERT_OK


http://gerrit.cloudera.org:8080/#/c/7270/2/src/kudu/tserver/ts_disk_failure-test.cc
File src/kudu/tserver/ts_disk_failure-test.cc:

Line 154:   string glob_for_dir = JoinPathSegments(dir_with_tablet->dir(), 
"**");
Can this be moved to the FLAGS_env_inject_eio_globs section?


PS2, Line 163:   mini_server_->options()->master_addresses.clear();
             :   
mini_server_->options()->master_addresses.emplace_back("255.255.255.255", 1);
Why this?


PS2, Line 165:   Status s = mini_server_->Start();
             :   LOG(INFO) << s.ToString();
             :   ASSERT_OK(s);
Combine.


-- 
To view, visit http://gerrit.cloudera.org:8080/7270
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to