Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14760 )

Change subject: KUDU-2993: don't require update_dirs to fix directory 
inconsistencies
......................................................................


Patch Set 10:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util-test.cc
File src/kudu/fs/block_manager_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util-test.cc@a98
PS10, Line 98:
Removed because it wasn't necessary for the test?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.h
File src/kudu/fs/block_manager_util.h:

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.h@41
PS10, Line 41:   PathInstanceMetadataFile(Env* env, std::string uuid, 
std::string dir_type,
At first blush, the addition of a UUID here is weird if our intent is to call 
LoadFromDisk. That means that the caller has to invent some UUID knowing full 
well that it's going to be discarded at LoadFromDisk time.

Reading through the rest of the review, I understand how this makes sense in 
context (i.e. it's the "new UUID" if our intent is to call Create, and a 
"backup UUID"  if LoadFromDisk fails). But it also feels like a hole in the 
PIMF abstraction as it only really makes sense if you know some stuff about how 
PIMFs are used by DataDirManager.

To address this, perhaps you can doc here the expectations around 'uuid_'. That 
is, how it's initialized in the constructor, used in Create, may be overwritten 
on successful LoadFromDisk, etc. And use that to explain the new invariant: 
every PIMF, healthy or otherwise, has a valid UUID.


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.h@55
PS10, Line 55:   Status Create(const std::set<std::string>& all_uuids, bool* 
created_dir = nullptr);
I'll probably get this answered when I review the rest of the change, but why a 
set?

Hmm, no I didn't figure it out. At the end of the day a protobuf repeated field 
looks more like a vector than a set, so why not preserve that abstraction in 
the PIMF vs. forcing the PIMF to do the conversion?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.h@105
PS10, Line 105:   // The directory type for this instance file.
              :   const std::string dir_type_;
Why this rename?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.cc
File src/kudu/fs/block_manager_util.cc:

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.cc@151
PS10, Line 151:   RETURN_NOT_OK_FAIL_INSTANCE_PREPEND(
Do we need to delete the created directory if we fail further down? In the 
spirit of restoring the filesystem to be the way we found it?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.cc@154
PS10, Line 154:   if (created_dir) {
              :     *created_dir = created;
              :   }
Don't you want to enforce the (usual) contract where if a function fails, its 
OUT parameters aren't modified?

OK, taken with the comment I left just above, the goal here is for the caller 
to enforce cleanup even if the stuff below fails. It works, but it's a bit of a 
layering violation (in that PIMF::Create alone can't handle cleanup). I think 
we can address it though:
- Move L154-L156 to the end of the function.
- Set up a ScopedCleanup to remove this directory.
- If the function is successful, cancel the ScopedCleanup, with the 
understanding that it's now the caller's responsibility to clean up the created 
directory.


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.cc@243
PS10, Line 243:       "Could not lock instance file. Make sure that Kudu is not 
already running");
Why did you remove the "different user" part?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.h@454
PS10, Line 454:       bool* has_existing_instances = nullptr);
Is it strictly necessary for this OUT parameter to be optional? Could we 
require it?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc
File src/kudu/fs/data_dirs.cc:

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@a377
PS10, Line 377:
Do we need to enforce !read-only if one of the UPDATE_AND_ behaviors was chosen?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@416
PS10, Line 416:   CHECK(opts_.update_instances != 
UpdateInstanceBehavior::DONT_UPDATE);
Use CHECK_NE instead.


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@448
PS10, Line 448: anything
This was a little ambiguous; does it refer to "any instance whatsoever"? Or to 
"anything in the particular unhealthy instance we're talking about?" I think 
you mean the latter; could you reword to make that more explicit?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@550
PS10, Line 550: udpate
update


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@608
PS10, Line 608: laod
load


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@779
PS10, Line 779:       } else {
In keeping with the existing behavior, there's no reason to go down this branch 
if not fs_check.ok().


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs.proto
File src/kudu/fs/fs.proto:

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs.proto@60
PS10, Line 60: message PathInstanceMetadataPB {
Won't this break old readers that try to deserialize a message with the (now 
optional) fields missing?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.h
File src/kudu/fs/fs_manager.h:

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.h@289
PS10, Line 289:   FRIEND_TEST(fs::FsManagerTestBase, TestDuplicatePaths);
              :   FRIEND_TEST(fs::FsManagerTestBase, 
TestEIOWhileRunningUpdateDirsTool);
              :   FRIEND_TEST(fs::FsManagerTestBase, TestIsolatedMetadataDir);
              :   FRIEND_TEST(fs::FsManagerTestBase, TestMetadataDirInWALRoot);
              :   FRIEND_TEST(fs::FsManagerTestBase, TestMetadataDirInDataRoot);
              :   FRIEND_TEST(fs::FsManagerTestBase, 
TestOpenWithDuplicateInstanceFiles);
Want to just "friend class fs::FsManagerTestBase"?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.cc@a163
PS10, Line 163:
Don't we still want to enforce that if you picked UPDATE_AND, you can't use 
read_only?


http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.cc@396
PS10, Line 396: "unable to create missing filesystem roots"
Deduplicate this string.



--
To view, visit http://gerrit.cloudera.org:8080/14760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5
Gerrit-Change-Number: 14760
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: YangSong <[email protected]>
Gerrit-Comment-Date: Wed, 04 Dec 2019 19:21:03 +0000
Gerrit-HasComments: Yes

Reply via email to