Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14760 )
Change subject: KUDU-2993: don't require update_dirs to fix directory inconsistencies ...................................................................... Patch Set 10: (18 comments) http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util-test.cc File src/kudu/fs/block_manager_util-test.cc: http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util-test.cc@a98 PS10, Line 98: Removed because it wasn't necessary for the test? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.h File src/kudu/fs/block_manager_util.h: http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.h@41 PS10, Line 41: PathInstanceMetadataFile(Env* env, std::string uuid, std::string dir_type, At first blush, the addition of a UUID here is weird if our intent is to call LoadFromDisk. That means that the caller has to invent some UUID knowing full well that it's going to be discarded at LoadFromDisk time. Reading through the rest of the review, I understand how this makes sense in context (i.e. it's the "new UUID" if our intent is to call Create, and a "backup UUID" if LoadFromDisk fails). But it also feels like a hole in the PIMF abstraction as it only really makes sense if you know some stuff about how PIMFs are used by DataDirManager. To address this, perhaps you can doc here the expectations around 'uuid_'. That is, how it's initialized in the constructor, used in Create, may be overwritten on successful LoadFromDisk, etc. And use that to explain the new invariant: every PIMF, healthy or otherwise, has a valid UUID. http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.h@55 PS10, Line 55: Status Create(const std::set<std::string>& all_uuids, bool* created_dir = nullptr); I'll probably get this answered when I review the rest of the change, but why a set? Hmm, no I didn't figure it out. At the end of the day a protobuf repeated field looks more like a vector than a set, so why not preserve that abstraction in the PIMF vs. forcing the PIMF to do the conversion? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.h@105 PS10, Line 105: // The directory type for this instance file. : const std::string dir_type_; Why this rename? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.cc File src/kudu/fs/block_manager_util.cc: http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.cc@151 PS10, Line 151: RETURN_NOT_OK_FAIL_INSTANCE_PREPEND( Do we need to delete the created directory if we fail further down? In the spirit of restoring the filesystem to be the way we found it? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.cc@154 PS10, Line 154: if (created_dir) { : *created_dir = created; : } Don't you want to enforce the (usual) contract where if a function fails, its OUT parameters aren't modified? OK, taken with the comment I left just above, the goal here is for the caller to enforce cleanup even if the stuff below fails. It works, but it's a bit of a layering violation (in that PIMF::Create alone can't handle cleanup). I think we can address it though: - Move L154-L156 to the end of the function. - Set up a ScopedCleanup to remove this directory. - If the function is successful, cancel the ScopedCleanup, with the understanding that it's now the caller's responsibility to clean up the created directory. http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/block_manager_util.cc@243 PS10, Line 243: "Could not lock instance file. Make sure that Kudu is not already running"); Why did you remove the "different user" part? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.h@454 PS10, Line 454: bool* has_existing_instances = nullptr); Is it strictly necessary for this OUT parameter to be optional? Could we require it? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@a377 PS10, Line 377: Do we need to enforce !read-only if one of the UPDATE_AND_ behaviors was chosen? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@416 PS10, Line 416: CHECK(opts_.update_instances != UpdateInstanceBehavior::DONT_UPDATE); Use CHECK_NE instead. http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@448 PS10, Line 448: anything This was a little ambiguous; does it refer to "any instance whatsoever"? Or to "anything in the particular unhealthy instance we're talking about?" I think you mean the latter; could you reword to make that more explicit? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@550 PS10, Line 550: udpate update http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@608 PS10, Line 608: laod load http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/data_dirs.cc@779 PS10, Line 779: } else { In keeping with the existing behavior, there's no reason to go down this branch if not fs_check.ok(). http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs.proto File src/kudu/fs/fs.proto: http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs.proto@60 PS10, Line 60: message PathInstanceMetadataPB { Won't this break old readers that try to deserialize a message with the (now optional) fields missing? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.h File src/kudu/fs/fs_manager.h: http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.h@289 PS10, Line 289: FRIEND_TEST(fs::FsManagerTestBase, TestDuplicatePaths); : FRIEND_TEST(fs::FsManagerTestBase, TestEIOWhileRunningUpdateDirsTool); : FRIEND_TEST(fs::FsManagerTestBase, TestIsolatedMetadataDir); : FRIEND_TEST(fs::FsManagerTestBase, TestMetadataDirInWALRoot); : FRIEND_TEST(fs::FsManagerTestBase, TestMetadataDirInDataRoot); : FRIEND_TEST(fs::FsManagerTestBase, TestOpenWithDuplicateInstanceFiles); Want to just "friend class fs::FsManagerTestBase"? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.cc@a163 PS10, Line 163: Don't we still want to enforce that if you picked UPDATE_AND, you can't use read_only? http://gerrit.cloudera.org:8080/#/c/14760/10/src/kudu/fs/fs_manager.cc@396 PS10, Line 396: "unable to create missing filesystem roots" Deduplicate this string. -- To view, visit http://gerrit.cloudera.org:8080/14760 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic3027e7edb5c60e96ced6160fec1a380b38353a5 Gerrit-Change-Number: 14760 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: YangSong <[email protected]> Gerrit-Comment-Date: Wed, 04 Dec 2019 19:21:03 +0000 Gerrit-HasComments: Yes
