Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14654 )
Change subject: KUDU-2975: Spread WAL across multiple directories ...................................................................... Patch Set 22: (30 comments) http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs.proto File src/kudu/fs/fs.proto: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs.proto@121 PS22, Line 121: // Tablet log is spread across a specified WAL directorie. This PB is represented : // by the UUID of the WAL directorie it consists of. nit: maybe to make it clearer that a single tablet will have a single WAL directory, reword this as: "A tablet's WAL exists in a single WAL directory. This directory is represented by the UUID." http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.h File src/kudu/fs/fs_manager.h: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.h@218 PS22, Line 218: : Status GetOrphanedWalDir(const std::string& tablet_id, std::string* wal_dir); : : Status GetOrphanedWalRecoveryDir(const std::string& tablet_id, : std::string* wal_recovery_dir); The idea of "orphaned" WAL directories is new in this patch. Could you comment what these mean? When would we expect there to be an orphaned WAL directory? Or an orphaned recovery directory? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc@84 PS22, Line 84: : DEFINE_string(fs_wal_dir, "", : "Directory with write-ahead logs. If this is not specified, the " : "program will not start. May be the same as fs_data_dirs"); : TAG_FLAG(fs_wal_dir, stable); : DEFINE_string(fs_wal_dirs, "", : "Directory with write-ahead logs. If this is not specified, the " : "program will not start. May be the same as fs_data_dirs"); : TAG_FLAG(fs_wal_dirs, stable); Can you add a group flag validator like ValidateUpdateTabletStatsInterval in ts_tablet_manager.cc that verifies that exactly one of these is empty? That should simplify some of the behavior below. http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/fs_manager.cc@735 PS22, Line 735: Status FsManager::GetOrphanedWalDir(const string& tablet_id, string* wal_dir) { : if (wal_dir == nullptr) { : return Status::InvalidArgument("wal_dir cannot be nullptr"); : } : Status s = wd_manager_->FindTabletDirFromDisk(tablet_id, "", wal_dir); : if (!s.ok()) { : LOG(WARNING) << "cannot find tablet:" << tablet_id << " wal directory." << : s.message().ToString(); : } : return s; : } : : Status FsManager::GetOrphanedWalRecoveryDir(const string& tablet_id, : string* wal_recovery_dir) { : if (wal_recovery_dir == nullptr) { : return Status::InvalidArgument("wal_recovery_dir cannot be nullptr"); : } : Status s = wd_manager_->FindTabletDirFromDisk(tablet_id, kWalsRecoveryDirSuffix, : wal_recovery_dir); : if (!s.ok()) { : LOG(WARNING) << "cannot find tablet:" << tablet_id << " wal directory." << : s.message().ToString(); : } : return s; : } : As written, I think these mean that: - a WAL directory hasn't already been registered with the WAL directory manager - we'll search for a WAL directory from disk. http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.h File src/kudu/fs/wal_dirs.h: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.h@210 PS22, Line 210: // Results in an error if the tablet has not a WAL dir associated with : // it. If returning with an error, the : // WalDirManager will be unchanged. nit: reword and add a bit extra: "Looks into each WAL directory for an existing WAL for 'tablet_id'. Results in an error if no such WAL exists, or if there are multiple associated with the given tablet ID." http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.h@353 PS22, Line 353: dir_by_tablet_map_ nit: can you rename this so it's clear whether this is referring to the UUID or the directory name? Like 'dir_name_by_tablet_' or 'uuid_by_tablet_' http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc File src/kudu/fs/wal_dirs.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@75 PS22, Line 75: DEFINE_bool(fs_wal_dirs_consider_available_space, true, : "Whether to consider available space when selecting a wal " : "directory during tablet or wal block creation."); : TAG_FLAG(fs_wal_dirs_consider_available_space, runtime); : TAG_FLAG(fs_wal_dirs_consider_available_space, evolving); This isn't used, is it? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@605 PS22, Line 605: r *wal_ nit: move the * so it's next to the type name, i.e. const WalDir* wal_dir = ... http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@614 PS22, Line 614: if (other != nullptr) { nit: this can be written as: if (other) { ... http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@616 PS22, Line 616: "tried to load directory for tablet $0 but one is already registered", : tablet_id)); nit: maybe also log the existing directory? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@664 PS22, Line 664: postfix nit: we usually call this a "suffix" http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@675 PS22, Line 675: if (!postfix.empty()) { : tablet_path.append(postfix); : } We don't need to condition on postfix.empty(). If it's empty, tablet_path.append(postfix) will be a no-op. http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@710 PS22, Line 710: if (tablets_by_uuid.second.size() < min_tablets_num) { : min_tablets_num = tablets_by_uuid.second.size(); : target_uuid = tablets_by_uuid.first; : } The fact that we're always using the least-populated directory is different than what we do for data directories. Is that intentional? Why did you choose this heuristic instead of the PO2C algorithm? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/fs/wal_dirs.cc@808 PS22, Line 808: : bool WalDirManager::IsWalDirFailed(const string& uuid) const { : shared_lock<rw_spinlock> lock(dir_lock_.get_lock()); : return ContainsKey(failed_wal_dirs_, uuid); : } This isn't used, right? Same with MarkWalDirFailedByUuid http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata-test.cc File src/kudu/tablet/tablet_metadata-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata-test.cc@47 PS22, Line 47: class TestTabletMetadata : public KuduTabletTest { : public: : TestTabletMetadata() : : KuduTabletTest(GetSimpleTestSchema()) { : } : : virtual void SetUp() OVERRIDE { : KuduTabletTest::SetUp(); : writer_.reset(new LocalTabletWriter(harness_->tablet().get(), : &client_schema_)); : } : : void BuildPartialRow(int key, int intval, const char* strval, : gscoped_ptr<KuduPartialRow>* row); : : protected: : gscoped_ptr<LocalTabletWriter> writer_; : }; TODO(awong): can we also add a test that, if we load a superblock that _doesn't_ have a WAL UUID, then we'll assign one based on what we find on disk? And if that fails, then we'll get an error. http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@463 PS22, Line 463: // An error loading the WAL dir is non-fatal, it just means the : // tablet will fail to bootstrap later. TODO(awong): make sure this is true http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@470 PS22, Line 470: 1.10.0 before 1.11, since this new code would land in 1.12 http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@477 PS22, Line 477: // Here cannot flush directly. May causes a deadlock in the tool code. : // If we switch 'fs_wal_dir' to 'fs_wal_dirs', we should try to find the directory : // where tablet's WAL is located. : // If we found, we should record WAL directory UUID into metadata, and flush metadata. : // otherwise we do nothing here. At bootstrap we will check the tablet's WAL directory. : // And it will failed, this tablet will be marked failed. : //RETURN_NOT_OK(Flush()); Do we need this anymore? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tablet/tablet_metadata.cc@741 PS22, Line 741: DataDirGroupPB nit: extend this, like: "Serialize the tablet's data directory group and WAL directory, if they exist. They may.." http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/kudu-tool-test.cc@5219 PS22, Line 5219: // master use "fs_wal_dir", tserver use "fs_wal_dirs". Is there a good reason to do this here? I know that in a real deployment, we'd probably use a single WAL directory. But this test doesn't really have much to do with the fs_wal_dir flags specifically; --fs_wal_dir and --fs_wal_dirs aren't special when it comes to gflags. Could we just leave it as is? Or could we change the ExternalMiniCluster to always use fs_wal_dirs, even if the number of WAL directories is 1? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_fs.cc File src/kudu/tools/tool_action_fs.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_fs.cc@342 PS22, Line 342: // data dir group for the tablet, and the staged directory config won't : // affect this tablet. : DCHECK(s1.IsNotFound() || s2.IsNotFound()) << s1.ToString(); : continue; : } : RETURN_NOT_OK_PREPEND(s1, "at least one tablet is configured to use removed data directory. " : "Retry with --force to override this"); : RETURN_NOT_OK_PREPEND(s2, "at least one tablet is configured to use removed WAL directory. " : "Retry with --force to override this"); : } TODO(awong): is this tested http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc File src/kudu/tools/tool_action_local_replica.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc@367 PS22, Line 367: // For new version, need load metadata, find directory from metadata. But we just find from : // wal directory first. : RETURN_NOT_OK(fs_manager.wd_manager()->FindAndRegisterWalDir(tablet_id)); Doesn't this already happen in TabletMetadata::Load()? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tools/tool_action_local_replica.cc@524 PS22, Line 524: // For new version, need load metadata, find directory from metadata. : // But we just find the wal dir for tablet from disk first. : RETURN_NOT_OK(fs_manager->wd_manager()->FindAndRegisterWalDir(tablet_id)); Should we call TabletMetadata::Load() instead? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server-test.cc File src/kudu/tserver/mini_tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server-test.cc@41 PS22, Line 41: : TEST_F(MiniTabletServerTest, TestMultiDataDirServer) { : // Specifying the number of data directories will create subdirectories under the test root. : unique_ptr<MiniTabletServer> mini_server; : FsManager* fs_manager; : : int kNumDataDirs = 3; : mini_server.reset(new MiniTabletServer(GetTestPath("TServer"), : HostPort("127.0.0.1", 0), kNumDataDirs)); : ASSERT_OK(mini_server->Start()); : fs_manager = mini_server->server()->fs_manager(); : ASSERT_EQ(1, fs_manager->GetWalRootDirs().size()); : ASSERT_STR_CONTAINS(DirName(fs_manager->GetWalRootDirs()[0]), "wal"); : ASSERT_EQ(kNumDataDirs, fs_manager->GetDataRootDirs().size()); : } : : TEST_F(MiniTabletServerTest, TestMultiWalDirServer) { : // Specifying the number of data directories will create subdirectories under the test root. : unique_ptr<MiniTabletServer> mini_server; : FsManager* fs_manager; : : int kNumWalDirs = 3; : mini_server.reset(new MiniTabletServer(GetTestPath("TServer"), : HostPort("127.0.0.1", 0), 1, kNumWalDirs)); : ASSERT_OK(mini_server->Start()); : fs_manager = mini_server->server()->fs_manager(); : ASSERT_EQ(1, fs_manager->GetDataRootDirs().size()); : ASSERT_STR_CONTAINS(DirName(fs_manager->GetDataRootDirs()[0]), "data"); : ASSERT_EQ(kNumWalDirs, fs_manager->GetWalRootDirs().size()); : } nit: maybe combine these tests so we have a single test that has multiple WAL directories and multiple data directories? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server.cc File src/kudu/tserver/mini_tablet_server.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/mini_tablet_server.cc@76 PS22, Line 76: // This place needs to be consistent with the original logic. : // Otherwise, some test cases will not pass. : // If there just one wal dir and data dir, we just use fs_root. nit: Thanks for the note! This comment should be removed before we merge this. http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_copy_client-test.cc File src/kudu/tserver/tablet_copy_client-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_copy_client-test.cc@507 PS22, Line 507: fs_manager_->wd_manager()->CreateWalDir(GetTabletId()); Why do we need this? Shouldn't this have already happened when we setup the test? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-stress-test.cc File src/kudu/tserver/tablet_server-stress-test.cc: PS22: Instead of copying the existing test, how about making this a parameterized test, parameterized by the number of directories to use. Search around for "TEST_P" and "INSTANTIATE_TEST_CASE_P" for examples of parameterized tests. http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc@206 PS22, Line 206: /*num_data_dirs=*/1, /*num_wal_dirs=*/3 Why not leave it using the default of 1 data directory and 1 WAL directory? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/tablet_server-test.cc@3735 PS22, Line 3735: string tablet_meta_path = JoinPathSegments(JoinPathSegments( : GetTestPath("TabletServerTest-fsroot"), "wal-0"), "tablet-meta"); I'm kind of surprised by this. Why did the metadata directory change? http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/14654/22/src/kudu/tserver/ts_tablet_manager.cc@1460 PS22, Line 1460: meta->fs_manager()->GetTabletWalDir(tablet_id, &wal_dir); : meta->fs_manager()->GetTabletWalRecoveryDir(tablet_id, &wal_recovery_dir); Should we WARN_NOT_OK here? Or maybe we should check to look if the status is an error and LOG(FATAL) if so? -- To view, visit http://gerrit.cloudera.org:8080/14654 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied496804421d91ff1fa63d49979fde971071506e Gerrit-Change-Number: 14654 Gerrit-PatchSet: 22 Gerrit-Owner: YangSong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: YangSong <[email protected]> Gerrit-Comment-Date: Wed, 11 Dec 2019 01:10:59 +0000 Gerrit-HasComments: Yes
