YangSong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14654 )
Change subject: KUDU-2975: Spread WAL across multiple directories ...................................................................... Patch Set 13: (14 comments) http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs.proto File src/kudu/fs/fs.proto: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs.proto@58 PS13, Line 58: // A filesystem instance can contain multiple paths. One of these structures : // is persisted in each path when the filesystem instance is created. : message PathInstanceMetadataPB { : // Describes this path instance as well as all of the other path instances : // that, taken together, describe a complete set. : required PathSetPB path_set = 1; : : // Textual representation of the block manager that formatted this path. : required string block_manager_type = 2; : : // Block size on the filesystem where this instance was created. If the : // instance (and its data) are ever copied to another location, the block : // size in that location must be the same. : required uint64 filesystem_block_size_bytes = 3; : } > I'm not convinced we have to reuse this either, especially if we end up not If we don't neet to do the consistency check for WAL directories, we don't need to record the UUIDs of all WAL directories. If the "wal_manager_instance" file is not accessible, we can't know the UUID of this directory any more. We must make a new UUID for it. Maybe we just need to restore the file with the original UUID. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.h File src/kudu/fs/fs_manager.h: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.h@98 PS13, Line 98: // The directory root where WALs will be stored. Cannot be empty. : std::vector<std::string> wal_roots; > Rather than keeping track of two sets of WAL roots (wal_root _and_ wal_root Yes, we can put them in "wal_roots". Just make sure that only one gflag is used. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@195 PS13, Line 195: if (!opts_.wal_root.empty() && !opts_.wal_roots.empty()) { : return Status::IOError("Write-ahead log directory (fs_wal_dir and fs_wal_dirs)" : " provided conflictly"); : } > nit: I think this would be nicer as a gflag validator. I don't know how to use gflag validator. Here use "FLAGS_fs_wal_dir" and "FLAGS_fs_wal_dirs" instead of "opts_.wal_root" and "opts_.wal_roots"? http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@392 PS13, Line 392: if (opts_.consistency_check != ConsistencyCheckBehavior::UPDATE_ON_DISK) { > Why this check? If we use "fs_wal_dirs" instead of "fs_wal_dir", maybe we add a WAL dir, the new dir has not a subdir named "wals". The following code will report an error when we do "kudu fs update_dirs". http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@474 PS13, Line 474: report->wal_dir = canonicalized_wal_fs_roots_[0].path; > We should probably allow the FsReport to accept multiple WAL directories to Yes http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@534 PS13, Line 534: vector<string> ancillary_dirs = JoinPathSegmentsV( : WalDirManager::GetRootNames(canonicalized_wal_fs_roots_), kWalDirName); > How about let's have the WalDirManager manage the creation of WAL directori Yes, it's required. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@735 PS13, Line 735: : string FsManager::GetTabletWalDir(const std::string& tablet_id) const { : string dir; : Status s = wd_manager_->FindWalDirByTabletId(tablet_id, &dir); : if (!s.ok()) { : LOG(WARNING) << "cannot find tablet:" << tablet_id << " wal directory." << : s.message().ToString(); : return ""; : } : return JoinPathSegments(dir, tablet_id); : } > I think we'll eventually want to have this return a Status and have a strin Yes http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc File src/kudu/fs/wal_dirs.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc@436 PS13, Line 436: consistency_check > I'm not convinced that we need the consistency checking behavior for WALs. this is for "kudu fs update_dirs" tool. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc@605 PS13, Line 605: std::lock_guard<percpu_rwlock> write_lock(dir_lock_); > We only need to hold the lock in read mode for ContainsKey(), and we don't OK http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tablet/tablet_bootstrap-test.cc File src/kudu/tablet/tablet_bootstrap-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tablet/tablet_bootstrap-test.cc@121 PS13, Line 121: fs_manager_->wd_manager()->DeleteWalDir(log::kTestTablet); > Why did we have to do this? This class is based on "LogTestBase", in "LogTestBase" SetUp() function, I add a line code like: ASSERT_OK(fs_manager_->wd_manager()->CreateWalDir(kTestTablet)); so here I first delete the tablet's WAL dir message in WalDirManager. Let it created or loaded by "TabletMetadata::LoadOrCreate()". http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tools/tool_action_local_replica.cc File src/kudu/tools/tool_action_local_replica.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tools/tool_action_local_replica.cc@524 PS13, Line 524: CreateWalDir > Aren't we operating on an existing tablet replica? Why are we creating a br Yes, There's a problem here. I should load the tablet's metadata, and get the WAL dir from metedata, if cannot found, I should use "TryCreateWalDir". I will modify it later. Above is the same. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tserver/tablet_copy_client-test.cc File src/kudu/tserver/tablet_copy_client-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tserver/tablet_copy_client-test.cc@298 PS13, Line 298: fs_manager_->wd_manager()->CreateWalDir(GetTabletId()); > Why are we creating a WAL directory from scratch here? Shouldn't the WAL di Yes, I will modify it later. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tserver/tablet_server-test.cc@2453 PS13, Line 2453: const int kLimit = rand() % (kMaxLimit + 1) + 1; > This isn't related to this patch, right? If not, could you put it in a sepa Yes, not related to this patch. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/util/env_util.cc File src/kudu/util/env_util.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/util/env_util.cc@362 PS13, Line 362: Status ListDirsInDir(Env* env, : const string& path, : vector<string>* entries) { : RETURN_NOT_OK(env->GetChildren(path, entries)); : auto iter = entries->begin(); : while (iter != entries->end()) { : if (*iter == "." || *iter == ".." || iter->find(kTmpInfix) != string::npos) { : iter = entries->erase(iter); : continue; : } : bool is_dir = false; : Status s = env->IsDirectory(JoinPathSegments(path, *iter), &is_dir); : if (!s.ok() || !is_dir) { : iter = entries->erase(iter); : continue; : } : ++iter; : } : return Status::OK(); : } : > We don't need this anymore, right? It's used in "mini_cluster_fs_inspector.cc". Go through all the files under the "wals" directory to get all tablets in old version, but new version we should go through the dirs, because we add a "wal_manager_instance" file under wals dir. -- To view, visit http://gerrit.cloudera.org:8080/14654 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied496804421d91ff1fa63d49979fde971071506e Gerrit-Change-Number: 14654 Gerrit-PatchSet: 13 Gerrit-Owner: YangSong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: YangSong <[email protected]> Gerrit-Comment-Date: Fri, 15 Nov 2019 09:28:43 +0000 Gerrit-HasComments: Yes
