Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14654 )
Change subject: KUDU-2975: Spread WAL across multiple directories ...................................................................... Patch Set 13: (20 comments) I think we could get rid of a lot of reused code from the consistency checking. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs.proto File src/kudu/fs/fs.proto: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs.proto@58 PS13, Line 58: // A filesystem instance can contain multiple paths. One of these structures : // is persisted in each path when the filesystem instance is created. : message PathInstanceMetadataPB { : // Describes this path instance as well as all of the other path instances : // that, taken together, describe a complete set. : required PathSetPB path_set = 1; : : // Textual representation of the block manager that formatted this path. : required string block_manager_type = 2; : : // Block size on the filesystem where this instance was created. If the : // instance (and its data) are ever copied to another location, the block : // size in that location must be the same. : required uint64 filesystem_block_size_bytes = 3; : } I'm not convinced we have to reuse this either, especially if we end up not using the consistency check for UUIDs. We definitely don't need the block_manager_type for WALs, and I don't think we need the block size for WALs either. We probably also don't need all_uuids if we end up not doing the consistency check. What if the wal_manager_instance were just a simple file that had the WAL UUID in it? Something like: message WalDirInstancePB { optional bytes uuid = 1; } That way we can evolve it separately from the block_manager_instance files, while only including what we need for now. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs.proto@121 PS13, Line 121: // Tablet log is spread across a specified WAL directorie. This PB is represented : // by the UUID of the WAL directorie it consists of. nit reword a little bit: A tablet's WAL is placed in a single WAL directory. This directory is represented by a UUID. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.h File src/kudu/fs/fs_manager.h: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.h@98 PS13, Line 98: // The directory root where WALs will be stored. Cannot be empty. : std::vector<std::string> wal_roots; Rather than keeping track of two sets of WAL roots (wal_root _and_ wal_roots), let's just use a single vector<string> wal_roots. That way, there will be no difference between using fs_wal_dir and inputting a single directory with fs_wal_dirs. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@195 PS13, Line 195: if (!opts_.wal_root.empty() && !opts_.wal_roots.empty()) { : return Status::IOError("Write-ahead log directory (fs_wal_dir and fs_wal_dirs)" : " provided conflictly"); : } nit: I think this would be nicer as a gflag validator. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@392 PS13, Line 392: if (opts_.consistency_check != ConsistencyCheckBehavior::UPDATE_ON_DISK) { Why this check? http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@474 PS13, Line 474: report->wal_dir = canonicalized_wal_fs_roots_[0].path; We should probably allow the FsReport to accept multiple WAL directories too. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@534 PS13, Line 534: vector<string> ancillary_dirs = JoinPathSegmentsV( : WalDirManager::GetRootNames(canonicalized_wal_fs_roots_), kWalDirName); How about let's have the WalDirManager manage the creation of WAL directories, like we do for the DataDirManager. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/fs_manager.cc@735 PS13, Line 735: : string FsManager::GetTabletWalDir(const std::string& tablet_id) const { : string dir; : Status s = wd_manager_->FindWalDirByTabletId(tablet_id, &dir); : if (!s.ok()) { : LOG(WARNING) << "cannot find tablet:" << tablet_id << " wal directory." << : s.message().ToString(); : return ""; : } : return JoinPathSegments(dir, tablet_id); : } I think we'll eventually want to have this return a Status and have a string* out-parameter. Returning special values like "" is a bit brittle. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc File src/kudu/fs/wal_dirs.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc@302 PS13, Line 302: Status WalDirManager::UpdateInstances( If we don't do the consistency check for WAL directories, I don't think we'll need this. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc@436 PS13, Line 436: consistency_check I'm not convinced that we need the consistency checking behavior for WALs. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc@582 PS13, Line 582: Status WalDirManager::LoadWalDirFromPB(const std::string& tablet_id, : const WalDirPB& pb) { It's important to remember that we won't always have a WalDirPB, for example in cases where the tablets are from older versions of Kudu. I know you haven't begun working on the migration path, but you should think about it before going too far, so you can make sure it makes sense. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc@604 PS13, Line 604: TryCreateWalDir nit: maybe call this FindAndRegisterWalDir or something? That way it's clear we're looking for an existing WAL directory. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/fs/wal_dirs.cc@605 PS13, Line 605: std::lock_guard<percpu_rwlock> write_lock(dir_lock_); We only need to hold the lock in read mode for ContainsKey(), and we don't have to hold the lock at all when we're doing the FileExists() checks. Could you rewrite this as: { read lock(dir_lock_); check if dir already exists } // Check to see if any of our WAL directories have the given tablet ID. set<string> wd_uuids; for (wd : wal_dirs) { if (WAL exists in wd) { wd_uuids.emplace_back(wd.uuid()); } } // If anything is wrong (e.g. if there are no directories or multiple directories with the WAL), return an error. if (wd_uuids.size() != 1) { return an error } // Now that we know we have a WAL, insert it into the map. { write lock(dir_lock_); insert wd_uuids.begin() into the map } return OK http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tablet/tablet_bootstrap-test.cc File src/kudu/tablet/tablet_bootstrap-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tablet/tablet_bootstrap-test.cc@121 PS13, Line 121: fs_manager_->wd_manager()->DeleteWalDir(log::kTestTablet); Why did we have to do this? http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tablet/tablet_metadata.cc@481 PS13, Line 481: // Here cannot flush directly. May causes a deadlock in the tool code. : // If we switch 'fs_wal_dir' to 'fs_wal_dirs', we should try to find the directory : // where tablet's WAL is located. : // If we found, we should record WAL directory UUID into metadata, and flush metadata. : // otherwise maybe we should crash. : //RETURN_NOT_OK(Flush()); Yeah, the same is true for data directories. Next time we flush metadata, we'll flush the new WAL directory UUID. http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tools/tool_action_local_replica.cc File src/kudu/tools/tool_action_local_replica.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tools/tool_action_local_replica.cc@367 PS13, Line 367: // For new version, need load metadata, find directory from metadata. : RETURN_NOT_OK(fs_manager.wd_manager()->CreateWalDir(tablet_id)); Won't this happen when we load the tablet metadata? http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tools/tool_action_local_replica.cc@524 PS13, Line 524: CreateWalDir Aren't we operating on an existing tablet replica? Why are we creating a brand new WAL directory from scratch? Don't we want to use TryCreateWalDir here to find the existing WAL directory? http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tserver/tablet_copy_client-test.cc File src/kudu/tserver/tablet_copy_client-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tserver/tablet_copy_client-test.cc@298 PS13, Line 298: fs_manager_->wd_manager()->CreateWalDir(GetTabletId()); Why are we creating a WAL directory from scratch here? Shouldn't the WAL directory already know about the tablet? http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/tserver/tablet_server-test.cc@2453 PS13, Line 2453: const int kLimit = rand() % (kMaxLimit + 1) + 1; This isn't related to this patch, right? If not, could you put it in a separate patch so that this patch is focused on only the WAL? http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/util/env_util.cc File src/kudu/util/env_util.cc: http://gerrit.cloudera.org:8080/#/c/14654/13/src/kudu/util/env_util.cc@362 PS13, Line 362: Status ListDirsInDir(Env* env, : const string& path, : vector<string>* entries) { : RETURN_NOT_OK(env->GetChildren(path, entries)); : auto iter = entries->begin(); : while (iter != entries->end()) { : if (*iter == "." || *iter == ".." || iter->find(kTmpInfix) != string::npos) { : iter = entries->erase(iter); : continue; : } : bool is_dir = false; : Status s = env->IsDirectory(JoinPathSegments(path, *iter), &is_dir); : if (!s.ok() || !is_dir) { : iter = entries->erase(iter); : continue; : } : ++iter; : } : return Status::OK(); : } : We don't need this anymore, right? -- To view, visit http://gerrit.cloudera.org:8080/14654 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied496804421d91ff1fa63d49979fde971071506e Gerrit-Change-Number: 14654 Gerrit-PatchSet: 13 Gerrit-Owner: YangSong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: YangSong <[email protected]> Gerrit-Comment-Date: Fri, 15 Nov 2019 07:06:49 +0000 Gerrit-HasComments: Yes
