Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19327 )
Change subject: [Tool] KUDU-3318 Compact container metadata file manually ...................................................................... Patch Set 8: (21 comments) http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.h File src/kudu/fs/log_block_manager.h: http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.h@215 PS8, Line 215: static bool ValidateContainerFile(Env* env, const std::string& container_path, Add some comments for this function. http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.cc@957 PS8, Line 957: Status LogBlockContainer::ProcessRecordsAndGetOffset( It's similar to LogBlockContainerNativeMeta::ProcessRecords(), how about reuse it? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.cc@2789 PS8, Line 2789: total_blocks == 0 ? 1 : Just skip the compact procedure if compact_metadata is 0. http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.cc@2791 PS8, Line 2791: FLAGS_log_container_live_metadata_before_compact_ratio Add this parameter to compact_metadata tool, then --help will print it. http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.cc@2801 PS8, Line 2801: RETURN_NOT_OK(GenerateOffsetFile(env, dir, container_name, offset)); I guess the .compacted is needed to be deleted if the .offset file generate failed, right? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.cc@2930 PS8, Line 2930: const string& backup_metadata_file = StrCat(metadata_file, kMetadataBackupFileSuffix); : // Delete exsited metadata backup file. : if (env->FileExists(backup_metadata_file)) { : RETURN_NOT_OK_PREPEND(env->DeleteFile(backup_metadata_file), : "delete metadatda backup file failed"); : } : : const string& offset_file = StrCat(JoinPathSegments(dir->dir(), container_name), : kMetadataOffsetFileSuffix); : // Delete exsited offset file. : if (env->FileExists(offset_file)) { : RETURN_NOT_OK_PREPEND(env->DeleteFile(offset_file), "delete .offset file failed"); : } : : const string& compacted_file = StrCat(JoinPathSegments(dir->dir(), container_name), : kMetadataCompactedFileSuffix); : // Delete exsited compacted file. : if (env->FileExists(compacted_file)) { : RETURN_NOT_OK_PREPEND(env->DeleteFile(compacted_file), "delete .compacted file failed"); : } There are a bit of duplicate, how about adding the suffixes to a list, and then iterate the list to delete the files? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.cc@2987 PS8, Line 2987: Status LogBlockManager::GenerateCompactedFile(Env* env, Dir* dir, const string& container_name, How about do a small refactor for LogBlockManager::RewriteMetadataFile() and reuse it? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/fs/log_block_manager.cc@3028 PS8, Line 3028: GetContainerNames There is a similar code piece in LogBlockManager::OpenDataDir(), is it possible to reuse it? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc File src/kudu/tools/tool_action_fs.cc: http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@151 PS8, Line 151: value == "print" || value == "compact" || : value == "merge" || value == "clear" nit: use a std::set/std::unordered_set for fast lookup, and JoinStrings can be used to construct message below. http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1036 PS8, Line 1036: 1 Why it's 1 at the beginning? The same to cur_finished_containers_count and finished_containers_count. http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1033 PS8, Line 1033: atomic<uint64_t> total_blocks = 0; : atomic<uint64_t> total_live_blocks = 0; : atomic<bool> seen_fatal_error = false; : uint64_t finished_directories_count = 1; : uint64_t total_directories = fs_manager.dd_manager()->dirs().size(); nit: Move them closer to the place where use it. http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1040 PS8, Line 1040: unique_ptr<Dir> nit: auto http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1050 PS8, Line 1050: unique_ptr<Dir> nit: auto http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1056 PS8, Line 1056: Status s = LogBlockManager::GetContainerNames(env, dd->dir(), &container_names); The container_names has been obtained in line 1043, how about reuse it? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1061 PS8, Line 1061: // Check whether the number of metadata files is equal to the data files's. : // If the two is not the same, tablet server can not be started. It is no need : // to do compacting operations. While the tserver is running, the container_names get in previous step may be outdate, the data file and metadata file may be deleted already here. How about just skip the container if finding this kind of "inconsistent"? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1073 PS8, Line 1073: container_name, &total_blocks, &total_live_blocks, : &total_directories, finished_directories_count, : finished_containers_count, container_names, : &seen_fatal_error, cur_finished_containers_count, : total_containers, &dd, &env The parameter list is too long, how about encapsulating some counters into a structure? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1091 PS8, Line 1091: return Why skip only when execute "merge" operation? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1102 PS8, Line 1102: Finish to clear There is just an INFO log even if there is an error occurred ? And the log seems too verbos if there are thousands of containers. http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1113 PS8, Line 1113: result For the container failed to process, is the result correct and meaningful to be print? How about just skip them? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1132 PS8, Line 1132: finished_directories_count++; It is in expect to consider the directory "finished" if skipped from line 1069? http://gerrit.cloudera.org:8080/#/c/19327/8/src/kudu/tools/tool_action_fs.cc@1297 PS8, Line 1297: It supports 4 operations:\n" : "--print: Statistic the total blocks and alive blocks " : "for every container.\n" : "--compact: Compact metadata files when Kudu is still running.\n" : "--merge: Stop Kudu and write incremental records into compacted " : "metadata files. Then backup old metadata files.\n" : "--clear: Delete metadata backup files.\n It would be more helpful if moving these descriptions to metadata_op_type gflag, it is shown when execute "kudu fs compact_metadata --help". The help info is also needed to simplify describe how to use it, in which order, what's the impact of each step. And a detail document is needed to be added to website if this patch merged. -- To view, visit http://gerrit.cloudera.org:8080/19327 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id61cb6c88d7a093a8448e2497639f5c5c094efa4 Gerrit-Change-Number: 19327 Gerrit-PatchSet: 8 Gerrit-Owner: Wang Xixu <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Comment-Date: Wed, 31 Jan 2024 08:13:52 +0000 Gerrit-HasComments: Yes
