Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19327 )
Change subject: [Tool] Compact metadata file for old versions of Kudu ...................................................................... Patch Set 5: (26 comments) http://gerrit.cloudera.org:8080/#/c/19327/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19327/5//COMMIT_MSG@7 PS5, Line 7: old versions Why mention ”old version“, it can be used for any version of Kudu, right? http://gerrit.cloudera.org:8080/#/c/19327/5//COMMIT_MSG@7 PS5, Line 7: metadata nit: container metadata http://gerrit.cloudera.org:8080/#/c/19327/5//COMMIT_MSG@20 PS5, Line 20: # nit: remove http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.h File src/kudu/fs/log_block_manager.h: http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.h@207 PS5, Line 207: Add some comments for these public functions. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@563 PS5, Line 563: Add some comments. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@571 PS5, Line 571: static Status ProcessRecord( Add some comments. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@860 PS5, Line 860: Status LogBlockContainer::ProcessRecordsAndGetOffset( It is similar to ProcessRecords(), is it possible to refactor and reuse some code? http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@2626 PS5, Line 2626: BlockRecordMap live_block_records; : uint64_t offset = 0; : uint64_t total_blocks = 0; : string metadata_path = StrCat(JoinPathSegments(dir->dir(), container_name), : kContainerMetadataFileSuffix); : RETURN_NOT_OK(LogBlockContainer::ProcessRecordsAndGetOffset( : env, metadata_path, &live_block_records, &total_blocks, &offset)); : *live_block_num = live_block_records.size(); Is it the same to PrintLowLiveRatioBlockContainer()? http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@2634 PS5, Line 2634: total_blocks = total_blocks == 0 ? 1 : total_blocks; If total_blocks is 0, return directly? http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@2637 PS5, Line 2637: low nit: low live http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@2651 PS5, Line 2651: Status LogBlockManager::PrintLowLiveRatioBlockContainer(Env* env, Dir* dir, There is not any "print" operation, "Print" can be removed. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@2761 PS5, Line 2761: / nit: use JoinPathSegments. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/fs/log_block_manager.cc@2763 PS5, Line 2763: kMetadataBackupFileSuffix If user don't want to do merge operation, make it possible to clear ".offset" and ".compact" files. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc File src/kudu/tools/tool_action_fs.cc: http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@123 PS5, Line 123: , nit: and http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@125 PS5, Line 125: return (value == "print" || value == "compact" || value == "merge" || value == "clear"); How about add some error logs when validate failed? http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@125 PS5, Line 125: "print" It would be greate if use the declared variables below (i.e. kMetadata*Op) and add the valid values to a set to validate flags. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@867 PS5, Line 867: Use file lock It would be better to describe why to add the file lock. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@869 PS5, Line 869: filename nit: lock_filename? http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@945 PS5, Line 945: LOG(INFO) << "Finish to print container: " << container_name : << ", live block count: " << live_block_count : << ", total block count: " << total_block_count : << ", live block ratio: " << (live_block_count / total_block_count) : << ", result: " << s.ToString(); : total_blocks.fetch_add(total_block_count); : total_live_blocks.fetch_add(live_block_count); There maybe hundreds of contianers, the output is not convenient to read. How about use HdrHistogram to statistics the distributon of all containers' size, live block ratio, live block count and etc, out of the loop? http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@954 PS5, Line 954: WARN_NOT_OK(s, "Container handle failed"); Since it's a "fatal" error, the process will exit, so at least ERROR level log would be better. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@970 PS5, Line 970: return Status::RuntimeError("Some tasks failed."); It would be helpful if pass the error detail (Status) out and log it here. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@972 PS5, Line 972: LOG(INFO) << "Operation finished, all tasks succeed."; Is it necesary? the error will be logged above, we can check whether it's success or not. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/tools/tool_action_fs.cc@1094 PS5, Line 1094: .Description("Compact the metadata file of all log block containers") Add some description of how to use the tool. For example, 'compact' should be run before 'merge'. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/util/pb_util.h File src/kudu/util/pb_util.h: http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/util/pb_util.h@325 PS5, Line 325: void SetOffset(uint64_t offset); Add some necessary descriptions. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/util/pb_util.h@448 PS5, Line 448: void SetOffset(uint64_t offset); Add some necessary descriptions. http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/util/pb_util.cc File src/kudu/util/pb_util.cc: http://gerrit.cloudera.org:8080/#/c/19327/5/src/kudu/util/pb_util.cc@786 PS5, Line 786: DCHECK_LE(offset_, offset); Should be protected by offset_lock_? -- To view, visit http://gerrit.cloudera.org:8080/19327 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id61cb6c88d7a093a8448e2497639f5c5c094efa4 Gerrit-Change-Number: 19327 Gerrit-PatchSet: 5 Gerrit-Owner: Wang Xixu <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Comment-Date: Wed, 15 Mar 2023 12:41:36 +0000 Gerrit-HasComments: Yes
