Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/24092 )
Change subject: [fs] add metrics for untracked orphaned blocks ...................................................................... Patch Set 3: (13 comments) http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/fs/log_block_manager.cc@2528 PS2, Line 2528: return deletion_transaction->CommitDeletedBlocks(nullptr); > Why not to move this report/warning piece closer to the source of the non-O Actually, RemoveLogBlocks already has such log message that makes this log redundant. Removed. http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/fs/log_block_manager.cc@2533 PS2, Line 2533: > There might be many blocks failing to delete if they are on a failed data d This log message is removed now so the comment is not applicable here but it is definitely applicable to the similar log message in RemoveLogBlocks(). Have applied rate limit to the log message in RemoveLogBlocks(). http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/fs/log_block_manager.cc@2534 PS2, Line 2534: > I'm not sure this is always due to a failure writing metadata: e.g., what a No applicable now that whole log message has been removed. http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.h@189 PS2, Line 189: struct OrphanBlockCleanupStats { : // Number of orphaned blocks successfully deleted from dis > nit: why to mention this here at all when this belongs rather to docs on pa Removed. http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.h@192 PS2, Line 192: // Number of orphaned blocks that could not be deleted (e.g. I/O error). : // These are removed from the superblock regardless (see DeleteOrphanedBlocks), : // so any unreclaimed disk space must be recovered via 'kudu fs check --repair'. : int64_t blocks_failed = 0; : }; > What's the reason behind using signed integers for counters when using unsi Instead of 'int', it should be int64_t to make this exactly same as input parameter data type of IncrementBy() method, so no implicit conversion is required. That also aligns with what is done at other places in the code (e.g. tablet_bytes_deleted). http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.cc@565 PS2, Line 565: begin(), block_ids.end()); > nit: remove the enclosing parentheses? Done http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.cc@572 PS2, Line 572: } > style nit: always use scope braces for 'if', similar to 'for' cycle Done http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h File src/kudu/tablet/tablet_metrics.h: http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h@87 PS2, Line 87: underlying error. > nit: does the related code even capable of differentiating between I/O and Not really. Replaced with: "Number of orphaned blocks that could not be deleted due to underlying errors." http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h@88 PS2, Line 88: ped_refptr<Counter> orphaned_block_cleanup_failures > This is confusing. Sustained by which means? These counters are reset to This is a leftover from PS1 where orphaned blocks were kept in the set for next flush call to retry the delete on those. PS2 removed that logic and made this change only address the non-functional entity of adding metrics and no change in behavior. However, even with PS1, using 'sustained' might sound confusing. I have removed this line to avoid any ambiguity and keep the purpose of this metric clear. http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h@92 PS2, Line 92: > delta_file_lookups_per_op; > If introducing this pairing metric for orphaned_block_cleanup_failures, why I have removed this metric because of the reason stated here: https://gerrit.cloudera.org/#/c/24092/2/src/kudu/tablet/tablet_metadata-test.cc@441 http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h@94 PS2, Line 94: scoped_refptr<Histogram> commit_wait_duration; > Does it make sense to introduce a gauge to reflect on the size of the block It would certainly align with the semantics of data_retained_bytes_ in maintenance manager. I will try to address this in 24125 patch. http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.cc File src/kudu/tablet/tablet_metrics.cc: http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.cc@241 PS2, Line 241: if the tablet is not fully compacted. High frequency of " : "high values may indicate that compaction is falling behind.", : kudu::MetricLevel::kDebug, : 20, 2); > Move this out and merge into the description of the orphaned_block_cleanup_ I have removed orphaned_block_cleanup_failures_bytes metric altogether. Refer to this comment for more details: https://gerrit.cloudera.org/#/c/24092/2/src/kudu/tablet/tablet_metadata-test.cc@441 http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.cc@245 PS2, Line 245: > FWIW, it's already enough to have orphaned_block_cleanup_failures as a metr Same response as stated above. -- To view, visit http://gerrit.cloudera.org:8080/24092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id386d9fc8d0900839e229e66772f35299b3ef2e9 Gerrit-Change-Number: 24092 Gerrit-PatchSet: 3 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Zoltan Martonka <[email protected]> Gerrit-Comment-Date: Thu, 26 Mar 2026 11:14:03 +0000 Gerrit-HasComments: Yes
