Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/24092 )
Change subject: [fs] add metrics for untracked orphaned blocks ...................................................................... Patch Set 5: (16 comments) http://gerrit.cloudera.org:8080/#/c/24092/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/24092/5//COMMIT_MSG@35 PS5, Line 35: severly severely http://gerrit.cloudera.org:8080/#/c/24092/5//COMMIT_MSG@53 PS5, Line 53: backgrounnd background http://gerrit.cloudera.org:8080/#/c/24092/5//COMMIT_MSG@57 PS5, Line 57: warning For some reason, those are ERROR logs, not WARNING. Is it intentional? http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@a3343 PS5, Line 3343: This comment is removed in PS5, but the corresponding metrics aren't present still. IIUC, the only added metrics track failures of the blocks' GC-ing, but the blocks that failed to be added into the metadata intent records aren't tracked as of PS5. http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3350 PS5, Line 3350: // Blocks whose deletion metadata could not be written are permanently : // orphaned i.e., their data occupies space on disk. They will not be cleaned : // up automatically and may cause unclaimed disk space accumulation over time. Shouldn't we track those using metrics as well? http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3357 PS5, Line 3357: int Can we ever expect an overflow here with 32-bit integers? http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3360 PS5, Line 3360: "Block $0 in container $1 could not have its deletion record " : "committed and is now an orphaned block that occupies " : "untracked disk space: $2. Run 'kudu fs check --repair' to " : "reclaim this disk space. Could you make this message more concise, but still actionable? http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3366 PS5, Line 3366: // Always emit a summary so the total orphaned count is never suppressed, : // even when the per-block messages above are rate-limited. What if there are thousands of containers there since that was a failed data directory? http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3369 PS5, Line 3369: "$0 block(s) in container $1 could not have their deletion records " : "committed and are now orphaned blocks occupying unreclaimed disk space: " : "$2. Run 'kudu fs check --repair' to reclaim this disk space.", nit: make this more concise, but still actionable. http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3417 PS5, Line 3417: KLOG_EVERY_N_SECS(INFO, 1) << Substitute("Block $0 is in a failed directory; not deleting", Is this update also related to this patch? If not, avoid introducing such updates since these increase risk of textual conflicts when trying to cherry-pick/merge logically independent patches. http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata-test.cc File src/kudu/tablet/tablet_metadata-test.cc: http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata-test.cc@454 PS2, Line 454: // therefore unaffected. Also, Tablet::Flush() still returns OK overall. nit: if possible, please avoid updating non-related code since it introduces higher risk of textual conflicts when cherry-picking or merging patches http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@562 PS5, Line 562: Scheduling $0 block(s) for orphaned cleanup nit: ... scheduling $0 orphaned block(s) for cleanup ... ? http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@575 PS5, Line 575: LOG_WITH_PREFIX(WARNING) << "Not deleting " << blocks.size() : << " orphaned block(s) from disk. Block deletion disabled via " : << "--enable_tablet_orphaned_block_deletion=false"; nit: if updating this message, please switch to using Substitute() here as well http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@581 PS5, Line 581: OrphanBlockCleanupStats stats; nit: move this closer to the place of the usage http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@595 PS5, Line 595: ERROR There used to be a log message of WARNING priority. Is the change to the ERROR priority intentional? http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@596 PS5, Line 596: "Failed to clean up $0/$1 orphaned block(s): $2. The failed block(s) are removed from " : "the superblock but their disk space may be unreclaimed. Run 'kudu fs check --repair' " : "to manually reclaim the disk space. nit: please make it more concise while keeping it actionable, e.g. "$2: failed cleaning up $0 of $1 removed blocks; running 'kudu fs check --repair' can help to reclaim disk space" -- To view, visit http://gerrit.cloudera.org:8080/24092 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id386d9fc8d0900839e229e66772f35299b3ef2e9 Gerrit-Change-Number: 24092 Gerrit-PatchSet: 5 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Zoltan Martonka <[email protected]> Gerrit-Comment-Date: Tue, 07 Apr 2026 22:20:40 +0000 Gerrit-HasComments: Yes
