Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24092 )

Change subject: [fs] add metrics for untracked orphaned blocks
......................................................................


Patch Set 5:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/24092/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/24092/5//COMMIT_MSG@35
PS5, Line 35: severly
severely


http://gerrit.cloudera.org:8080/#/c/24092/5//COMMIT_MSG@53
PS5, Line 53: backgrounnd
background


http://gerrit.cloudera.org:8080/#/c/24092/5//COMMIT_MSG@57
PS5, Line 57: warning
For some reason, those are ERROR logs, not WARNING.  Is it intentional?


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@a3343
PS5, Line 3343:
This comment is removed in PS5, but the corresponding metrics aren't present 
still.  IIUC, the only added metrics track failures of the blocks' GC-ing, but 
the blocks that failed to be added into the metadata intent records aren't 
tracked as of PS5.


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3350
PS5, Line 3350:       // Blocks whose deletion metadata could not be written 
are permanently
              :       // orphaned i.e., their data occupies space on disk. They 
will not be cleaned
              :       // up automatically and may cause unclaimed disk space 
accumulation over time.
Shouldn't we track those using metrics as well?


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3357
PS5, Line 3357: int
Can we ever expect an overflow here with 32-bit integers?


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3360
PS5, Line 3360:             "Block $0 in container $1 could not have its 
deletion record "
              :             "committed and is now an orphaned block that 
occupies "
              :             "untracked disk space: $2. Run 'kudu fs check 
--repair' to "
              :             "reclaim this disk space.
Could you make this message more concise, but still actionable?


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3366
PS5, Line 3366:       // Always emit a summary so the total orphaned count is 
never suppressed,
              :       // even when the per-block messages above are 
rate-limited.
What if there are thousands of containers there since that was a failed data 
directory?


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3369
PS5, Line 3369:           "$0 block(s) in container $1 could not have their 
deletion records "
              :           "committed and are now orphaned blocks occupying 
unreclaimed disk space: "
              :           "$2. Run 'kudu fs check --repair' to reclaim this 
disk space.",
nit: make this more concise, but still actionable.


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/fs/log_block_manager.cc@3417
PS5, Line 3417:       KLOG_EVERY_N_SECS(INFO, 1) << Substitute("Block $0 is in 
a failed directory; not deleting",
Is this update also related to this patch?  If not, avoid introducing such 
updates since these increase risk of textual conflicts when trying to 
cherry-pick/merge logically independent patches.


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata-test.cc
File src/kudu/tablet/tablet_metadata-test.cc:

http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata-test.cc@454
PS2, Line 454:     // therefore unaffected. Also, Tablet::Flush() still returns 
OK overall.
nit: if possible, please avoid updating non-related code since it introduces 
higher risk of textual conflicts when cherry-picking or merging patches


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@562
PS5, Line 562: Scheduling $0 block(s) for orphaned cleanup
nit: ... scheduling $0 orphaned block(s) for cleanup ... ?


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@575
PS5, Line 575:     LOG_WITH_PREFIX(WARNING) << "Not deleting " << blocks.size()
             :         << " orphaned block(s) from disk. Block deletion 
disabled via "
             :         << "--enable_tablet_orphaned_block_deletion=false";
nit: if updating this message, please switch to using Substitute() here as well


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@581
PS5, Line 581:   OrphanBlockCleanupStats stats;
nit: move this closer to the place of the usage


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@595
PS5, Line 595: ERROR
There used to be a log message of WARNING priority.  Is the change to the ERROR 
priority intentional?


http://gerrit.cloudera.org:8080/#/c/24092/5/src/kudu/tablet/tablet_metadata.cc@596
PS5, Line 596:         "Failed to clean up $0/$1 orphaned block(s): $2. The 
failed block(s) are removed from "
             :         "the superblock but their disk space may be unreclaimed. 
Run 'kudu fs check --repair' "
             :         "to manually reclaim the disk space.
nit: please make it more concise while keeping it actionable, e.g.

"$2: failed cleaning up $0 of $1 removed blocks; running 'kudu fs check 
--repair' can help to reclaim disk space"



--
To view, visit http://gerrit.cloudera.org:8080/24092
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id386d9fc8d0900839e229e66772f35299b3ef2e9
Gerrit-Change-Number: 24092
Gerrit-PatchSet: 5
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Tue, 07 Apr 2026 22:20:40 +0000
Gerrit-HasComments: Yes

Reply via email to