Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24092 )

Change subject: [fs] add metrics for untracked orphaned blocks
......................................................................


Patch Set 3:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/fs/log_block_manager.cc@2528
PS2, Line 2528:   return deletion_transaction->CommitDeletedBlocks(nullptr);
> Why not to move this report/warning piece closer to the source of the non-O
Actually, RemoveLogBlocks already has such log message that makes this log 
redundant. Removed.


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/fs/log_block_manager.cc@2533
PS2, Line 2533:
> There might be many blocks failing to delete if they are on a failed data d
This log message is removed now so the comment is not applicable here but it is 
definitely applicable to the similar log message in RemoveLogBlocks().

Have applied rate limit to the log message in RemoveLogBlocks().


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/fs/log_block_manager.cc@2534
PS2, Line 2534:
> I'm not sure this is always due to a failure writing metadata: e.g., what a
No applicable now that whole log message has been removed.


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.h
File src/kudu/tablet/tablet_metadata.h:

http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.h@189
PS2, Line 189:   struct OrphanBlockCleanupStats {
             :     // Number of orphaned blocks successfully deleted from dis
> nit: why to mention this here at all when this belongs rather to docs on pa
Removed.


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.h@192
PS2, Line 192:     // Number of orphaned blocks that could not be deleted (e.g. 
I/O error).
             :     // These are removed from the superblock regardless (see 
DeleteOrphanedBlocks),
             :     // so any unreclaimed disk space must be recovered via 'kudu 
fs check --repair'.
             :     int64_t blocks_failed = 0;
             :   };
> What's the reason behind using signed integers for counters when using unsi
Instead of 'int', it should be int64_t to make this exactly same as input 
parameter data type of IncrementBy() method, so no implicit conversion is 
required. That also aligns with what is done at other places in the code (e.g. 
tablet_bytes_deleted).


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.cc
File src/kudu/tablet/tablet_metadata.cc:

http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.cc@565
PS2, Line 565: begin(), block_ids.end());
> nit: remove the enclosing parentheses?
Done


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metadata.cc@572
PS2, Line 572: }
> style nit: always use scope braces for 'if', similar to 'for' cycle
Done


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h
File src/kudu/tablet/tablet_metrics.h:

http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h@87
PS2, Line 87: underlying error.
> nit: does the related code even capable of differentiating between I/O and
Not really. Replaced with:
"Number of orphaned blocks that could not be deleted due to  underlying errors."


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h@88
PS2, Line 88: ped_refptr<Counter> orphaned_block_cleanup_failures
> This is confusing.  Sustained by which means?  These counters are reset to
This is a leftover from PS1 where orphaned blocks were kept in the set for next 
flush call to retry the delete on those.   PS2 removed that logic and made this 
change only address the non-functional entity of adding metrics and no change 
in behavior.

However, even with PS1, using 'sustained' might sound confusing. I have removed 
this line to avoid any ambiguity and keep the purpose of this metric clear.


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h@92
PS2, Line 92: > delta_file_lookups_per_op;
> If introducing this pairing metric for orphaned_block_cleanup_failures, why
I have removed this metric because of the reason stated here:
https://gerrit.cloudera.org/#/c/24092/2/src/kudu/tablet/tablet_metadata-test.cc@441


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.h@94
PS2, Line 94:   scoped_refptr<Histogram> commit_wait_duration;
> Does it make sense to introduce a gauge to reflect on the size of the block
It would certainly align with the semantics of data_retained_bytes_ in 
maintenance manager. I will try to address this in 24125 patch.


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.cc
File src/kudu/tablet/tablet_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.cc@241
PS2, Line 241: if the tablet is not fully compacted. High frequency of "
             :                         "high values may indicate that 
compaction is falling behind.",
             :                         kudu::MetricLevel::kDebug,
             :                         20, 2);
> Move this out and merge into the description of the orphaned_block_cleanup_
I have removed orphaned_block_cleanup_failures_bytes metric altogether. Refer 
to this comment for more details:
https://gerrit.cloudera.org/#/c/24092/2/src/kudu/tablet/tablet_metadata-test.cc@441


http://gerrit.cloudera.org:8080/#/c/24092/2/src/kudu/tablet/tablet_metrics.cc@245
PS2, Line 245:
> FWIW, it's already enough to have orphaned_block_cleanup_failures as a metr
Same response as stated above.



--
To view, visit http://gerrit.cloudera.org:8080/24092
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id386d9fc8d0900839e229e66772f35299b3ef2e9
Gerrit-Change-Number: 24092
Gerrit-PatchSet: 3
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Thu, 26 Mar 2026 11:14:03 +0000
Gerrit-HasComments: Yes

Reply via email to