Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22079 )

Change subject: [compaction] Add memory tracking for better OOM issues triaging
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22079/3/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/3/src/kudu/tablet/delta_compaction.cc@221
PS3, Line 221:                << CompactionInputRowToString(*input_row);
> At L142, we check the value of mem_consumed before updating the memory trac
The difference gives the number of bytes allocated during this iteration of 
ApplyMutationsAndGenerateUndos. The memory arena is shared across all 
ApplyMutationsAndGenerateUndos iterations for a rowblock of base data, so its 
footprint keeps growing from the previous value after each iteration.

The main purpose of updating the memory tracker frequently is to record jumps 
in memory consumption at fine granularity and as early as possible. If we do a 
single update after calling ApplyMutationsAndGenerateUndos, we may have already 
crossed the limit or, even worse, hit the OOM condition without logging the 
warning message. In that case, recording the memory usage becomes moot.
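
To illustrate the per-iteration accounting described above, here is a minimal 
sketch. SimpleTracker, GrowingArena and TrackPerRow are hypothetical stand-ins 
made up for this example; they are not Kudu's MemTracker or Arena classes, and 
the actual patch may be structured differently. The idea is only that the 
tracker is fed the growth of the arena footprint since the previous iteration, 
and that the limit is checked right after each update.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical tracker (an assumption for this sketch, not Kudu's MemTracker).
class SimpleTracker {
 public:
  explicit SimpleTracker(int64_t soft_limit) : soft_limit_(soft_limit) {}
  void Consume(int64_t bytes) { consumed_ += bytes; }
  int64_t consumed() const { return consumed_; }
  bool OverSoftLimit() const { return consumed_ > soft_limit_; }

 private:
  int64_t soft_limit_;
  int64_t consumed_ = 0;
};

// Hypothetical arena that is reused across iterations, so its footprint
// only ever grows.
class GrowingArena {
 public:
  void Allocate(int64_t bytes) { footprint_ += bytes; }
  int64_t memory_footprint() const { return footprint_; }

 private:
  int64_t footprint_ = 0;
};

// Processes one "rowblock". Each entry of per_row_allocs stands in for the
// bytes one mutation-applying iteration allocates from the shared arena.
// Returns the index of the first iteration after which the soft limit was
// exceeded, or -1 if it never was.
int TrackPerRow(const std::vector<int64_t>& per_row_allocs,
                GrowingArena* arena, SimpleTracker* tracker) {
  int first_over = -1;
  int64_t prev = arena->memory_footprint();
  for (std::size_t i = 0; i < per_row_allocs.size(); ++i) {
    arena->Allocate(per_row_allocs[i]);
    const int64_t now = arena->memory_footprint();
    assert(now >= prev);           // the footprint is cumulative
    tracker->Consume(now - prev);  // report only the growth, not the total
    prev = now;
    if (first_over < 0 && tracker->OverSoftLimit()) {
      first_over = static_cast<int>(i);
      std::cerr << "memory may be insufficient after iteration " << i
                << ": consumed " << tracker->consumed() << " bytes\n";
    }
  }
  return first_over;
}
```

Passing the absolute footprint to Consume() on every iteration would count 
the earlier allocations again each time. A single update after the loop 
would report the correct total, but only once the limit may already have 
been crossed.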

I also noticed that the delta iterator's memory footprint accumulates after 
each new rowblock iteration as well. That means I need to update the memory 
tracker there (at L190) using the difference instead of the absolute value. I 
will rev the patch with that update.


http://gerrit.cloudera.org:8080/#/c/22079/4/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/4/src/kudu/tablet/delta_compaction.cc@132
PS4, Line 132: MajorDeltaCompaction op may not complete due to lack of 
sufficient memory
> It is not very clear what this warning message is for.
It is possible that major delta compaction ops themselves account for most of 
the memory consumption that left memory insufficient.

The point is to log that memory may become insufficient for a 
MajorDeltaCompaction op to complete, so that when we hit OOM issues we know 
which major delta compaction op ran into the problem. As the commit message 
states, this will help in debugging OOM issues.



--
To view, visit http://gerrit.cloudera.org:8080/22079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394
Gerrit-Change-Number: 22079
Gerrit-PatchSet: 4
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Comment-Date: Wed, 18 Dec 2024 14:03:52 +0000
Gerrit-HasComments: Yes
