Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22079 )

Change subject: [compaction] Add memory tracking for better OOM issues triaging
......................................................................


Patch Set 9:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/compaction-highmem-test.cc
File src/kudu/tablet/compaction-highmem-test.cc:

http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/compaction-highmem-test.cc@261
PS9, Line 261: } // namespace tablet
Do you think it makes sense adding a test scenario to make sure the usage of 
the 'major_compaction' global memtracker behaves as expected?

At least, making sure the following holds true:
  * once a major compaction operation starts and running, the usage of 
'major_compaction' increases
  * after a single major compaction is completed and corresponding objects are 
destroyed, the usage of the 'major_compaction' global memtracker is the same as 
it was before starting the op


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@132
PS9, Line 132: op may not complete
Is it really possible to have a major delta compaction operation left 
incomplete due to insufficient memory while the process (kudu-master or 
kudu-tserver) continues chugging along?  May a major compaction operation be 
aborted when it runs under memory pressure, or the idea was to warn about 
looming OOM condition where the whole process might crash or be killed by the 
OS?

I'm trying to understand how actionable this message is.  I don't know much 
about compaction internals, and seems a little vague to me.  What cluster 
operators are supposed to do when they see this in the logs?


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@144
PS9, Line 144: mem_consumed == 0
nit: do we expect this method called with 0 usage at all?  If not, maybe 
DCHECK_NE(0, mem_consumed) is a better way to handle this?  If yes, then I'd 
expect that's quite a rare occurrence, so maybe it make sense to wrap this into 
PREDICT_FALSE then?

>From the other side, looking at MemTracker::Consume() and 
>MemTracker::Release(), I can see the case of zero usage is legit and already 
>handled appropriately -- if so, then why to have this extra check at a higher 
>level?


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@193
PS9, Line 193: if (delta_iter_->memory_footprint() - 
delta_blocks_mem_before_prepare_batch > 0)
Here and below: why to have this comparison with 0 here if UpdateMemTracker is 
already checking that usage isn't 0?  Or the idea was to protect against 
negative values?


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@271
PS9, Line 271:     // We only create a new redo delta file if we need to.
What about tracking the usage of memory arena after the call to 
FilterColumnIdsAndCollectDeltas()?


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_store.h
File src/kudu/tablet/delta_store.h:

http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_store.h@479
PS9, Line 479: sizeof(PreparedDelta);
AFAIK, PreparedDelta is a structure that contains Slice:

struct PreparedDelta {
  DeltaKey key;
  Slice val;
};

sizeof(Slice) might be minuscule compared with the size of the data that it 
actually points to, and DeltaKey is also small.  Just do double check: so, here 
the idea is to track the memory footprint of the vector containing such 
structures, while the memory footprint of the actual data that Slices point at 
is accounted for somewhere else, right?



--
To view, visit http://gerrit.cloudera.org:8080/22079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394
Gerrit-Change-Number: 22079
Gerrit-PatchSet: 9
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Comment-Date: Fri, 17 Jan 2025 08:04:05 +0000
Gerrit-HasComments: Yes

Reply via email to