Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22079 )

Change subject: [compaction] Add memory tracking for better OOM issues triaging
......................................................................


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@132
PS9, Line 132: op may not complete
> Is it really possible to have a major delta compaction operation left incom
This patch does not take any action to avoid OOM condition or suggest operators 
to take any action.
This is more of a warning message about looming OOM condition where fate of the 
kudu-tserver is decided by OOM killer that eventually ends up with getting 
killed by the OS if the memory consumption kept on growing.
The goal of adding this warning is to record the memory consumption by 
compaction operations to aid quicker triaging of such issues.


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@144
PS9, Line 144: mem_consumed == 0
> nit: do we expect this method called with 0 usage at all?  If not, maybe DC
Yes, zero usage is legit in some rare erroneous cases. Removed this condition 
as Consume() and Release() are already taking care of this.


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@193
PS9, Line 193: if (delta_iter_->memory_footprint() - 
delta_blocks_mem_before_prepare_batch > 0)
> Here and below: why to have this comparison with 0 here if UpdateMemTracker
Yes, the idea is to protect against negative values. Now that I think about it, 
I guess it is ok to just get rid of this check and just rely on 
Consume()/Release() to take care of negative values.

Calling MemoryExceededWarnMsgs unconditionally also seems fine. If server is 
still under memory pressure after a  Release call, warning will keep getting 
logged and that should be fine.


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@268
PS9, Line 268: vector<DeltaKeyAndUpdate> out;
> Now with DeltaPreparer::delta_blocks_mem_size(), the memory footprint of th
This is not the general case and happens after Reset at line 267 potentially 
reducing memory footprint. In most of the compaction runs, this short-lived 
consumption is either 0 or, in  limited number of cases, a few bytes. Tracking 
this would make sense when prepared deltas quite often contain delete and 
reinsert mutations.


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@271
PS9, Line 271:     // We only create a new redo delta file if we need to.
> What about tracking the usage of memory arena after the call to FilterColum
Memory arena usage is pretty low as compared to other places objects like delta 
blocks where consumption form big percentage of total usage. Arena memory used 
to allocate delta key and updates is for just reinsert and deletes and it also 
gets de-initialised soon for new base row processing.


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_store.h
File src/kudu/tablet/delta_store.h:

http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_store.h@479
PS9, Line 479: sizeof(PreparedDelta);
> AFAIK, PreparedDelta is a structure that contains Slice:
That's a good point!
In my tests, I may have neglected data size inside Slice because of it being 
very less compared to size of PreparedDelta.
But it makes sense to include this as well. This small number can accumulate to 
a high value if number of deltas to be added is high.


http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_store.h@479
PS9, Line 479: repared_deltas_.size()
> nit: if tracking just the memory footprint of the vector itself (as impleme
prepared_deltas_ is a double ended queue which doesn't have capacity like 
std::vector.



--
To view, visit http://gerrit.cloudera.org:8080/22079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394
Gerrit-Change-Number: 22079
Gerrit-PatchSet: 9
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Comment-Date: Mon, 20 Jan 2025 17:37:15 +0000
Gerrit-HasComments: Yes

Reply via email to