Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22079 )

Change subject: [compaction] Add memory tracking for better OOM issues triaging
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.cc@130
PS10, Line 130: if (process_memory::UnderMemoryPressure(&capacity_pct)) {
> My thought process is from the perspective that we want to emit warnings be
AFAIK, in non-cgroup case, the OOM killer isn't about the exact amount of 
memory vs some threshold, but it's more about finding a process with highest 
memory consumption. So that's more about memory over/under provisioning for a 
node considering its 'regular' workload (assuming no cgroup-based limits 
present).  The OOM killer picks one of the processes with highest memory 
consumption, and it could happen that kudu-tserver is only at 50% of its hard 
memory limit but still it's a target because no other process consumed so much 
memory.  IIRC, even in case of cgroup-controlled environments there is some 
margin, but I might be missing something.

>From that perspective, I'd think that option 2 is the better one, and I'd keep 
>logging interval at 1 second.


http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc@133
PS11, Line 133:           "MajorDeltaCompaction op may not complete due to lack 
of sufficient memory "
              :           "(at %.2f%% of capacity), current compaction 
consumption: %ld, total consumption "
              :           "for all running major delta compaction ops: %ld, 
tablet: %s"
> How about this?
Looks better, but I'd make it a bit more concise.  Maybe, something similar to 
the following:

"Beyond hard memory limit of {} with current consumption at {}; OOM condition 
is possible. Major delta compaction ops consumption: tablet-{} {}, total {}".

>From the other side, in this context we don't know much about total amount of 
>memory, over/under provision of memory compared with the amount required, 
>cgroup-based limits (if any), etc.  So, maybe we better drop the OOM part and 
>just report on what we definitely know about, i.e. something like

"Beyond hard memory limit of {} with current consumption at {}. Major delta 
compaction ops consumption: tablet-{} {}, total {}".


BTW, do you plan to add similar memory accounting (via MemTracker?) and logging 
for merge rowset compaction as well?



--
To view, visit http://gerrit.cloudera.org:8080/22079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394
Gerrit-Change-Number: 22079
Gerrit-PatchSet: 11
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Comment-Date: Sat, 25 Jan 2025 06:28:51 +0000
Gerrit-HasComments: Yes

Reply via email to