Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/22079 )
Change subject: [compaction] Add memory tracking for better OOM issues triaging ...................................................................... Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.cc@130 PS10, Line 130: if (process_memory::UnderMemoryPressure(&capacity_pct)) { > My thought process is from the perspective that we want to emit warnings be AFAIK, in non-cgroup case, the OOM killer isn't about the exact amount of memory vs some threshold, but it's more about finding a process with highest memory consumption. So that's more about memory over/under provisioning for a node considering its 'regular' workload (assuming no cgroup-based limits present). The OOM killer picks one of the processes with highest memory consumption, and it could happen that kudu-tserver is only at 50% of its hard memory limit but still it's a target because no other process consumed so much memory. IIRC, even in case of cgroup-controlled environments there is some margin, but I might be missing something. >From that perspective, I'd think that option 2 is the better one, and I'd keep >logging interval at 1 second. http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc@133 PS11, Line 133: "MajorDeltaCompaction op may not complete due to lack of sufficient memory " : "(at %.2f%% of capacity), current compaction consumption: %ld, total consumption " : "for all running major delta compaction ops: %ld, tablet: %s" > How about this? Looks better, but I'd make it a bit more concise. Maybe, something similar to the following: "Beyond hard memory limit of {} with current consumption at {}; OOM condition is possible. Major delta compaction ops consumption: tablet-{} {}, total {}". >From the other side, in this context we don't know much about total amount of >memory, over/under provision of memory compared with the amount required, >cgroup-based limits (if any), etc. So, maybe we better drop the OOM part and >just report on what we definitely know about, i.e. something like "Beyond hard memory limit of {} with current consumption at {}. Major delta compaction ops consumption: tablet-{} {}, total {}". BTW, do you plan to add similar memory accounting (via MemTracker?) and logging for merge rowset compaction as well? -- To view, visit http://gerrit.cloudera.org:8080/22079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394 Gerrit-Change-Number: 22079 Gerrit-PatchSet: 11 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Comment-Date: Sat, 25 Jan 2025 06:28:51 +0000 Gerrit-HasComments: Yes
