Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/22079 )
Change subject: [compaction] Add memory tracking for better OOM issues triaging ...................................................................... Patch Set 9: (7 comments) http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@132 PS9, Line 132: op may not complete > Is it really possible to have a major delta compaction operation left incom This patch does not take any action to avoid OOM condition or suggest operators to take any action. This is more of a warning message about looming OOM condition where fate of the kudu-tserver is decided by OOM killer that eventually ends up with getting killed by the OS if the memory consumption kept on growing. The goal of adding this warning is to record the memory consumption by compaction operations to aid quicker triaging of such issues. http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@144 PS9, Line 144: mem_consumed == 0 > nit: do we expect this method called with 0 usage at all? If not, maybe DC Yes, zero usage is legit in some rare erroneous cases. Removed this condition as Consume() and Release() are already taking care of this. http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@193 PS9, Line 193: if (delta_iter_->memory_footprint() - delta_blocks_mem_before_prepare_batch > 0) > Here and below: why to have this comparison with 0 here if UpdateMemTracker Yes, the idea is to protect against negative values. Now that I think about it, I guess it is ok to just get rid of this check and just rely on Consume()/Release() to take care of negative values. Calling MemoryExceededWarnMsgs unconditionally also seems fine. If server is still under memory pressure after a Release call, warning will keep getting logged and that should be fine. http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@268 PS9, Line 268: vector<DeltaKeyAndUpdate> out; > Now with DeltaPreparer::delta_blocks_mem_size(), the memory footprint of th This is not the general case and happens after Reset at line 267 potentially reducing memory footprint. In most of the compaction runs, this short-lived consumption is either 0 or, in limited number of cases, a few bytes. Tracking this would make sense when prepared deltas quite often contain delete and reinsert mutations. http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@271 PS9, Line 271: // We only create a new redo delta file if we need to. > What about tracking the usage of memory arena after the call to FilterColum Memory arena usage is pretty low as compared to other places objects like delta blocks where consumption form big percentage of total usage. Arena memory used to allocate delta key and updates is for just reinsert and deletes and it also gets de-initialised soon for new base row processing. http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_store.h File src/kudu/tablet/delta_store.h: http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_store.h@479 PS9, Line 479: sizeof(PreparedDelta); > AFAIK, PreparedDelta is a structure that contains Slice: That's a good point! In my tests, I may have neglected data size inside Slice because of it being very less compared to size of PreparedDelta. But it makes sense to include this as well. This small number can accumulate to a high value if number of deltas to be added is high. http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_store.h@479 PS9, Line 479: repared_deltas_.size() > nit: if tracking just the memory footprint of the vector itself (as impleme prepared_deltas_ is a double ended queue which doesn't have capacity like std::vector. -- To view, visit http://gerrit.cloudera.org:8080/22079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394 Gerrit-Change-Number: 22079 Gerrit-PatchSet: 9 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Comment-Date: Mon, 20 Jan 2025 17:37:15 +0000 Gerrit-HasComments: Yes
