Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/22079 )
Change subject: [compaction] Add memory tracking for better OOM issues triaging ...................................................................... Patch Set 11: (9 comments) http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.h File src/kudu/tablet/delta_compaction.h: http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.h@102 PS10, Line 102: void MemoryExceededWarnMsgs(); > nit: add a comment for the new method, so it's documented similar to the re Done http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.h@104 PS10, Line 104: enum MemTrackerAction { : CONSUME, : RELEASE : }; > Is this still needed? Somehow missed to remove this. Removed now. http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.h@109 PS10, Line 109: // Update memory tracker for compaction operation for the tablet under process. > nit: does it make sense to mention about logging warning messages as well? Done http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.h@111 PS10, Line 111: FsManager* const fs_manager_; > nit: add an empty line to separate the declaration of the UpdateMemTracker( Done http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/9/src/kudu/tablet/delta_compaction.cc@193 PS9, Line 193: > s/them// Ack http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_compaction.cc@130 PS10, Line 130: if (process_memory::UnderMemoryPressure(&capacity_pct)) { > > The main reason behind comparing the current consumption against memory_l My thought process is from the perspective that we want to emit warnings before crossing the hard limit because once process memory crosses the hard limit, it may not even reach significant margin limit before getting killed by OOM killer. If that happens, all the information about memory usage numbers from current compaction, all compaction ops, etc. would be missing from the logs for post-mortem analysis. However, you have a valid point here from the perspective of limiting the warning messages as much as possible. To address this, there are two options: 1. With current logic, limit the number of warning messages by increasing the logginginterval from 1 second to say 5 seconds. 2. Instead of significant margin (25%), it could be small margin like 5% along with logging interval of 2 seconds. How does that sound? http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc@133 PS11, Line 133: "MajorDeltaCompaction op may not complete due to lack of sufficient memory " : "(at %.2f%% of capacity), current compaction consumption: %ld, total consumption " : "for all running major delta compaction ops: %ld, tablet: %s" > Maybe, be more direct of the expected OOM condition in this message? The p How about this? "Approaching towards an out-of-memory condition that may impact the running operations requiring more " "memory in order to proceed, including current MajorDeltaCompaction op. " "Memory consumption is at %.2f%% of capacity, current compaction consumption: %ld for tablet: %s, total " "consumption for all running major delta compaction ops: %ld", http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc@176 PS11, Line 176: // 1) Get the next batch of base data for the columns we're compacting. > Does it make sense to UpdateMemTracker() with the arena's memory usage when Yes, I missed to add UpdateMemTracker call here when made changes to address this comment: https://gerrit.cloudera.org/c/22079/9/src/kudu/tablet/delta_compaction.cc#271 by adding UpdateMemTracker call at line 263. Since this arena memory is short-lived and is almost negligible most of the times, I had skipped calling UpdateMemTracker() here and on line 263. I guess it is better to account for each update even if the allocation size is negligible. http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_store.h File src/kudu/tablet/delta_store.h: http://gerrit.cloudera.org:8080/#/c/22079/10/src/kudu/tablet/delta_store.h@479 PS10, Line 479: int64_t slice_data_size_total = 0; : int64_t delta_obj_size_total = prepared_deltas_.size() * sizeof(PreparedDelta); > nit: could we compute the result of this function/method using just one tem Done -- To view, visit http://gerrit.cloudera.org:8080/22079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394 Gerrit-Change-Number: 22079 Gerrit-PatchSet: 11 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Comment-Date: Fri, 24 Jan 2025 13:22:41 +0000 Gerrit-HasComments: Yes
