Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/22079 )
Change subject: [compaction] Add memory tracking for better OOM issues triaging ...................................................................... Patch Set 14: (9 comments) http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc File src/kudu/tablet/compaction-highmem-test.cc: http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc@195 PS14, Line 195: LOG(INFO) << Substitute("MajorDeltaCompaction complete. Timing: $0 Metrics: $1", : sw.elapsed().ToString(), : trace->MetricsAsJSON()); > Do you think we still need this INFO output and the stopwatch timings for t Just for additional information as we would see at the end of compaction operations normally . But not vital to the test as such. Removed. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc@252 PS14, Line 252: int64_t > nit: can this be constexpr? Done http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc@133 PS11, Line 133: process_memory::HardLimit(), process_memory::CurrentConsumption(), : tablet_id_.c_str(), tracker_->consumption(), parent_tracker_->consumption()); : KLOG_EVERY_N_SECS(WARNING, 1) << msg > That's nice, thanks! It would be great good to have it covered in all the r Ack http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc@177 PS14, Line 177: UpdateMemTracker(-mem.arena.memory_footprint()); > UBSAN failures in pre-commit tests (http://dist-test.cloudera.org/job?job_i Oh yes! I guess this needs to be dealt with wherever applicable. I wonder why this error didn't pop-up during last run when line 255 change was added in previous PS. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc@189 PS14, Line 189: delta_iter_->memory_footprint() - delta_blocks_mem_before_prepare_batch > If thinking in the lines of the UBSAN warning at line 177, does it make sen Currently, this value going negative doesn't seem to be a possibility, however, it may happen in future if PrepareBatch does any de-allocation of memory from arena. It is better to add explicit static_cast instead of switching to signed integers because that will introduce additional changes related to separate handling of allocate and release when calling UpdateMemTracker. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc File src/kudu/util/process_memory.cc: http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69 PS14, Line 69: double > The rest of the flags for similar thresholds have 1/100 granularity, and it For normal scenarios, 1/100 granularity may suffice but this is mainly for unit test computation. Since unit test is working on small data generation when compared to 1GB of hard limit set in the binary, setting this flag to 0.29296875 makes it hit the threshold condition in order for test to verify the logging, etc. There is another way to deal with this by having this test moved to a different binary or a new one altogether where hard limit flag can be lowered down further in order to keep percentage at 1/100 granularity. I would prefer not to spend too much time on refining this further. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69 PS14, Line 69: compact > IIUC, the utilities in the 'process_memory' namespace and corresponding fla I just want to point out that this flag is used for logging warning messages during compaction op only. Adding a TODO doesn't harm though. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69 PS14, Line 69: 105.0 > Does it make sense to add a validator for this flag? I don't think that is necessary in case we need to change this, maybe make it lower than 100%. That is why I kept the definition as generic and not specifically mention that this is always going to be over hard limit. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@72 PS14, Line 72: advanced > Should we consider this a temporary flag that will be removed once compacti Yes, this is a runtime flag by nature. Changed to experimental. -- To view, visit http://gerrit.cloudera.org:8080/22079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394 Gerrit-Change-Number: 22079 Gerrit-PatchSet: 14 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Comment-Date: Tue, 28 Jan 2025 08:44:30 +0000 Gerrit-HasComments: Yes
