Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/22079 )
Change subject: [compaction] Add memory tracking for better OOM issues triaging ...................................................................... Patch Set 14: (9 comments) http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc File src/kudu/tablet/compaction-highmem-test.cc: http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc@195 PS14, Line 195: LOG(INFO) << Substitute("MajorDeltaCompaction complete. Timing: $0 Metrics: $1", : sw.elapsed().ToString(), : trace->MetricsAsJSON()); Do you think we still need this INFO output and the stopwatch timings for the automated test? http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc@252 PS14, Line 252: int64_t nit: can this be constexpr? http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc@133 PS11, Line 133: process_memory::HardLimit(), process_memory::CurrentConsumption(), : tablet_id_.c_str(), tracker_->consumption(), parent_tracker_->consumption()); : KLOG_EVERY_N_SECS(WARNING, 1) << msg > Yes, similar accounting and logging for merge rowset compaction in a separa That's nice, thanks! It would be great good to have it covered in all the relevant places we know of, indeed. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc File src/kudu/tablet/delta_compaction.cc: http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc@177 PS14, Line 177: UpdateMemTracker(-mem.arena.memory_footprint()); UBSAN failures in pre-commit tests (http://dist-test.cloudera.org/job?job_id=jenkins-slave.1737984868.4093513): src/kudu/tablet/delta_compaction.cc:177:22: runtime error: negation of 32768 cannot be represented in type 'size_t' (aka 'unsigned long') I guess similar conditions happen elsewhere in the code below. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc@189 PS14, Line 189: delta_iter_->memory_footprint() - delta_blocks_mem_before_prepare_batch If thinking in the lines of the UBSAN warning at line 177, does it make sense to switch to signed integers here or add explicit static_cast to protect against underflow if by chance delta_iter_->memory_footprint() turns to be lower than delta_blocks_mem_before_prepare_batch? http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc File src/kudu/util/process_memory.cc: http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69 PS14, Line 69: 105.0 Does it make sense to add a validator for this flag? http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69 PS14, Line 69: double The rest of the flags for similar thresholds have 1/100 granularity, and it's been enough. Do you think that 1/100 granularity isn't enough in this case? That's a bit unexpected to me given we were talking about going over the threshold with some solid margin. http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69 PS14, Line 69: compact IIUC, the utilities in the 'process_memory' namespace and corresponding flags don't differentiate between compaction, processing RPCs, and other activity, so seeing this compaction-specific name for the flag is a bit strange. If you really want to introduce such specifics in here, maybe add a TODO note on removing this flag once compaction logic starts honoring the hard memory limit setting? http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@72 PS14, Line 72: advanced Should we consider this a temporary flag that will be removed once compaction honors the hard memory limit? If yes, then also add 'experimental' tag as well. And it's a runtime flag by its nature, correct? -- To view, visit http://gerrit.cloudera.org:8080/22079 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394 Gerrit-Change-Number: 22079 Gerrit-PatchSet: 14 Gerrit-Owner: Ashwani Raina <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Comment-Date: Tue, 28 Jan 2025 03:34:55 +0000 Gerrit-HasComments: Yes
