Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22079 )

Change subject: [compaction] Add memory tracking for better OOM issues triaging
......................................................................


Patch Set 14:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc
File src/kudu/tablet/compaction-highmem-test.cc:

http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc@195
PS14, Line 195:   LOG(INFO) << Substitute("MajorDeltaCompaction complete. 
Timing: $0 Metrics: $1",
              :                           sw.elapsed().ToString(),
              :                           trace->MetricsAsJSON());
Do you think we still need this INFO output and the stopwatch timings for the 
automated test?


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc@252
PS14, Line 252: int64_t
nit: can this be constexpr?


http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc@133
PS11, Line 133:         process_memory::HardLimit(), 
process_memory::CurrentConsumption(),
              :         tablet_id_.c_str(), tracker_->consumption(), 
parent_tracker_->consumption());
              :     KLOG_EVERY_N_SECS(WARNING, 1) << msg
> Yes, similar accounting and logging for merge rowset compaction in a separa
That's nice, thanks! It would be great good to have it covered in all the 
relevant places we know of, indeed.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc@177
PS14, Line 177:     UpdateMemTracker(-mem.arena.memory_footprint());
UBSAN failures in pre-commit tests 
(http://dist-test.cloudera.org/job?job_id=jenkins-slave.1737984868.4093513):

  src/kudu/tablet/delta_compaction.cc:177:22: runtime error: negation of 32768 
cannot be represented in type 'size_t' (aka 'unsigned long')

I guess similar conditions happen elsewhere in the code below.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc@189
PS14, Line 189: delta_iter_->memory_footprint() - 
delta_blocks_mem_before_prepare_batch
If thinking in the lines of the UBSAN warning at line 177, does it make sense 
to switch to signed integers here or add explicit static_cast to protect 
against underflow if by chance delta_iter_->memory_footprint() turns to be 
lower than delta_blocks_mem_before_prepare_batch?


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc
File src/kudu/util/process_memory.cc:

http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69
PS14, Line 69: 105.0
Does it make sense to add a validator for this flag?


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69
PS14, Line 69: double
The rest of the flags for similar thresholds have 1/100 granularity, and it's 
been enough.  Do you think that 1/100 granularity isn't enough in this case?  
That's a bit unexpected to me given we were talking about going over the 
threshold with some solid margin.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69
PS14, Line 69: compact
IIUC, the utilities in the 'process_memory' namespace and corresponding flags 
don't differentiate between compaction, processing RPCs, and other activity, so 
seeing this compaction-specific name for the flag is a bit strange.  If you 
really want to introduce such specifics in here, maybe add a TODO note on 
removing this flag once compaction logic starts honoring the hard memory limit 
setting?


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@72
PS14, Line 72: advanced
Should we consider this a temporary flag that will be removed once compaction 
honors the hard memory limit?  If yes, then also add 'experimental' tag as 
well.  And it's a runtime flag by its nature, correct?



--
To view, visit http://gerrit.cloudera.org:8080/22079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394
Gerrit-Change-Number: 22079
Gerrit-PatchSet: 14
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Comment-Date: Tue, 28 Jan 2025 03:34:55 +0000
Gerrit-HasComments: Yes

Reply via email to