Ashwani Raina has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22079 )

Change subject: [compaction] Add memory tracking for better OOM issues triaging
......................................................................


Patch Set 14:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc
File src/kudu/tablet/compaction-highmem-test.cc:

http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc@195
PS14, Line 195:   LOG(INFO) << Substitute("MajorDeltaCompaction complete. 
Timing: $0 Metrics: $1",
              :                           sw.elapsed().ToString(),
              :                           trace->MetricsAsJSON());
> Do you think we still need this INFO output and the stopwatch timings for t
Just for additional information as we would see at the end of compaction 
operations normally . But not vital to the test as such.
Removed.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/compaction-highmem-test.cc@252
PS14, Line 252: int64_t
> nit: can this be constexpr?
Done


http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/11/src/kudu/tablet/delta_compaction.cc@133
PS11, Line 133:         process_memory::HardLimit(), 
process_memory::CurrentConsumption(),
              :         tablet_id_.c_str(), tracker_->consumption(), 
parent_tracker_->consumption());
              :     KLOG_EVERY_N_SECS(WARNING, 1) << msg
> That's nice, thanks! It would be great good to have it covered in all the r
Ack


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc
File src/kudu/tablet/delta_compaction.cc:

http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc@177
PS14, Line 177:     UpdateMemTracker(-mem.arena.memory_footprint());
> UBSAN failures in pre-commit tests (http://dist-test.cloudera.org/job?job_i
Oh yes! I guess this needs to be dealt with wherever applicable. I wonder why 
this error didn't pop-up during last run when line 255 change was added in 
previous PS.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/tablet/delta_compaction.cc@189
PS14, Line 189: delta_iter_->memory_footprint() - 
delta_blocks_mem_before_prepare_batch
> If thinking in the lines of the UBSAN warning at line 177, does it make sen
Currently, this value going negative doesn't seem to be a possibility, however, 
it may happen in future if PrepareBatch does any de-allocation of memory from 
arena.

It is better to add explicit static_cast instead of switching to signed 
integers because that will introduce additional changes related to separate 
handling of allocate and release when calling UpdateMemTracker.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc
File src/kudu/util/process_memory.cc:

http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69
PS14, Line 69: double
> The rest of the flags for similar thresholds have 1/100 granularity, and it
For normal scenarios, 1/100 granularity may suffice but this is mainly for unit 
test computation. Since unit test is working on small data generation when 
compared to 1GB of hard limit set in the binary, setting this flag to 
0.29296875 makes it hit the threshold condition in order for test to verify the 
logging, etc.

There is another way to deal with this by having this test moved to a different 
binary or a new one altogether where hard limit flag can be lowered down 
further in order to keep percentage at 1/100 granularity. I would prefer not to 
spend too much time on refining this further.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69
PS14, Line 69: compact
> IIUC, the utilities in the 'process_memory' namespace and corresponding fla
I just want to point out that this flag is used for logging warning messages 
during compaction op only.
Adding a TODO doesn't harm though.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@69
PS14, Line 69: 105.0
> Does it make sense to add a validator for this flag?
I don't think that is necessary in case we need to change this, maybe make it 
lower than 100%.
That is why I kept the definition as generic and not specifically mention that 
this is always going to be over hard limit.


http://gerrit.cloudera.org:8080/#/c/22079/14/src/kudu/util/process_memory.cc@72
PS14, Line 72: advanced
> Should we consider this a temporary flag that will be removed once compacti
Yes, this is a runtime flag by nature.
Changed to experimental.



--
To view, visit http://gerrit.cloudera.org:8080/22079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2037582d433730884212e83359bb75bad0d5394
Gerrit-Change-Number: 22079
Gerrit-PatchSet: 14
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Comment-Date: Tue, 28 Jan 2025 08:44:30 +0000
Gerrit-HasComments: Yes

Reply via email to