Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19281 )

Change subject: WIP KUDU-3406 memory budgeting for CompactRowSetsOp
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@199
PS2, Line 199: 8.0
> Do we need to consider different values for compressed and uncompressed dat
Unfortunately, that's not so straightforward -- the current size estimate is 
provided for all the data stored in the deltas, not differentiating between 
compressed/non-compressed.  Anyway, adding some sort of separate factors isn't 
very sound since the actual size also depends on the encoding for particular 
column and the actual data stored there.


http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@200
PS2, Line 200: Euristic
> nit: Did you mean 'Heuristic' ?
Done


http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@210
PS2, Line 210: avialable
> nit: available
Done


http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@2344
PS2, Line 2344: undos_total_size
> Not really a comment but just for my understanding -
With the ultimate fix, we should bother about total size of the deltas.  Since 
the nature of the rowset merge compaction is about merging ordered lists and 
producing a single ordered list, that could be done allocating just a small 
chunk of memory to be able to fit several deltas that belong to different 
rowsets, and usually the operation picks up to 10 rowsets for compaction (that 
might become a parameter of the algorithm).


http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@2361
PS2, Line 2361: quality
> I remember we were discussing about scenario wherein AdjustedPerfScore reli
Right: the original guess was that the rowset merge compactions weren't run and 
that would be the root cause of accumulating so many delta stores with a lot of 
UNDO deltas.

However, even if that was true, the amount of data accumulated in those rowsets 
isn't going to decrease when CompactRowSetsOp tasks are run more frequently, 
and if at some point the compacted rowset with huge amount of UNDO data is 
selected for next round of compaction with some other rowset, the OOM condition 
would occur.  The UNDO deltas are only GC-ed once they fall behind the AHM 
mark, but till then they are migrating from deltastores of input rowsets into 
the deltastores of the results rowsets.

So, I realized that instead of tweaking the compaction policy w.r.t. what ops 
to run to, ultimately there is a need to provide memory budgeting for 
CompactRowSetsOp since eventually it might read in a lot of data from 
not-yet-ancient UNDO deltas.

Alternatively, we could avoid picking rowsets which are too heavy w.r.t. amount 
of UNDO delta data and the available memory, but still running CompactRowSetsOp 
on the set of the smaller rowsets.  Probably, I should explore that approach as 
well.


http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet_mm_ops.cc
File src/kudu/tablet/tablet_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet_mm_ops.cc@153
PS2, Line 153:     uint64_t new_num_mrs_flushed = 
metrics->flush_mrs_duration->TotalCount();
             :     uint64_t new_num_dms_flushed = 
metrics->flush_dms_duration->TotalCount();
             :     uint64_t new_num_rs_compacted = 
metrics->compact_rs_duration->TotalCount();
> I vaguely remember during my testing, I was getting these counter values as
Yes: they are changing if the ops are run on the same tablet.

The idea here is to check whether corresponding 'TotalCount' metrics have 
changed for the histograms.  If so, that's a sign that at least one of the 
operations have been run.  And if such operation has been run, it's a trigger 
to re-evaluate the score for CompactRowSetsOp or to change its 'is-op-runnable' 
flag since the input data has changed.

Here were are monitoring MRS, DMS flushes since those free up memory.  Also, 
various rowset compactions (they might change REDO into UNDO deltas, etc.).  
Also, if GC has run, it's time to check whether the ratio of ancient deltas has 
dropped below the configured threshold, and now the operation might become 
runnable.



--
To view, visit http://gerrit.cloudera.org:8080/19281
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c171284944831e95c45a993d85fbefe89048cf
Gerrit-Change-Number: 19281
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Abhishek Chennaka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Comment-Date: Wed, 30 Nov 2022 00:25:39 +0000
Gerrit-HasComments: Yes

Reply via email to