Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19281 )
Change subject: WIP KUDU-3406 memory budgeting for CompactRowSetsOp ...................................................................... Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@199 PS2, Line 199: 8.0 > Do we need to consider different values for compressed and uncompressed dat Unfortunately, that's not so straightforward -- the current size estimate is provided for all the data stored in the deltas, not differentiating between compressed/non-compressed. Anyway, adding some sort of separate factors isn't very sound since the actual size also depends on the encoding for particular column and the actual data stored there. http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@200 PS2, Line 200: Euristic > nit: Did you mean 'Heuristic' ? Done http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@210 PS2, Line 210: avialable > nit: available Done http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@2344 PS2, Line 2344: undos_total_size > Not really a comment but just for my understanding - With the ultimate fix, we should bother about total size of the deltas. Since the nature of the rowset merge compaction is about merging ordered lists and producing a single ordered list, that could be done allocating just a small chunk of memory to be able to fit several deltas that belong to different rowsets, and usually the operation picks up to 10 rowsets for compaction (that might become a parameter of the algorithm). http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet.cc@2361 PS2, Line 2361: quality > I remember we were discussing about scenario wherein AdjustedPerfScore reli Right: the original guess was that the rowset merge compactions weren't run and that would be the root cause of accumulating so many delta stores with a lot of UNDO deltas. However, even if that was true, the amount of data accumulated in those rowsets isn't going to decrease when CompactRowSetsOp tasks are run more frequently, and if at some point the compacted rowset with huge amount of UNDO data is selected for next round of compaction with some other rowset, the OOM condition would occur. The UNDO deltas are only GC-ed once they fall behind the AHM mark, but till then they are migrating from deltastores of input rowsets into the deltastores of the results rowsets. So, I realized that instead of tweaking the compaction policy w.r.t. what ops to run to, ultimately there is a need to provide memory budgeting for CompactRowSetsOp since eventually it might read in a lot of data from not-yet-ancient UNDO deltas. Alternatively, we could avoid picking rowsets which are too heavy w.r.t. amount of UNDO delta data and the available memory, but still running CompactRowSetsOp on the set of the smaller rowsets. Probably, I should explore that approach as well. http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet_mm_ops.cc File src/kudu/tablet/tablet_mm_ops.cc: http://gerrit.cloudera.org:8080/#/c/19281/2/src/kudu/tablet/tablet_mm_ops.cc@153 PS2, Line 153: uint64_t new_num_mrs_flushed = metrics->flush_mrs_duration->TotalCount(); : uint64_t new_num_dms_flushed = metrics->flush_dms_duration->TotalCount(); : uint64_t new_num_rs_compacted = metrics->compact_rs_duration->TotalCount(); > I vaguely remember during my testing, I was getting these counter values as Yes: they are changing if the ops are run on the same tablet. The idea here is to check whether corresponding 'TotalCount' metrics have changed for the histograms. If so, that's a sign that at least one of the operations have been run. And if such operation has been run, it's a trigger to re-evaluate the score for CompactRowSetsOp or to change its 'is-op-runnable' flag since the input data has changed. Here were are monitoring MRS, DMS flushes since those free up memory. Also, various rowset compactions (they might change REDO into UNDO deltas, etc.). Also, if GC has run, it's time to check whether the ratio of ancient deltas has dropped below the configured threshold, and now the operation might become runnable. -- To view, visit http://gerrit.cloudera.org:8080/19281 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I89c171284944831e95c45a993d85fbefe89048cf Gerrit-Change-Number: 19281 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Comment-Date: Wed, 30 Nov 2022 00:25:39 +0000 Gerrit-HasComments: Yes
