[ https://issues.apache.org/jira/browse/KUDU-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248493#comment-17248493 ]
Andrew Wong commented on KUDU-2726:
-----------------------------------

I'm a bit hesitant to move consideration of the maintenance adjustments entirely into stage 1 – it seems like these are used for prioritizing the ops that would have been done, rather than defining whether or not an op is worth performing. With that distinction, we should try to introduce a solution that tackles the latter without affecting the former.

That said, I wouldn't be against introducing further improvements to stage 1. Introducing some manually-defined value similar to {{maintenance_priority}} and {{maintenance_op_multiplier}} sounds like an OK solution in that some users may already be familiar with the existing multipliers. I'm not personally a fan of it because picking correct values for these configurations seems unintuitive, but I know there are Kudu users who do find this configuration effective.

Another solution would be to have stage 1 also account for the size of a tablet: if a tablet is very large, increase the compaction performance score. An observation here is that compacting 128MiB worth of data in a single 50GiB tablet may result in a compaction perf score below 0.01, despite the average rowset height being relatively high. If instead we imagined the tablet were actually two 25GiB tablets, a 128MiB compaction may result in a higher perf score.

Based on this observation, rather than running the budgeted compaction policy against the entire tablet, we could run it on multiple subsets of the tablet. For instance, if we have a 50GiB tablet, define some window W=25GiB such that, before running the compaction scoring/selection, if the tablet is over size W, we split the input rowsets into 50/W = 2 separate sets of rowsets, run the compaction scoring/selection algorithm on each of these sets, and pick the best perf score among the sets.
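
As a rough illustration of the windowing idea above, here is a minimal Python sketch. It is not Kudu code: {{RowSet}}, {{split_into_windows}}, {{pick_best_windowed}}, and the {{select_fn}} callback (standing in for the existing budgeted scoring/selection) are hypothetical names. It also assumes that rowsets are split into contiguous runs in the order given, that the number of windows is rounded up, and that a single best-scoring selection is returned.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class RowSet:
    """Hypothetical stand-in for a rowset; only its size matters here."""
    name: str
    size_bytes: int


# Hypothetical callback: given candidate rowsets, return the rowsets chosen
# for compaction and their perf score (assumed to be non-negative).
SelectFn = Callable[[Sequence[RowSet]], Tuple[List[RowSet], float]]


def split_into_windows(rowsets: Sequence[RowSet],
                       window_bytes: int) -> List[List[RowSet]]:
    """Split rowsets into roughly equal-sized sets of about window_bytes each.

    Assumption: rowsets are grouped as contiguous runs in the order given.
    """
    total = sum(r.size_bytes for r in rowsets)
    if total <= window_bytes:
        # Tablet is not over size W: score it as a whole, as today.
        return [list(rowsets)]
    num_windows = math.ceil(total / window_bytes)
    target = total / num_windows
    windows: List[List[RowSet]] = [[]]
    acc = 0
    for r in rowsets:
        # Start a new window once the current one has reached its target.
        if acc >= target and len(windows) < num_windows:
            windows.append([])
            acc = 0
        windows[-1].append(r)
        acc += r.size_bytes
    return windows


def pick_best_windowed(rowsets: Sequence[RowSet],
                       window_bytes: int,
                       select_fn: SelectFn) -> Tuple[List[RowSet], float]:
    """Run select_fn on each window and keep the highest-scoring selection."""
    best: Tuple[List[RowSet], float] = ([], 0.0)
    for window in split_into_windows(rowsets, window_bytes):
        chosen, score = select_fn(window)
        if score > best[1]:
            best = (chosen, score)
    return best
```

With fifty 1GiB rowsets and W=25GiB, {{split_into_windows}} yields two sets of 25 rowsets, so whatever {{select_fn}} computes is measured against a 25GiB chunk rather than the full 50GiB tablet.
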
This would mean {{compaction_minimum_improvement}} would no longer apply to the entire tablet, but rather to W-sized chunks of the tablet. If we go down the route I'm describing, more thought needs to be given to ensuring this doesn't introduce a never-ending compaction loop, but I think the solution is a somewhat elegant workaround for the fact that Kudu doesn't support tablet splits today.

> Very large tablets defeat budgeted compaction
> ---------------------------------------------
>
>                 Key: KUDU-2726
>                 URL: https://issues.apache.org/jira/browse/KUDU-2726
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.9.0
>            Reporter: William Berkeley
>            Priority: Major
>              Labels: density, roadmap-candidate
>
> On very large tablets (50GB+), despite being very uncompacted with a large
> average rowset height, a default budget (128MB) worth of compaction may not
> reduce average rowset height enough to pass the minimum threshold. Thus the
> tablet stays uncompacted forever.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)