A custom CompactionStrategy is probably your best bet, since you have very
specific requirements.
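
As a rough illustration of the kind of decision such a strategy could make, here
is a plain-Java sketch of one possible trigger. It is hypothetical and is not
the Accumulo CompactionStrategy API. The idea is to compact all files, including
the large one, once enough of the data in the last full-compaction output is
estimated to have aged off. It assumes ingest is roughly uniform over time and
that the age-off period is `ageOff`. How you obtain the time of the last full
compaction is left open.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical trigger for a custom compaction strategy (not Accumulo API).
 * Assumption: ingest is roughly uniform, so the output of a full compaction
 * at time T holds data spanning [T - ageOff, T]; at T + x about x / ageOff
 * of it has expired.
 */
class AgeOffTrigger {
    private final Duration ageOff;
    private final double threshold; // e.g. 0.5 = rewrite once ~half is expired

    AgeOffTrigger(Duration ageOff, double threshold) {
        this.ageOff = ageOff;
        this.threshold = threshold;
    }

    /** Estimated fraction of the last full-compaction output that has aged off. */
    double expiredFraction(Instant lastFullCompaction, Instant now) {
        long elapsed = Duration.between(lastFullCompaction, now).toMillis();
        if (elapsed <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) elapsed / ageOff.toMillis());
    }

    /** True when rewriting every file (dropping expired entries) seems worthwhile. */
    boolean shouldCompactAll(Instant lastFullCompaction, Instant now) {
        return expiredFraction(lastFullCompaction, now) >= threshold;
    }

    public static void main(String[] args) {
        AgeOffTrigger t = new AgeOffTrigger(Duration.ofDays(30), 0.5);
        Instant last = Instant.parse("2020-11-01T00:00:00Z");
        // 9 of 30 days elapsed -> ~0.3 expired, below the threshold
        System.out.println(t.shouldCompactAll(last, Instant.parse("2020-11-10T00:00:00Z"))); // false
        // 19 of 30 days elapsed -> ~0.63 expired, above the threshold
        System.out.println(t.shouldCompactAll(last, Instant.parse("2020-11-20T00:00:00Z"))); // true
    }
}
```

In a real strategy, a rule like this would sit alongside the normal ratio check.
The threshold trades extra compaction I/O against how much expired data scans
have to skip.
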
You may also be interested in the work done by Keith Turner for 2.1.0 (not
yet released, as it is still under development) to add more control over
compactions. A preview of the javadoc for these features is available at
https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/spi/compaction/package-info.java
There may be better documentation than this; the pending website
documentation updates may also help:
https://github.com/apache/accumulo-website/pull/232/files

On Tue, Nov 24, 2020 at 3:28 PM Rob Verkuylen <r...@verkuylen.net> wrote:

> We have a scenario where we use the SummingCombiner to aggregate stats on
> high-cardinality properties of a streaming dataset. The use case is
> generating histograms over a certain period, so we age off these stats
> after a certain time.
>
> We run into some unexpected behaviour where the age-off does not physically
> happen unless we trigger a manual compaction using the EverythingStrategy
> as opposed to the DefaultStrategy. This is in combination with fairly large
> split sizes (50-100G) to prevent tablets from splitting further.
>
> The default strategy, with a majc ratio of 3 and table.file.max=15, seems to
> result in a scenario where the tablet servers will over time contain one
> reasonably large file, e.g. 20G (A-*), and then several smaller files (C-*)
> of 1 to 5, maybe 8GB. It takes a very long time before these C-* files
> sum up to <ratio> x <largest file>, so the 20G file is almost never
> considered for compaction and over time hurts query performance because
> of all the aged-off data that needs to be skipped in a scan.
>
> Manual compaction will correct this, but it is only a matter of time before
> we run into the same problem. What is the best approach to let Accumulo
> handle this automatically? Is it a matter of lowering the ratio so the 20G
> file gets included sooner, while guarding against continuously running
> compactions? Or writing a custom CompactionStrategy?
>
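
For what it's worth, the ratio arithmetic described in the question can be
checked with a simplified model. This is a sketch, not Accumulo's actual
DefaultCompactionStrategy code. It sorts file sizes largest first and keeps
dropping the largest file until `largest * ratio <= total size of the remaining
set`, ignoring other factors such as the max-files limit. The class name
`RatioModel` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified model of ratio-based file selection (not Accumulo's code). */
class RatioModel {
    /**
     * Sort sizes largest first, then drop the largest file until
     * largest * ratio <= sum of the remaining set. Returns the selected
     * sizes, or an empty list if no set qualifies.
     */
    static List<Long> select(List<Long> sizes, double ratio) {
        List<Long> sorted = new ArrayList<>(sizes);
        sorted.sort(Comparator.reverseOrder());
        long total = 0;
        for (long s : sorted) {
            total += s;
        }
        for (int i = 0; i < sorted.size(); i++) {
            long largest = sorted.get(i);
            if (largest * ratio <= total) {
                return new ArrayList<>(sorted.subList(i, sorted.size()));
            }
            total -= largest; // exclude the largest file and retry with the rest
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Sizes in GB, loosely mirroring the scenario in the question.
        System.out.println(select(List.of(20L, 5L, 3L, 1L), 3.0));   // []
        System.out.println(select(List.of(20L, 2L, 2L, 2L), 3.0));   // [2, 2, 2]
        System.out.println(select(List.of(20L, 20L, 15L, 5L), 3.0)); // [20, 20, 15, 5]
    }
}
```

Under this model, with a ratio of 3 the 20G file is only picked up once the
set containing it totals at least 60G, i.e. the other files sum to 40G or more.
Until then, small files merge among themselves (as in the second call) and the
20G file, with its aged-off data, is left alone.
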
