1) Is your Accumulo Garbage Collector process running? It deletes un-referenced files, so leftover rfiles will accumulate if it isn't.
2) I've heard it said that 200 tablets per tserver is the sweet spot, but it depends a lot on your read and write patterns.
3) Idle tablets are compacted based on the table.compaction.major.everything.idle setting; see https://accumulo.apache.org/1.7/accumulo_user_manual#_table_compaction_major_everything_idle
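For reference, here's a minimal Java sketch of both options against the 1.7 client API: lowering the idle-compaction threshold on a table, and forcing a full major compaction programmatically rather than from the shell. The instance name, ZooKeeper host, credentials, and table name are placeholders.

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class IdleCompactionExample {
        public static void main(String[] args) throws Exception {
            // Placeholder instance name, ZooKeeper quorum, credentials, and table name.
            Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
                    .getConnector("root", new PasswordToken("secret"));

            // Lower the idle-compaction threshold so tablets that stop receiving
            // writes get major compacted sooner than the default (1h).
            conn.tableOperations().setProperty("mytable",
                    "table.compaction.major.everything.idle", "30m");

            // Or force a full major compaction of the table right away:
            // null start/end rows = whole table; flush memory first, don't block.
            conn.tableOperations().compact("mytable", null, null, true, false);
        }
    }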
On Tue, Jun 7, 2016 at 4:03 PM, Andrew Hulbert <[email protected]> wrote:
> Hi all,
>
> A few questions on behavior if you have any time...
>
> 1. When looking in Accumulo's HDFS directories I'm seeing a situation
> where "tablets" (aka directories) for a table have more than the default 1G
> split threshold worth of rfiles in them. In one large instance, we have
> 400G worth of rfiles in the default_tablet directory (a mix of A-, C-, and
> F-type rfiles). We took one of these tables and compacted it, and now there
> is appropriately ~1G worth of files in HDFS. On an unrelated table we have
> tablets with 100+G of bulk-imported rfiles in the tablet's HDFS directory.
>
> This seems to be common across multiple clouds. All the ingest is done
> via batch writing. Is anyone aware of why this would happen or whether it is
> even important? Perhaps these are leftover rfiles from some process. Their
> timestamps cover large date ranges.
>
> 2. There's been some discussion of the number of files per tserver for
> efficiency. Are there any limits on the size of rfiles for efficiency? For
> instance, I assume that compacting all the files into a single rfile per 1G
> split is more efficient because it avoids merging (but maybe decreases
> concurrency). However, would it be better to have 500 tablets per node on a
> table with 1G splits versus having 50 tablets with 10G splits? Assuming
> HDFS and Accumulo don't mind 10G files!
>
> 3. Is there any way to force idle tablets to actually major compact other
> than from the shell? It seems like it never happens.
>
> Thanks!
>
> Andrew
>
