I wrote up an explanation of these topics at https://github.com/medined/D4M_Schema/blob/master/docs/data_distribution.md. If you have the luxury of writing the ingest code, you can track cardinality estimates using the techniques described in https://github.com/medined/D4M_Schema/blob/master/docs/cardinality.md.
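
As a rough illustration of the ingest-side idea (a minimal sketch, not the code from the linked docs): if the ingest client knows the split points, it can tally distinct row IDs per split range as it writes. The function names `count_rows_per_split_range` and `range_label` are hypothetical, the exact counting with a set stands in for a real cardinality estimator, and the sketch assumes a split point is the inclusive end row of its tablet, so row "c" falls in the range ending at "c".

```python
from bisect import bisect_left
from collections import Counter


def range_label(splits, i):
    """Readable label for the i-th split range (hypothetical helper)."""
    lo = splits[i - 1] if i > 0 else "-Inf"
    if i < len(splits):
        # Assumption: a split point is the inclusive end row of its range.
        return "({}, {}]".format(lo, splits[i])
    return "({}, +Inf)".format(lo)


def count_rows_per_split_range(splits, row_ids):
    """Count distinct row IDs that fall into each split range.

    splits  -- iterable of split points (strings)
    row_ids -- iterable of row IDs seen at ingest (duplicates allowed)
    """
    splits = sorted(splits)
    # Exact distinct count; a real ingest path would likely use an
    # approximate estimator instead of holding every row ID in memory.
    distinct_rows = set(row_ids)
    # bisect_left sends a row equal to a split point into the range
    # that ends at that split point.
    counts = Counter(bisect_left(splits, row) for row in distinct_rows)
    return {
        range_label(splits, i): counts.get(i, 0)
        for i in range(len(splits) + 1)
    }


if __name__ == "__main__":
    result = count_rows_per_split_range(
        ["c", "g", "w"], ["a", "c", "d", "d", "h", "x", "z"]
    )
    for label, n in result.items():
        print(label, n)
```

Comparing these per-range counts is one way to see whether manually chosen splits are balancing the load, without touching any non-public Accumulo API.
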
On Sat, Oct 4, 2014 at 12:23 AM, Dylan Hutchison <[email protected]> wrote:
> This is for Accumulo 1.6. Suppose we have the table splits
>
>   c
>   g
>   w
>
> Does anyone know how to determine
>
> - the number of tablets assigned to each table split range?
>   For this example, this is the number of tablets in the ranges (-Inf,c),
>   (c,g), (g,w), (w,Inf). Or is the design 1-1, that is, for each table split
>   range there is exactly one tablet?
> - the number of rows inside all the tablets occupying a table split range?
>   For this example, this is the total number of rows among all tablets in the
>   ranges (-Inf,c), (c,g), (g,w), (w,Inf).
>
> We use this count to verify how well manually set table splits are load
> balancing in the tables.
>
> Some context: I wrote functions that found these numbers two years ago
> working on D4M in Accumulo 1.5. I took the dark route of using non-public
> Accumulo API to get TabletServer information, get TabletStats information,
> and find the matchings to a table's splits by scanning the extents listed in
> the METATABLE. I can share the code if anyone is curious. It's not pretty,
> but it did the job.
>
> Moving forward as we aim to upgrade to Accumulo 1.6, we should determine the
> tablet split information the right way, not by reverse engineering Accumulo.
> Any suggestions?
>
> Thanks,
> Dylan Hutchison
>
> --
> www.cs.stevens.edu/~dhutchis
