For N splits in a table, you will always have N+1 tablets.

As you describe below, the tablets are defined as (-Inf,c], (c,g], (g,w], (w,Inf), assuming I'm remembering the inclusivity correctly.
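
To make the N+1 relationship concrete, here is a minimal sketch in plain Java that turns a sorted set of split points into tablet range labels. It assumes the end row of each tablet is inclusive, as described above. The class and method names (TabletRanges, fromSplits) are made up for illustration and are not part of Accumulo; in a real client the split points would come from TableOperations.listSplits(tableName) as Text objects, and plain String ordering only stands in for Accumulo's byte-wise ordering here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the Accumulo API.
final class TabletRanges {

    // Builds one "(prevEndRow,endRow]" label per tablet from sorted split points.
    // N split points always yield N+1 labels.
    static List<String> fromSplits(SortedSet<String> splits) {
        List<String> ranges = new ArrayList<>();
        String prev = "-Inf"; // the first tablet has no lower bound
        for (String split : splits) {
            ranges.add("(" + prev + "," + split + "]");
            prev = split;
        }
        // The last tablet has no upper bound.
        ranges.add("(" + prev + ",Inf)");
        return ranges;
    }

    public static void main(String[] args) {
        // Assumption: these stand in for the result of listSplits on the table.
        SortedSet<String> splits = new TreeSet<>(Arrays.asList("c", "g", "w"));
        for (String range : fromSplits(splits)) {
            System.out.println(range);
        }
    }
}
```

With the three splits from your example, this yields four ranges, one per tablet.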

The number of unique rows within a tablet is not explicitly tracked.
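
Since that count is not tracked, one option is for the client to count rows itself and bucket them by split point. The following is only a rough sketch under a few assumptions: row IDs are plain Strings already collected by the client (in practice they would come from scanning the table, which can be expensive), the end row of each tablet is inclusive, and String ordering stands in for Accumulo's byte-wise ordering. RowsPerTablet and its members are hypothetical names, not Accumulo API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the Accumulo API.
final class RowsPerTablet {

    // Label used as the key for the last tablet, which has no end row.
    // Assumption: no real split point is literally named "Inf".
    static final String LAST = "Inf";

    // Counts distinct row IDs per tablet, keyed by each tablet's end row.
    static Map<String, Integer> count(TreeSet<String> splits, Collection<String> rowIds) {
        // Seed every tablet with zero so that empty tablets are reported too.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String split : splits) {
            counts.put(split, 0);
        }
        counts.put(LAST, 0);

        // The TreeSet copy removes duplicate row IDs.
        for (String row : new TreeSet<>(rowIds)) {
            // A row belongs to the tablet whose end row is the smallest
            // split point >= the row; null means it is past the last split.
            String endRow = splits.ceiling(row);
            counts.merge(endRow == null ? LAST : endRow, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeSet<String> splits = new TreeSet<>(Arrays.asList("c", "g", "w"));
        System.out.println(count(splits, Arrays.asList("a", "c", "d", "d", "x", "z")));
    }
}
```

A heavily skewed result from something like this would suggest the manual splits are not balancing the rows well, though it says nothing about which tablet server currently hosts each tablet.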

I'd encourage you to suggest new methods that we could add to fill any gaps around the distribution of tablets. Letting clients have some notion of where data is coming from may be useful (even if it is subject to change at any moment: table balancer, servers dying, etc.).

Dylan Hutchison wrote:
This is for Accumulo 1.6.  Suppose we have the table splits

    c
    g
    w

Does anyone know how to determine

 1. *the number of tablets assigned to each table split range?*
    For this example, this is the number of tablets in the ranges
    (-Inf,c), (c,g), (g,w), (w,Inf).  Or is the design 1-1, that is, for
    each table split range there is exactly one tablet?
 2. *the number of rows inside all the tablets occupying a table split
    range?*
    For this example, this is the total number of rows among all tablets
    in the ranges (-Inf,c), (c,g), (g,w), (w,Inf).

We use this count to verify how well manually set table splits are load
balancing in the tables.

Some context: I wrote functions that found these numbers two years ago
working on D4M in Accumulo 1.5.  I took the dark route of using
non-public Accumulo API to get TabletServer information, get TabletStats
information, and find the matchings to a table's splits by scanning the
extents listed in the METATABLE.  I can share the code if anyone is
curious.  It's not pretty, but it did the job.

Moving forward as we aim to upgrade to Accumulo 1.6, we should determine
the tablet split information the right way, not by reverse engineering
Accumulo.  Any suggestions?

Thanks,
Dylan Hutchison

--
www.cs.stevens.edu/~dhutchis <http://www.cs.stevens.edu/~dhutchis>
