Hi Bruce,

We don't have an API for forcing the balancer to rebalance, but I believe it runs automatically every couple of minutes, so it should get frequent opportunities to rebalance. It shouldn't be necessary to force a rebalance if your balancer logic takes into account all the factors you care about.
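For illustration only, here is a minimal, self-contained Java sketch of what one periodic balancing pass might decide. It is not Accumulo's balancer API. `ToyBalancer`, `Tablet`, `Migration`, and the pluggable `weight` function are all hypothetical. A weight of one per tablet balances by tablet count. Swapping in another weight, such as a tablet's entry count, is the kind of extra factor a custom balancer could take into account.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

// Hypothetical toy model, NOT Accumulo's balancer SPI: it only illustrates
// what "balancer logic" might decide each time the balancer runs.
public class ToyBalancer {
    // A tablet identified by name, with a (hypothetical) entry count.
    record Tablet(String id, long entries) {}

    // One proposed reassignment of a tablet between tablet servers.
    record Migration(String tabletId, String from, String to) {}

    // Sum of the weights of the tablets hosted by one server.
    static long load(List<Tablet> tablets, ToLongFunction<Tablet> weight) {
        long sum = 0;
        for (Tablet t : tablets) sum += weight.applyAsLong(t);
        return sum;
    }

    // Greedily moves tablets from the heaviest to the lightest server while a
    // move strictly narrows the gap between them. 'weight' defines "load":
    // t -> 1 balances tablet counts; Tablet::entries balances entry counts.
    static List<Migration> balance(Map<String, List<Tablet>> assignments,
                                   ToLongFunction<Tablet> weight) {
        // Copy into a sorted, mutable map so the caller's data is untouched
        // and ties are broken deterministically by server name.
        Map<String, List<Tablet>> state = new TreeMap<>();
        assignments.forEach((server, tablets) -> state.put(server, new ArrayList<>(tablets)));

        List<Migration> moves = new ArrayList<>();
        while (true) {
            String heaviest = null, lightest = null;
            long max = Long.MIN_VALUE, min = Long.MAX_VALUE;
            for (Map.Entry<String, List<Tablet>> e : state.entrySet()) {
                long l = load(e.getValue(), weight);
                if (l > max) { max = l; heaviest = e.getKey(); }
                if (l < min) { min = l; lightest = e.getKey(); }
            }
            if (heaviest == null) break; // no servers at all

            // Smallest tablet with positive weight on the heaviest server.
            Tablet candidate = null;
            for (Tablet t : state.get(heaviest)) {
                long w = weight.applyAsLong(t);
                if (w > 0 && (candidate == null || w < weight.applyAsLong(candidate))) candidate = t;
            }
            // Moving weight w narrows the gap only if 0 < w < max - min.
            if (candidate == null || weight.applyAsLong(candidate) >= max - min) break;

            state.get(heaviest).remove(candidate);
            state.get(lightest).add(candidate);
            moves.add(new Migration(candidate.id(), heaviest, lightest));
        }
        return moves;
    }

    public static void main(String[] args) {
        // Balanced by count (two tablets each) but not by entries.
        Map<String, List<Tablet>> cluster = Map.of(
            "tserver1", List.of(new Tablet("a1", 300), new Tablet("a2", 300)),
            "tserver2", List.of(new Tablet("b1", 0), new Tablet("b2", 0)));
        System.out.println("by count:   " + balance(cluster, t -> 1));
        System.out.println("by entries: " + balance(cluster, Tablet::entries));
    }
}
```

In this toy scenario (two tablets per server, all entries on `tserver1`), the count-based pass proposes no migrations. The entry-weighted pass moves `a1`, leaving `tserver2` with three of the four tablets. This illustrates the trade-off between balancing tablet counts and balancing tablet contents.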
If you need to force it, killing a tserver and allowing its tablets to be reassigned can be relatively non-intrusive, provided you don't have a lot of ingest going on and your tables are flushed (to avoid WAL recovery costs).

Another way might be to take the table offline and back online again, but that feels more intrusive to me because it affects the entire table. While the table is offline (don't do this while it is online), you could also edit the metadata table to remove the saved location information for its tablets, so that they aren't simply reassigned back to their previous servers.

Regarding the empty splits, Accumulo generally balances tablets without regard to their contents, because we can't know how the application intends to use the splits. (I say "generally" because a custom balancer could be written to do anything.) The application's schema and the user's choice of manual splits are expected to reflect the preferred distribution of data across tablets, so the balancer only has to care about the number of tablets, not what they contain.

You can merge away empty tablets you don't need, especially pre-splits you didn't end up using, but this incurs a cost in the form of chop-compactions on the adjacent tablets. That cost might be acceptable. There has been some discussion about a feature to avoid chop-compactions, which would be nice because it would make merges nearly instantaneous and cost-free, but it is not implemented yet.

On Sun, Oct 31, 2021 at 8:59 PM McClure, Bruce MR 2 <bruce.mcclu...@defence.gov.au> wrote:
>
> UNOFFICIAL
>
> Hi,
>
> I have a custom table balancer set on a particular table, and a cron job that
> creates splits for the next day's data, each day. Normally it is all fine,
> but after some problems happened, I found that for certain days all the
> splits resided on a single tablet server, which then caused performance
> problems with ingest.
> This was solved by temporarily taking the tablet
> server out of the cluster (stopping the Accumulo service, not HDFS) and then
> (days) later putting it back. This caused a reassignment of the tablets and
> presumably triggered the table balancer as part of that. This seemed like a
> very heavy-handed solution and brought about the question:
>
> What is the recommended (least intrusive) way to trigger the table balancer
> in Accumulo for a known set of splits (tablets)?
>
> Additional information: whilst the cluster is well balanced in terms of
> tablets-per-server, there is an imbalance in terms of entries (3-1 or 5-1 in
> some cases). I noticed that the new (empty) pre-splits appeared to be on the
> server or servers with significantly fewer entries.
>
> Thank you in advance.
>
> Bruce.