Hi Anand,

The nature of the hash function (murmur3) should lead to an approximately uniform distribution of documents across sub-shards. Have you investigated why, if at all, the sub-shards are not balanced? Do you use composite keys, e.g. abc!id1, which could cause such an imbalance?
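To illustrate why composite keys can skew a split, here is a minimal sketch of compositeId-style hashing. It assumes what I understand to be the default behaviour of Solr's compositeId router: the top 16 bits come from the murmur3 (x86, 32-bit, seed 0) hash of the UTF-8 prefix before "!", and the bottom 16 bits come from the hash of the rest of the id. The function names are hypothetical, the "prefix/bits!id" syntax is ignored, and hashes are shown as unsigned 32-bit values, whereas Solr prints shard ranges as signed ints in hex.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Plain MurmurHash3 x86_32, returned as an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    nblocks = n // 4
    # Body: mix each 4-byte little-endian block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    # Tail: mix the remaining 1-3 bytes.
    tail = data[nblocks * 4:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
    # Finalization (fmix32).
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def composite_hash(doc_id: str) -> int:
    """Hypothetical helper: 16 bits from the prefix hash, 16 bits from the id hash."""
    if "!" not in doc_id:
        return murmur3_x86_32(doc_id.encode("utf-8"))
    prefix, rest = doc_id.split("!", 1)
    h1 = murmur3_x86_32(prefix.encode("utf-8"))
    h2 = murmur3_x86_32(rest.encode("utf-8"))
    return (h1 & 0xFFFF0000) | (h2 & 0x0000FFFF)


for doc_id in ["abc!id1", "abc!id2", "xyz!id1"]:
    print(doc_id, hex(composite_hash(doc_id)))
```

Every id with the prefix "abc" shares the same top 16 bits, so all such documents fall into one 1/65536th slice of the hash space. If a few prefixes hold most of the documents, an even split of the hash range cannot give evenly sized sub-shards.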
I don't think there is a cheap way to implement what you are asking for in the current scheme of things: unless we go through each id and calculate its hash, we have no way of knowing the optimal distribution. However, if we were to store the hash of the key as a separate field in the index, it should be possible to binary search for hash ranges that give an approximately equal distribution of docs across sub-shards. We can implement something like that inside Solr.

On Wed, May 6, 2015 at 4:42 PM, anand.mahajan <an...@zerebral.co.in> wrote:

> Okay - Thanks for the confirmation Shalin. Could this be a feature request
> in the Collections API - that we have a Split shard dry run API that accepts
> sub-shards count as a request param and returns the optimal shard ranges for
> the number of sub-shards requested to be created along with the respective
> document counts for each of the sub-shards? The users can then use this
> shard ranges for the actual split?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204100.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Shalin Shekhar Mangar.
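A minimal sketch of the binary search idea, assuming each document's routing hash were stored in its own indexed field. The `count_range(a, b)` callback is a hypothetical stand-in for a range-count query on that field (e.g. a query with rows=0 and a numeric range filter); here it is simulated with bisect over a sorted list. For each of the k-1 boundaries it finds the smallest hash at which the cumulative count reaches the next 1/k quantile, so the cost is roughly 32 count queries per boundary instead of hashing every id.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, List, Sequence, Tuple

Range = Tuple[int, int]


def find_split_ranges(count_range: Callable[[int, int], int],
                      lo: int, hi: int, k: int) -> List[Range]:
    """Split [lo, hi] into k contiguous hash ranges with roughly equal doc counts.

    count_range(a, b) must return the number of docs whose hash h satisfies
    a <= h <= b. Docs sharing one hash value cannot be separated, so heavy
    composite-key prefixes still limit how even the result can be.
    """
    total = count_range(lo, hi)
    ranges: List[Range] = []
    start = lo
    for i in range(1, k):
        target = -(-i * total // k)  # ceil(i * total / k)
        a, b = start, hi
        # Binary search for the smallest boundary b with count(lo, b) >= target.
        while a < b:
            mid = (a + b) // 2
            if count_range(lo, mid) >= target:
                b = mid
            else:
                a = mid + 1
        ranges.append((start, a))
        start = a + 1
    ranges.append((start, hi))
    return ranges


def make_counter(hashes: Sequence[int]) -> Callable[[int, int], int]:
    """Simulates range-count queries over a hypothetical stored hash field."""
    s = sorted(hashes)
    return lambda a, b: bisect_right(s, b) - bisect_left(s, a)


# Usage: a skewed set of 1000 hashes, split into 4 sub-shards.
count = make_counter([i * i for i in range(1000)])
for a, b in find_split_ranges(count, 0, 0xFFFFFFFF, 4):
    print(hex(a), hex(b), count(a, b))
```

The returned ranges are contiguous and cover the whole parent range, so they could in principle be passed to an explicit-ranges shard split; a dry-run API along these lines would also report the per-range counts.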