Hi Anand,

The nature of the hash function (murmur3) should lead to an approximately
uniform distribution of documents across sub-shards. Have you investigated
whether, and if so why, the sub-shards are unbalanced? Do you use composite
keys, e.g. abc!id1, that could be causing the imbalance?
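
One way to test the composite-key theory is to hash a sample of ids offline and count where they would land after a default two-way split of the full hash range (i.e. a single-shard collection). The Python sketch below is not Solr code. It assumes that the compositeId router hashes the UTF-8 bytes of the id with murmur3 x86_32 (seed 0) and that, for ids of the form prefix!rest, it takes the upper 16 bits from the prefix hash and the lower 16 bits from the rest. `route_hash` and `split_counts` are hypothetical helper names, and only a single `!` is handled.

```python
from typing import Iterable, List

MASK32 = 0xFFFFFFFF

def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32

def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32 of `data`, as an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):                      # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]                     # 0-3 leftover bytes
    k = 0
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    h ^= len(data)                                # finalization (fmix32)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    return h ^ (h >> 16)

def to_signed(u: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit int."""
    return u - (1 << 32) if u & 0x80000000 else u

def route_hash(doc_id: str) -> int:
    """Hypothetical stand-in for compositeId routing (assumed 16/16 bit split)."""
    if "!" in doc_id:
        prefix, rest = doc_id.split("!", 1)
        hi16 = murmur3_x86_32(prefix.encode("utf-8")) & 0xFFFF0000
        lo16 = murmur3_x86_32(rest.encode("utf-8")) & 0x0000FFFF
        return to_signed(hi16 | lo16)
    return to_signed(murmur3_x86_32(doc_id.encode("utf-8")))

def split_counts(ids: Iterable[str], n_parts: int = 2) -> List[int]:
    """Docs per sub-range when the full hash range is cut into equal-width parts."""
    width = (1 << 32) // n_parts
    counts = [0] * n_parts
    for doc_id in ids:
        idx = (route_hash(doc_id) + (1 << 31)) // width   # shift to 0 .. 2**32-1
        counts[min(idx, n_parts - 1)] += 1
    return counts
```

With plain ids the two halves come out close to 50/50. When every id shares the abc! prefix, all documents share the same upper 16 bits and land in a single half, which is the kind of imbalance an equal-width split cannot fix.
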

I don't think there is a (cheap) way to implement what you are asking in
the current scheme of things: unless we go through each id and calculate
its hash, we have no way of knowing the optimal distribution.
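
The brute-force version of that calculation looks roughly like the sketch below: hash every id in the parent shard, sort the hashes, and cut at the quantiles. It assumes the signed 32-bit routing hashes have already been computed for each id (that full pass is the expensive part). `quantile_ranges` and `ranges_param` are hypothetical names, and the output string only mimics the comma-separated hex style of the SPLITSHARD `ranges` parameter.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

INT_MIN, INT_MAX = -(2**31), 2**31 - 1

def to_hex(x: int) -> str:
    """Unsigned hex form of a signed 32-bit hash, e.g. -2**31 -> '80000000'."""
    return format(x & 0xFFFFFFFF, "x")

def quantile_ranges(hashes: Sequence[int], n_parts: int,
                    lo: int = INT_MIN, hi: int = INT_MAX) -> List[Tuple[int, int, int]]:
    """Split [lo, hi] into n_parts sub-ranges holding ~equal numbers of hashes.

    Returns (start, end, doc_count) per sub-range. Needs every hash in memory,
    i.e. a full pass over all ids of the parent shard.
    """
    hs = sorted(h for h in hashes if lo <= h <= hi)
    if len(hs) < n_parts:
        raise ValueError("fewer documents than requested sub-shards")
    out: List[Tuple[int, int, int]] = []
    start = lo
    for i in range(1, n_parts):
        k = len(hs) * i // n_parts      # hs[:k] belongs to the first i sub-ranges
        end = hs[k - 1]                 # inclusive upper bound of the i-th sub-range
        if not (start <= end < hi):
            raise ValueError("too many duplicate hashes to split evenly")
        out.append((start, end, bisect_right(hs, end) - bisect_left(hs, start)))
        start = end + 1
    out.append((start, hi, bisect_right(hs, hi) - bisect_left(hs, start)))
    return out

def ranges_param(parts: List[Tuple[int, int, int]]) -> str:
    """Comma-separated 'start-end' hex pairs, one per sub-range."""
    return ",".join(f"{to_hex(a)}-{to_hex(b)}" for a, b, _ in parts)
```

Returning the per-range counts alongside the range string is roughly what the dry-run idea in the quoted message asks for.
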
However, if we were to store the hash of the key as a separate field in the
index, it should be possible to binary search for hash ranges that lead to
an approximately equal distribution of docs across sub-shards. We can
implement something like that inside Solr.
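
If the hash were stored as its own indexed field, each boundary could instead be found by a binary search that only asks "how many docs have a hash in [a, b]?". The sketch below assumes such a counting function exists (for example a range query with rows=0 on a hypothetical field such as id_hash_i, reading numFound); here it is stood in for by a sorted in-memory list. `find_boundary`, `balanced_ranges` and `sorted_list_counter` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, Iterable, List, Tuple

INT_MIN, INT_MAX = -(2**31), 2**31 - 1

# count(a, b) -> number of docs whose stored hash lies in [a, b] (inclusive).
# Assumption: in Solr this would be a range query on a hypothetical hash field.
CountFn = Callable[[int, int], int]

def find_boundary(count: CountFn, lo: int, left: int, right: int, target: int) -> int:
    """Smallest b in [left, right] with count(lo, b) >= target (right if none)."""
    while left < right:
        mid = (left + right) // 2       # floor keeps left <= mid < right
        if count(lo, mid) >= target:
            right = mid
        else:
            left = mid + 1
    return left

def balanced_ranges(count: CountFn, n_parts: int,
                    lo: int = INT_MIN, hi: int = INT_MAX) -> List[Tuple[int, int, int]]:
    """(start, end, doc_count) per sub-range, using only count queries."""
    total = count(lo, hi)
    out: List[Tuple[int, int, int]] = []
    start = lo
    for i in range(1, n_parts):
        target = -(-total * i // n_parts)   # ceil(total * i / n_parts), cumulative
        end = find_boundary(count, lo, start, hi - 1, target)
        out.append((start, end, count(start, end)))
        start = end + 1
    out.append((start, hi, count(start, hi)))
    return out

def sorted_list_counter(hashes: Iterable[int]) -> CountFn:
    """Local stand-in for the index: counts hashes in [a, b] by bisection."""
    hs = sorted(hashes)
    return lambda a, b: bisect_right(hs, b) - bisect_left(hs, a)
```

Each boundary costs roughly 32 count queries, since every step halves a 2^32-wide hash range, so the number of queries does not depend on how many documents the shard holds.
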

On Wed, May 6, 2015 at 4:42 PM, anand.mahajan <an...@zerebral.co.in> wrote:

> Okay, thanks for the confirmation, Shalin. Could this be a feature request
> for the Collections API: a Split Shard dry-run API that accepts the
> sub-shard count as a request param and returns the optimal shard ranges
> for the requested number of sub-shards, along with the document count for
> each sub-shard? Users could then use these shard ranges for the actual
> split.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204100.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.
