Yes - I'm using two-level composite ids, and that is what has caused the imbalance for
some shards.
It's car data, and the composite ids are of the form year-make!model plus a
couple of other specifications, e.g. 2013Ford!Edge!123456 - but there are
just far too many 2013 or 2011 Fords, and they all end up on the same shards.
This was done because co-location of these documents is required for a few
of the search requirements - to avoid hitting all shards all the time. All
queries always specify the year and make combination, so it's easy to work
out the target shard for a query.
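For anyone following along, a minimal SolrJ sketch of the setup described above - indexing with a composite id and then restricting a query to the owning shard(s) via the _route_ parameter. The ZooKeeper address, collection name ("cars") and field names are assumptions for illustration, and the Builder call assumes a reasonably recent SolrJ; adjust to your client version:

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class CompositeIdRoutingSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical ZooKeeper address and collection name.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("localhost:2181"), Optional.empty()).build()) {

      // Everything up to the last '!' in the id decides which shard the
      // document lands on, so all "2013Ford!..." docs are co-located.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "2013Ford!Edge!123456");
      doc.addField("year", 2013);      // assumed field names
      doc.addField("make", "Ford");
      doc.addField("model", "Edge");
      client.add("cars", doc);
      client.commit("cars");

      // Query only the shard(s) owning the "2013Ford!" prefix instead of
      // fanning the request out to every shard in the collection.
      SolrQuery q = new SolrQuery("make:Ford AND year:2013");
      q.set("_route_", "2013Ford!");
      QueryResponse rsp = client.query("cars", q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }
}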

Regarding storing the hash against each document and then querying to find
the optimal ranges - could Solr maintain incremental counters for each hash
in the shard's range, so that the Collections API SPLITSHARD call could use
them internally to propose optimal sub-ranges for the split?
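In the meantime, something like the counting could be done offline: reproduce the router's hash for each id on the overweight shard, tally documents per hash, and hand the resulting ranges to SPLITSHARD via its ranges parameter. The sketch below is an assumption-laden illustration, not the router's actual code: it assumes the 8/8/16-bit allocation documented for three-part composite ids (which the example id 2013Ford!Edge!123456 would use) and the murmurhash3_x86_32 helper in org.apache.solr.common.util.Hash; the method names are made up, so verify the bit layout against CompositeIdRouter in your Solr version:

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.solr.common.util.Hash;

public class SplitRangeSketch {

  // Reproduce the hash a three-part composite id such as
  // "2013Ford!Edge!123456" maps to: 8 bits from the first part, 8 from the
  // second, 16 from the remainder (assumed layout - verify for your version).
  static int compositeHash(String id) {
    String[] parts = id.split("!", 3);
    int h1 = Hash.murmurhash3_x86_32(parts[0], 0, parts[0].length(), 0);
    int h2 = Hash.murmurhash3_x86_32(parts[1], 0, parts[1].length(), 0);
    int h3 = Hash.murmurhash3_x86_32(parts[2], 0, parts[2].length(), 0);
    return (h1 & 0xff000000) | (h2 & 0x00ff0000) | (h3 & 0x0000ffff);
  }

  // Given the ids currently on the hot shard and that shard's hash range,
  // find the hash value that splits the documents roughly in half and print
  // it in the hex "lo-mid,mid+1-hi" form the SPLITSHARD ranges parameter takes.
  static String proposeRanges(List<String> ids, int shardLower, int shardUpper) {
    TreeMap<Integer, Integer> counts = new TreeMap<>();
    for (String id : ids) {
      counts.merge(compositeHash(id), 1, Integer::sum);
    }
    long half = ids.size() / 2, seen = 0;
    int mid = shardLower;
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
      seen += e.getValue();
      if (seen >= half) { mid = e.getKey(); break; }
    }
    return String.format("%08x-%08x,%08x-%08x",
        shardLower, mid, mid + 1, shardUpper);
  }
}

The ids themselves could be pulled from the shard with a fl=id cursor query, so no schema change is needed just to estimate where the split point should fall.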



