On Fri, Jan 25, 2013 at 1:56 PM, davers <dboych...@improvementdirect.com> wrote:
> When I used 4.0 I could use my DIH on any shard and the documents would be
> distributed based on the internal hashing algorithm and end up distributed
> evenly across my three shards.
>
> I have just upgraded to Solr 4.1 and I have noticed that my documents always
> end up on the shard that I run the DIH on. I'm assuming this has something
> to do with the changes to allow custom hashing.

It does... if you create your cluster using numShards=N, then hash ranges will be created for each shard, and those ranges will be respected when you index. If you don't pass numShards, then custom sharding is assumed (i.e. the user/client decides which document belongs on which shard).

For 4.0, if you didn't pass numShards, then each document was hashed based on the number of shards that existed at the time (but the range was never recorded anywhere). This was extremely error-prone, since adding another shard would then invalidate the mappings of all previous shards (but no one would know...).

-Yonik
http://lucidworks.com
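
To illustrate the difference described above, here is a rough sketch, not Solr's actual code. It assumes a CRC32 stand-in for the hash function (Solr's real hash differs), assumes "hashed depending on the current number of shards" behaves like a modulo over the shard count, and uses hypothetical helper names (doc_hash, shard_by_count, make_ranges, shard_by_range) that do not exist in Solr.

```python
import zlib

HASH_SPACE = 2 ** 32  # assumption: a 32-bit hash space


def doc_hash(doc_id: str) -> int:
    """Stand-in hash (CRC32). Solr's real hash function is different."""
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF


def shard_by_count(doc_id: str, num_shards: int) -> int:
    """Assumed model of the unrecorded scheme: the target shard depends
    on how many shards exist at indexing time."""
    return doc_hash(doc_id) % num_shards


def make_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split the hash space into contiguous inclusive ranges, computed
    once at cluster creation (the numShards=N case)."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        # The last range absorbs any remainder so the space is fully covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_by_range(doc_id: str, ranges: list[tuple[int, int]]) -> int:
    """Route a document using the recorded ranges."""
    h = doc_hash(doc_id)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash fell outside all recorded ranges")


ids = [f"doc-{i}" for i in range(1000)]

# Count-based routing: going from 3 to 4 shards silently remaps documents.
moved = sum(1 for d in ids if shard_by_count(d, 3) != shard_by_count(d, 4))
print(f"{moved} of {len(ids)} documents change shards under count-based routing")

# Range-based routing: the mapping is fixed by the recorded ranges.
ranges = make_ranges(3)
placement = {d: shard_by_range(d, ranges) for d in ids}
```

In the count-based model, a document indexed before the fourth shard was added can no longer be located by recomputing its shard, which is the silent invalidation mentioned above. With recorded ranges, routing depends only on the stored ranges, so the answer for a given document stays the same until the ranges themselves are deliberately changed.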