On 8/8/2011 4:07 PM, simon wrote:
> Only one should be returned, but it's non-deterministic. See http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
I had heard that which copy gets returned depends on which shard responds first. This is part of why we keep a small index that contains the newest content and only distribute content to the other shards once a day. The hope is that the small index (less than 1GB, fits into RAM on that virtual machine) will always respond faster than the larger shards (over 18GB each). Is this an incorrect assumption on our part?
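For anyone following along, here is a rough sketch of what a distributed request against a layout like ours looks like. The `shards` request parameter is the standard one from the DistributedSearch wiki page; the host names, port, and the `build_distributed_query` helper are made up for illustration and are not our real setup.

```python
from urllib.parse import urlencode


def build_distributed_query(base_url, shards, query, rows=10):
    """Build a Solr select URL that fans the query out to several shards.

    base_url and the shard addresses are hypothetical placeholders.
    """
    params = {
        "q": query,
        "rows": rows,
        # Solr expects a comma-separated list of host:port/path entries.
        "shards": ",".join(shards),
    }
    # Leave ':', '/' and ',' unescaped so the URL stays readable.
    return f"{base_url}/select?{urlencode(params, safe=':/,')}"


url = build_distributed_query(
    "http://incremental:8983/solr",
    ["incremental:8983/solr", "large1:8983/solr"],
    "id:12345",
)
print(url)
```

Note that listing the small shard first here is only for readability; I am not assuming the order of the `shards` list decides which duplicate is returned.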
The build system does everything it can to limit periods of overlap to the time it takes to commit a change across all of the shards, which should amount to just a few seconds once a day. There might be situations where the index gets out of sync and we have duplicate id values for a longer period, but in practice that hasn't happened yet.
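If we ever wanted to confirm that, one option would be a periodic check for ids that live in more than one shard. The sketch below is only an illustration, not something our build system does today: it assumes the id lists have already been fetched from each shard by some other means, and the shard names and the `find_duplicate_ids` helper are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ids(ids_by_shard):
    """Return {doc_id: [shard names]} for ids present in more than one shard.

    ids_by_shard maps a (hypothetical) shard name to an iterable of the
    uniqueKey values that shard currently holds.
    """
    shards_by_id = defaultdict(list)
    for shard, ids in ids_by_shard.items():
        # set() so an id repeated within one shard's list is counted once.
        for doc_id in set(ids):
            shards_by_id[doc_id].append(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_by_id.items()
        if len(shards) > 1
    }


overlap = find_duplicate_ids({
    "incremental": ["a", "b"],
    "large1": ["b", "c"],
    "large2": ["d"],
})
print(overlap)
```

An empty result would mean no overlap at the moment the id lists were collected; it says nothing about the few seconds during the daily commit.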
Thanks,
Shawn