I have a Solr index which is going to grow 3x in the near future. I'm considering using distributed search and have been thinking about the best way to split the index. Since most of the searches performed on the index are sorted by date descending, I'm considering splitting the index based on the created date of the documents.
From Yonik Seeley's blog post, http://yonik.wordpress.com/2008/02/27/distributed-search-for-solr/, I've read that a distributed search runs in two phases. The first phase collects matching ids and documents across the shards. The second phase then collects the stored fields for the documents. I'm assuming that the second phase's execution is limited by the number of rows requested and the number of results.

So let's say I have 2 shards. The first shard has docs with creation dates of this year. The second shard contains documents from the previous year. I run a Solr query requesting 10 rows sorted by date descending and get 11 matches from the first shard and 3 from the second. Will the query execute only the first phase on the second shard? If so, that should result in better performance, right?

Thanks,
-Tim
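
P.S. To make the question concrete, here is a rough Python sketch of how I picture the two phases. It is only my mental model of the behaviour described above, not Solr's actual code: the in-memory shards, the dates, the ids and every function name (make_shard, phase_one, phase_two, distributed_search) are made up for illustration.

```python
import heapq
from datetime import date, timedelta


def make_shard(prefix, start, count):
    """Build a hypothetical shard: id -> stored fields, one doc per day."""
    return {
        f"{prefix}-{i}": {
            "created": start + timedelta(days=i),
            "title": f"doc {prefix}-{i}",
        }
        for i in range(count)
    }


# Made-up data: 11 matches from "this year", 3 from "last year".
SHARDS = {
    "this_year": make_shard("a", date(2009, 1, 1), 11),
    "last_year": make_shard("b", date(2008, 1, 1), 3),
}


def phase_one(shard, rows):
    """Return only (sort value, id) for the shard's own top `rows` matches."""
    hits = [(doc["created"], doc_id) for doc_id, doc in shard.items()]
    return heapq.nlargest(rows, hits)


def phase_two(shard, ids):
    """Return the stored fields for the requested ids only."""
    return {doc_id: shard[doc_id] for doc_id in ids}


def distributed_search(shards, rows):
    # Phase 1: every shard is queried for ids and sort values.
    candidates = []
    for name, shard in shards.items():
        for created, doc_id in phase_one(shard, rows):
            candidates.append((created, doc_id, name))

    # Merge the per-shard lists into the global top `rows`, newest first.
    top = heapq.nlargest(rows, candidates)

    # Phase 2: fetch stored fields only from shards that own a winning id.
    ids_by_shard = {}
    for _, doc_id, name in top:
        ids_by_shard.setdefault(name, []).append(doc_id)

    docs = {}
    for name, ids in ids_by_shard.items():
        docs.update(phase_two(shards[name], ids))

    results = [docs[doc_id] for _, doc_id, _ in top]
    return results, sorted(ids_by_shard)
```

Calling distributed_search(SHARDS, 10) in this sketch returns 10 documents and reports that only "this_year" was contacted in the second phase, while both shards still ran the first phase. Whether Solr really skips the second phase for a shard in this situation is exactly what I'm asking.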