: For example, I have 4 shards and I need 2000 docs in the end. Now, when I'm
: using
:
: &shards=127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4
:
: Solr gets 2000 docs from each shard (shards 1, 2, 3 and 4, so 8000 docs in
: total), merges and sorts them, for example by the default field (score), and
: returns only the 2000 rows I specified in the request (not all 8000).
: So my question was: is there any mechanism in Solr that does not fetch 2000
: rows from each shard? Say I specify 2000 docs in the request: Solr works
: out how many shards I have (four), divides the total rows across the
: shards (2000/4=500) and sends each shard a query with rows=500 rather than
: rows=2000, so that after merging and sorting I end up with 2000 rows
: (maybe fewer), not 8000... That was my question.


Solr has to over-request from each of the shards in order to give you the 
"top" results across all shards (i.e. sorted by score, or by whatever sort 
criteria you specify).

If Solr didn't do this, you could easily wind up in a situation where, 
if you asked for the "top" 200 results across 10 shards, you would get the 
top 20 from *each* shard, but the top 200 across the whole index might 
really all be from a single shard.  Consider the case of an index sharded 
by date, where you want to sort the documents by date -- or sort by 
score, but the topic you are searching for is really prevalent in "recent" 
documents (e.g. maybe it's a collection of news articles sharded by decade, 
and you are searching for "twitter", which didn't exist 10 years ago) ... 
you would get terrible results if Solr just divided rows by the number of 
shards and asked each shard for that number of results.
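
To make that concrete, here is a small Python simulation (not Solr code, just 
a sketch of the merge logic).  The shard contents, scores and helper names 
are all made up for illustration: 4 shards of 10 docs each, where the last 
shard happens to hold every high-scoring doc, and a request for rows=8.

```python
import heapq

def top_n(docs, n):
    """Return the n highest-scoring (score, doc_id) pairs, best first."""
    return heapq.nlargest(n, docs)

def distributed_top_n(shards, rows, per_shard_rows):
    """Ask every shard for per_shard_rows docs, merge them, keep the top rows."""
    candidates = []
    for shard in shards:
        candidates.extend(top_n(shard, per_shard_rows))
    return top_n(candidates, rows)

# Hypothetical index: shard 0 has scores 0-9, shard 1 has 10-19, and so on,
# so the best docs are all concentrated in shard 3.
shards = [
    [(score, f"shard{s}-doc{i}") for i, score in enumerate(range(s * 10, s * 10 + 10))]
    for s in range(4)
]
rows = 8

# Ground truth: the top 8 over the whole (unsharded) index.
all_docs = [doc for shard in shards for doc in shard]
exact = top_n(all_docs, rows)

# Over-requesting: every shard is asked for the full rows=8.
over_requested = distributed_top_n(shards, rows, per_shard_rows=rows)

# Dividing: every shard is asked for only rows / num_shards = 2.
divided = distributed_top_n(shards, rows, per_shard_rows=rows // len(shards))

print("exact         :", [score for score, _ in exact])
print("over-requested:", [score for score, _ in over_requested])
print("divided       :", [score for score, _ in divided])
```

With this data the over-requested merge matches the true top 8 (all from 
shard 3), while the divided merge can only return two docs per shard, so six 
of its eight results are docs that do not belong in the top 8 at all.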



-Hoss
