On 5/23/2013 1:51 AM, Luis Cappa Banda wrote:
> I've query each Solr shard server one by one and the total number of
> documents is correct. However, when I change rows parameter from 10 to 100
> the total numFound of documents change:

I've seen this problem on the list before, and each time the cause turned out to be documents with the same uniqueKey value appearing in more than one shard.

What I think happens here: With rows=10, you get the top ten docs from each of the three shards, and each shard sends its numFound for that query to the core that's coordinating the search. The coordinator adds up the numFound values, looks through those thirty docs, and arranges them according to the requested sort order, returning only the top 10. In this case, there happen to be no duplicates.

With rows=100, you get a total of 300 docs. This time, duplicates are found and removed by the coordinator. I think that the coordinator reduces the total numFound by the number of duplicate documents it removed, in an attempt to be more accurate.

I don't know if adjusting numFound when duplicates are found in a sharded query is the right thing to do; I'll leave that for smarter people. Perhaps Solr should return a message with the results saying that duplicates were found, and if a config option is not enabled, the server should throw an exception and return a 4xx HTTP error code. One idea for a config parameter name would be allowShardDuplicates, but something better can probably be found.

Thanks,
Shawn
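
To make the explanation above concrete, here is a minimal Python sketch of the behavior I'm guessing at. It is a toy model, not Solr's actual merge code: the function names, the (uniqueKey, score) tuples, and the rule of subtracting one from numFound per dropped duplicate are all assumptions made for illustration, and the shards hold 20 docs each rather than real data.

```python
from typing import List, Tuple

Doc = Tuple[str, float]  # (uniqueKey, score); hypothetical stand-in for a Solr doc


def shard_top(docs: List[Doc], rows: int) -> Tuple[int, List[Doc]]:
    """Return (numFound, top `rows` docs by score) for one shard."""
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    return len(docs), ranked[:rows]


def coordinator_merge(shards: List[List[Doc]], rows: int) -> Tuple[int, List[Doc]]:
    """Merge per-shard results the way the coordinator is assumed to."""
    num_found = 0
    seen = set()
    merged: List[Doc] = []
    for docs in shards:
        shard_found, top = shard_top(docs, rows)
        num_found += shard_found
        for doc in top:
            if doc[0] in seen:
                # Assumed behavior: a duplicate uniqueKey is dropped and
                # numFound is reduced by one.
                num_found -= 1
                continue
            seen.add(doc[0])
            merged.append(doc)
    merged.sort(key=lambda d: d[1], reverse=True)
    return num_found, merged[:rows]


def make_shard(prefix: str, n: int) -> List[Doc]:
    """Build n docs with unique keys and descending scores."""
    return [(f"{prefix}{i}", 100.0 - i) for i in range(n)]


shard_a = make_shard("a", 20)
shard_b = make_shard("b", 20)
shard_c = make_shard("c", 20)

# Put the same uniqueKey in two shards, scored low enough that it
# falls outside each shard's top ten.
shard_a[15] = ("dup", 1.0)
shard_b[15] = ("dup", 1.0)

found_10, top_10 = coordinator_merge([shard_a, shard_b, shard_c], rows=10)
found_100, top_100 = coordinator_merge([shard_a, shard_b, shard_c], rows=100)
print(found_10, found_100)  # prints "60 59"
```

With rows=10 the duplicate never reaches the coordinator, so the three per-shard counts are simply summed. With rows=100 both copies are fetched, one is dropped, and the total shrinks by one, which matches the symptom in the original report.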