Joel, it needs to perform. Typically users will have 1 to 5 million rows in a query, returning 10 to 15 fields. Grouping normally reduces the return by 50% or more. Responses tend to be less than half a second.
It sounds like the manipulation of docs at the collector level has been left to the single-node Solr implementations, and that your streaming API is the way forward for cloud implementations, even if it does have some performance drawbacks. I can bear slower searches as long as they are not seconds slower. I could implement some business strategy that forks searching to either the AnalyticsQuery or the streaming API based on the shard count in the collection. Most of my customers will have single-shard collections. A goal of mine is to keep each collection whole as long as possible. If one gets too big for the pond, I'll move it to a bigger pond, until some heap limit is reached and it has to be split.
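
A minimal sketch of that forking idea, assuming the shard count for the collection is already known to the caller. The class, enum, and method names here are hypothetical and not part of Solr; only the decision rule (one shard goes to the AnalyticsQuery path, more than one goes to the streaming API) comes from the description above.

```java
// Hypothetical sketch: none of these names come from Solr itself.
final class SearchRouter {

    /** The two search paths discussed in this thread. */
    enum Strategy { ANALYTICS_QUERY, STREAMING_API }

    private SearchRouter() {}

    /**
     * Picks a search path from the number of shards in the collection.
     * A single-shard collection is whole on one node, so collector-level
     * merging (AnalyticsQuery) sees every document; anything larger is
     * routed to the streaming API.
     */
    static Strategy choose(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException(
                "shardCount must be >= 1, got " + shardCount);
        }
        return shardCount == 1
            ? Strategy.ANALYTICS_QUERY
            : Strategy.STREAMING_API;
    }
}
```

How the shard count is looked up is left out on purpose; in practice it would have to come from the cluster state for the collection being searched.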