[https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557535#action_12557535]
Yonik Seeley commented on SOLR-303:
-----------------------------------
> one solution i've seen to mitigate problems like this in the past is to
> compute a higher "limit" when querying the individual shards
Yep. Eventually it should be configurable too. We should definitely do some
"over-requesting" for very small limits. Expanding the limit too much can be
expensive, though (the CPU cost depends partly on the faceting algorithm used). I think
users should even be able to disable refinement queries if they just want an estimate.
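A minimal sketch of over-requesting, assuming a hypothetical rule of "multiply the requested limit by a ratio, then add a fixed pad". The function name `shard_limit` and the values `ratio=1.5` and `extra=10` are illustrative assumptions, not existing Solr parameters. The fixed pad is what makes small limits grow proportionally more than large ones, which keeps the extra CPU cost bounded for big limits.

```python
import math


def shard_limit(limit: int, ratio: float = 1.5, extra: int = 10) -> int:
    """Hypothetical over-request rule: how many terms to ask each shard for
    when the client wants the top `limit` terms overall.

    The multiplicative `ratio` scales with the limit, while the additive
    `extra` pads small limits proportionally much more than large ones.
    """
    return math.ceil(limit * ratio) + extra


for wanted in (1, 10, 100):
    print(wanted, "->", shard_limit(wanted))
```

With these assumed defaults, a limit of 1 becomes 12 per shard (12x), while a limit of 100 becomes 160 (1.6x).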
Note that it's possible to tell whether there could even be "stealth" terms out
there. We keep the smallest count we get from each shard, and that serves as the
largest count any unknown term could have on that shard. Add those together to get
an upper bound on an unknown term's total, then compare it with the top terms'
counts to see whether an unknown term could make it into the top terms. This
means you could do a request with a smaller limit, and then re-request with a
larger limit only if necessary.
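A minimal sketch of that check plus the re-request loop, assuming shards are simulated as plain `{term: count}` dicts. `query_shard`, `unseen_bound`, `could_miss_terms` and `top_terms` are hypothetical names. Doubling the limit on each retry is an assumed policy. A shard that returned fewer terms than the limit is assumed to have sent everything, so it adds nothing to the bound. Ties with the Nth count are treated as a possible miss.

```python
from collections import Counter


def query_shard(shard, limit):
    """Stand-in for a per-shard facet request: top `limit` (term, count)
    pairs, highest count first, ties broken by term."""
    return sorted(shard.items(), key=lambda tc: (-tc[1], tc[0]))[:limit]


def merge(shard_results):
    """Sum each term's counts over every shard that returned it."""
    totals = Counter()
    for results in shard_results:
        for term, count in results:
            totals[term] += count
    return totals


def unseen_bound(shard_results, limit):
    """Largest total a term returned by no shard could have: the smallest
    count from each shard that filled its limit, summed. A shard that
    returned fewer than `limit` terms sent everything, so it adds 0."""
    return sum(results[-1][1] for results in shard_results
               if results and len(results) >= limit)


def could_miss_terms(shard_results, limit, n):
    """True if some unreturned term might belong in the top n."""
    bound = unseen_bound(shard_results, limit)
    top = merge(shard_results).most_common(n)
    if len(top) < n:
        return bound > 0           # an empty slot could go to an unseen term
    return bound >= top[-1][1]     # a tie still counts as a possible miss


def top_terms(shards, n, limit):
    """Start small and double the per-shard limit until no unseen term can
    reach the top n. Returns (top terms, limit finally used)."""
    limit = max(limit, 1)
    while True:
        results = [query_shard(s, limit) for s in shards]
        if not could_miss_terms(results, limit, n):
            return merge(results).most_common(n), limit
        limit *= 2


shard_a = {"x": 10, "z": 9, "w": 1}
shard_b = {"y": 10, "z": 9, "v": 1}
print(top_terms([shard_a, shard_b], n=1, limit=1))
```

Here a limit of 1 returns only `x` and `y`, but the bound (10 + 10 = 20) says an unseen term could beat them, so the loop retries and finds `z` with 18. Merged counts for terms that some shard did not return are only lower bounds; making them exact is the job of the refinement queries.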
Beyond that, it becomes unclear what the best strategy is. Worst-case
scenario: if the top N facet counts get down to 1, then *any* unknown term
could bump one of them out. Requesting all terms with count >= 1 from each shard
isn't something I want to ponder.
Anyway, a colleague informs me that this is the way at least one other major
search vendor does things (counts are exact for terms shown, but it is
theoretically possible to miss a term).
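A minimal two-phase sketch of "exact counts for terms shown, but possible to miss a term", assuming shards are plain `{term: count}` dicts and `refine_count` stands in for a hypothetical per-shard refinement query. Phase 1 takes each shard's top `limit` terms as candidates. Phase 2 asks every shard that did not return a candidate for that candidate's exact count.

```python
def refine_count(shard, term):
    """Hypothetical refinement query: one shard's exact count for one term."""
    return shard.get(term, 0)


def two_phase_top(shards, n, limit):
    """Top n terms with exact counts, but only among phase-1 candidates."""
    # Phase 1: each shard reports only its own top `limit` terms.
    first = [dict(sorted(s.items(), key=lambda tc: (-tc[1], tc[0]))[:limit])
             for s in shards]
    candidates = set().union(*first)
    # Phase 2: fill in counts a shard did not report, so totals are exact.
    totals = {
        term: sum(r[term] if term in r else refine_count(s, term)
                  for r, s in zip(first, shards))
        for term in candidates
    }
    return sorted(totals.items(), key=lambda tc: (-tc[1], tc[0]))[:n]


shard_a = {"x": 10, "z": 9, "y": 2}
shard_b = {"y": 10, "z": 9, "x": 3}
print(two_phase_top([shard_a, shard_b], n=2, limit=1))
```

The shown counts (`x` = 13, `y` = 12) are exact, yet `z`, whose true total is 18, never became a candidate and is missed.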
> Distributed Search over HTTP
> ----------------------------
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Sharad Agarwal
> Assignee: Yonik Seeley
> Attachments: distributed.patch, distributed.patch, distributed.patch,
> distributed.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch,
> fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Searching over multiple shards and aggregating results.
> Motivated by http://wiki.apache.org/solr/DistributedSearch