Re: Representative filtering of very large result sets

Jeremy Buckley - IQ-C Thu, 24 Mar 2022 08:00:24 -0700

Thanks, Michael. I think this will work, and it is the direction I am
heading.  We are collapsing for deduplication, sort of.

We do need to search over the full uncollapsed domain, but I am pretty sure
that nobody needs to see 40 million results, and if they're dumb enough to
enter a query that matches that many documents, they deserve whatever they
get.

So my strategy is:
1. Check the query to see if it looks "safe" based on some heuristics.
2. If (1) fails, do a search to get only the result count, with rows=0 and no
faceting or sorting. This is usually pretty fast.
3. If the count returned in (2) is above a certain threshold, add my extra
filter query before executing the full faceted search.
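
For anyone finding this thread later, the three steps above could be sketched
roughly as follows. This is only an illustration under assumptions, not the
code I am running: looks_safe() is a placeholder heuristic, the
"representative:true" filter and the threshold are made-up values, and
count_hits stands in for whatever client call sends the parameters to Solr and
returns numFound.

```python
from typing import Callable, List, Tuple

Params = List[Tuple[str, str]]


def looks_safe(query: str) -> bool:
    """Placeholder heuristic (an assumption, not the real rules):
    treat a query as selective if it has no wildcard and at least three tokens."""
    return "*" not in query and len(query.split()) >= 3


def count_params(query: str) -> Params:
    """Step 2: ask only for the hit count. rows=0 returns no documents,
    facet=false disables faceting, and no sort parameter is sent."""
    return [("q", query), ("rows", "0"), ("facet", "false")]


def build_search_params(
    query: str,
    count_hits: Callable[[Params], int],
    extra_fq: str,
    threshold: int,
) -> Params:
    """Return the parameters for the full faceted search, adding the
    extra filter query only when the result set would be too large."""
    params: Params = [("q", query), ("facet", "true")]

    # Step 1: skip the extra round trip when the query looks selective.
    if looks_safe(query):
        return params

    # Step 2: cheap count-only request.
    hits = count_hits(count_params(query))

    # Step 3: restrict the domain only above the threshold.
    if hits > threshold:
        params.append(("fq", extra_fq))
    return params


if __name__ == "__main__":
    # Stand-in for a real Solr call; pretends every query matches 40M docs.
    fake_count = lambda params: 40_000_000
    print(build_search_params("*:*", fake_count, "representative:true", 1_000_000))
```

With a real client, count_hits would issue the request against the select
handler and read response.numFound from the JSON reply. Passing it in as a
function keeps the decision logic testable without a running Solr.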

Thanks, everyone!

On Thu, Mar 24, 2022 at 10:04 AM Michael Gibney <[email protected]>
wrote:

> Are you determining your "top doc" for each collapsed group based on score?
> If your use case is such that you determine the "top doc" based on a static
> field with a manageable number of values, you may have other options
> available to you. (For some use cases it can be acceptable to "pre-filter"
> the domain with creative fq params. This works iff your "collapse" could be
> considered a type of "deduplication" with doc priority determined by a
> static field; but it's a non-starter if you know you need to search over
> the full uncollapsed domain.)
>
> Michael
>
