Thanks, Michael. I think this will work, and it is the direction I am heading. We are collapsing for deduplication, sort of.
We do need to search over the full uncollapsed domain, but I am pretty sure that nobody needs to see 40 million results, and if they're dumb enough to enter a query that matches that many documents, they deserve whatever they get. So my strategy is:

1. Check the query to see if it looks "safe" based on some heuristics.
2. If (1) fails, do a search to get only the result count, with rows=0 and no faceting or sorting. This is usually pretty fast.
3. If the count returned in (2) is above a certain threshold, add my extra filter query before executing the full faceted search.

Thanks, everyone!

On Thu, Mar 24, 2022 at 10:04 AM Michael Gibney <[email protected]> wrote:

> Are you determining your "top doc" for each collapsed group based on score?
> If your use case is such that you determine the "top doc" based on a static
> field with a manageable number of values, you may have other options
> available to you. (For some use cases it can be acceptable to "pre-filter"
> the domain with creative fq params. This works iff your "collapse" could be
> considered a type of "deduplication" with doc priority determined by a
> static field; but it's a non-starter if you know you need to search over
> the full uncollapsed domain.)
>
> Michael
>
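
A rough sketch of the three-step strategy described in this message, in Python. Everything specific here is an assumption made for illustration, not something stated in the thread: the `looks_safe` heuristic, the `COUNT_THRESHOLD` value, the `EXTRA_FQ` filter string, and the `search` callable (a stand-in for whatever client call returns Solr's JSON response with `response.numFound`). Only `q`, `fq`, `rows`, and `facet` are standard Solr request parameters.

```python
from typing import Any, Callable, Dict

# Hypothetical values: the thread gives neither the threshold,
# the heuristics, nor the actual filter query.
COUNT_THRESHOLD = 1_000_000
EXTRA_FQ = "is_group_head:true"

Params = Dict[str, Any]
SearchFn = Callable[[Params], Dict[str, Any]]


def looks_safe(q: str) -> bool:
    """Placeholder heuristic: match-all and single-term queries are 'unsafe'."""
    q = q.strip()
    if q in ("", "*", "*:*"):
        return False
    return len(q.split()) >= 2


def count_only_params(q: str) -> Params:
    """Step 2: rows=0, faceting off, no sort, so only the hit count comes back."""
    return {"q": q, "rows": 0, "facet": "false"}


def plan_full_search(q: str, full_params: Params, search: SearchFn) -> Params:
    """Return the params for the full faceted search, adding the extra fq if needed."""
    params = dict(full_params, q=q)

    # Step 1: skip the extra round trip when the query looks safe.
    if looks_safe(q):
        return params

    # Step 2: cheap count-only request.
    num_found = search(count_only_params(q))["response"]["numFound"]

    # Step 3: restrict the domain only when the result set is huge.
    if num_found > COUNT_THRESHOLD:
        fq = params.get("fq", [])
        if isinstance(fq, str):
            fq = [fq]
        params["fq"] = list(fq) + [EXTRA_FQ]
    return params
```

Passing `search` in as a callable keeps the decision logic separate from the HTTP client, so the threshold and heuristic can be tuned and tested without a running Solr instance.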
