On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes <jwar...@whitepages.com> wrote:
>
> I have a query like:
>
> q=<some complicated stuff>&fq=enabled:true
>
> For purposes of this conversation, "fq=enabled:true" is set for every query, 
> I never open a new searcher, and this is the only fq I ever use, so the 
> filter cache size is 1, and the hit ratio is 1.
> The fq=enabled:true clause matches about 15% of my documents. I have some 20M 
> documents per shard, in a 5.3 solrcloud cluster.
>
> Under these circumstances, this alternate version of the query averages about 
> 1/3 faster, consumes less CPU, and generates less garbage:
>
> q=<some complicated stuff> +enabled:true
>
> So it appears I have a case where using the cached fq result is more 
> expensive than just putting the same restriction in the query.
> Does someone have a clear mental model of how “q” and “fq” interact?

Lucene seems to always be changing its execution model, so it can be
difficult to keep up.  What version of Solr are you using?
Lucene also changed how filters work, so now a filter is
incorporated with the query like so:

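// The main query becomes a required, scoring clause and the fq becomes a
// FILTER clause: it must match, but it does not contribute to the score.
query = new BooleanQuery.Builder()
    .add(query, Occur.MUST)
    .add(pf.filter, Occur.FILTER)
    .build();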
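To make the MUST/FILTER distinction concrete, here is a toy model in
plain Java, not Lucene's API: the main query's hits carry scores, and the
cached fq result is modeled as a java.util.BitSet of matching doc IDs.
FilterConjunctionSketch and Hit are made-up names, and checking every
main-query hit against the bitset is a simplification of how Lucene
actually drives a conjunction.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model, not Lucene's API: the main query yields scored hits and the
// cached fq result is a BitSet that restricts matches without scoring.
public class FilterConjunctionSketch {

    // Hypothetical scored hit produced by the main query.
    record Hit(int doc, float score) {}

    // Keeps main-query hits whose doc ID is set in the cached filter.
    // Scores pass through unchanged, mirroring Occur.FILTER semantics.
    static List<Hit> applyFilter(List<Hit> mainHits, BitSet cachedFilter) {
        List<Hit> out = new ArrayList<>();
        for (Hit h : mainHits) {
            if (cachedFilter.get(h.doc())) {
                out.add(h);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Pretend enabled:true matches docs 1, 3 and 4; built once and reused.
        BitSet enabled = new BitSet();
        enabled.set(1);
        enabled.set(3);
        enabled.set(4);

        List<Hit> mainHits = List.of(new Hit(0, 2.0f), new Hit(1, 1.5f),
                                     new Hit(3, 0.7f), new Hit(5, 3.1f));
        System.out.println(applyFilter(mainHits, enabled));
    }
}
```

Only docs 1 and 3 survive, and their scores are whatever the main query
gave them; the filter decides membership, nothing else.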

It may be that term queries are no longer worth caching as filters; if
that's the case, we could automatically skip caching them.

It also may be the structure of the query that is making the
difference.  Solr is creating

(complicated stuff) +(filter(enabled:true))

If you added +enabled:true directly to an existing boolean query, that
may be more efficient for lucene to process (flatter structure).
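The "flatter structure" idea can be pictured with another toy: a
leapfrog intersection where every required clause is a sorted array of
doc IDs sitting at the same level, as in +a +b +enabled:true. This is a
hypothetical sketch (the class name and array inputs are invented), not
Lucene's conjunction code, and it doesn't by itself explain the speedup
reported above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy leapfrog intersection over sorted doc-ID arrays (not Lucene code).
// All required clauses sit at one level, as in a flat "+a +b +c" query.
public class FlatConjunctionSketch {

    // Returns the doc IDs present in every sorted, duplicate-free array.
    static List<Integer> intersect(int[]... postings) {
        List<Integer> out = new ArrayList<>();
        if (postings.length == 0) {
            return out;
        }
        int[] pos = new int[postings.length]; // one cursor per clause
        int target = -1;                      // current candidate doc ID
        while (true) {
            boolean allMatch = true;
            for (int i = 0; i < postings.length; i++) {
                int[] p = postings[i];
                // Advance this clause to its first doc ID >= target.
                while (pos[i] < p.length && p[pos[i]] < target) {
                    pos[i]++;
                }
                if (pos[i] == p.length) {
                    return out;               // a clause is exhausted: done
                }
                if (p[pos[i]] > target) {     // overshoot: raise the candidate
                    target = p[pos[i]];
                    allMatch = false;
                }
            }
            if (allMatch) {                   // every clause is on target
                out.add(target);
                target++;                     // move past the match
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9};
        int[] b = {3, 4, 5, 9};
        int[] enabled = {0, 3, 9, 12};
        System.out.println(intersect(a, b, enabled)); // prints [3, 9]
    }
}
```

In the nested form, the filter would instead be one clause wrapping its
own sub-structure, which adds a layer between the top-level conjunction
and the underlying postings.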

If you haven't already, could you try putting parens around your
(complicated stuff) to see if it makes any difference?
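For that experiment, the three request variants might be built like
this. The main query +title:foo +body:bar is a made-up stand-in for the
real (complicated stuff), chosen with only required clauses so all three
variants should match the same documents; QueryVariants and variants()
are hypothetical names. URLEncoder.encode(String, Charset) needs Java 10
or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds URL-encoded parameter strings for the three query shapes discussed.
public class QueryVariants {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+
    }

    static Map<String, String> variants(String main) {
        Map<String, String> m = new LinkedHashMap<>();
        // 1. Restriction as a cached filter query.
        m.put("fq", "q=" + enc(main) + "&fq=" + enc("enabled:true"));
        // 2. Restriction appended directly to the main query (flat).
        m.put("flat", "q=" + enc(main + " +enabled:true"));
        // 3. Main query wrapped in parens, closer to the nested shape Solr builds.
        m.put("nested", "q=" + enc("(" + main + ") +enabled:true"));
        return m;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real "complicated stuff".
        String main = "+title:foo +body:bar";
        variants(main).forEach((name, params) ->
            System.out.println(name + ": " + params));
    }
}
```

If "nested" comes out about as slow as "fq", that would point at the
query structure rather than at the filter cache.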

-Yonik
