Thank you all for the collaborative thinking!

Ran additional benchmarks as proposed. Some results:

All solr caches are enabled (queryResultCache hit ratio = 0.02):

                    q       fq {!cache=false}    delta
original query      28      295                  267
w/o grouping        58      325                  267
w/o sort on date    28      293                  265

All solr caches are disabled (except built in lucene field cache):

                    q       fq {!cache=false}    delta
original query      4113    4381                 268
w/o grouping        131     407                  276
w/o sort on date    4217    4400                 183

*median runtime in ms

As you can see, disabling grouping and/or sorting does not affect the
results much. That is, the difference between running with
'fq={!cache=false}' and with 'q' stays roughly the same (about 180-280 ms),
and 'fq' is slower in all cases.
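
To make the comparison easier to check, here is a minimal Python sketch that recomputes the deltas from the medians reported above and also prints the fq/q ratio. The numbers are copied from the two tables; the names RESULTS and summarize are only illustrative.

```python
# Median runtimes in ms, copied from the two tables above as (q, fq) pairs,
# where fq stands for the fq={!cache=false} variant.
RESULTS = {
    "caches enabled": {
        "original query": (28, 295),
        "w/o grouping": (58, 325),
        "w/o sort on date": (28, 293),
    },
    "caches disabled": {
        "original query": (4113, 4381),
        "w/o grouping": (131, 407),
        "w/o sort on date": (4217, 4400),
    },
}


def summarize(results):
    """Return {setup: {variant: (delta_ms, fq_over_q)}} for each measured pair."""
    summary = {}
    for setup, rows in results.items():
        summary[setup] = {
            variant: (fq - q, round(fq / q, 2))
            for variant, (q, fq) in rows.items()
        }
    return summary


if __name__ == "__main__":
    for setup, rows in summarize(RESULTS).items():
        print(setup)
        for variant, (delta, ratio) in rows.items():
            print(f"  {variant:<18} delta={delta:>4} ms  fq/q={ratio}")
```

The absolute delta stays within a fairly narrow band, while the ratio swings widely depending on how expensive the 'q' run is, which is why the absolute numbers are more informative here than a "5-10x" figure.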

Is it correct to assume then that the performance difference comes from 
computing the filter (traversing the posting lists and building the 
bitset)?
Does it also mean that not caching the filter does not affect grouping? 
I.e. perhaps the second pass of grouping uses the already computed filter, 
and does not attempt to fetch it from the cache?

As a general rule of thumb, at least in our case, would you please comment 
on the following assumptions/conclusions (note, all assuming that we don't 
want to cache filters, and the 'fq' part is only used to avoid scoring):

1) If the query sorts by any other field than score (e.g. date), we can 
put the 'fq' part in 'q'. Scoring won't be done, and we won't pay the cost 
of building the filter, and then discarding it when the query completes.

2) In fact, if we don't intend to cache the filter, we might as well just 
use only 'q'. At least, on our dataset (this may definitely *not* be a 
general statement).

3) If we sort by relevance, but want to avoid scoring of the 'filter' 
clauses, is there anything we can do on 4.7?
3.1) The ^= operator is only available in 5.1, which seems exactly what we 
need.
3.2) Adding the filter clauses to the query w/ boost 0 will still compute 
their score; they just won't affect the overall document score, correct?

4) A more general question -- with the addition of ^= to query clauses in 
5.1 (resolved to ConstantScoreQuery downstream), what is the use case for 
using fq w/ {!cache=false}? As we understand it, users who use this want to 
compute a filter but not cache it. As we see, there is some added cost to 
building a filter, so if you pay this cost over and over, would it not be 
better to just use ^=?
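
For reference, the three request shapes discussed in the points above can be sketched in Python as follows. This is only an illustration under assumptions: the main query '*:*' paired with the uncached fq, the scored clause 'text:shoes' (borrowed from Jack's example), and all helper names are hypothetical, and attaching ^= to a range clause is assumed to work the same way as for the term clause in that example.

```python
from urllib.parse import urlencode

# Placeholder range clause and sort field taken from the thread.
FILTER = "maildate:{DATE1 TO DATE2}"
SORT = "maildate_sort desc"


def as_main_query(filter_clause, sort=SORT):
    """Points 1 and 2: put the filter clause directly in q and sort by a field."""
    return {"q": filter_clause, "sort": sort}


def as_uncached_filter(filter_clause, main_query="*:*", sort=SORT):
    """The benchmarked alternative: an fq that is computed but never cached.

    main_query="*:*" is an assumption; the thread does not say what q was used.
    """
    return {"q": main_query, "fq": "{!cache=false}" + filter_clause, "sort": sort}


def as_constant_score(filter_clause, scored_clause):
    """Points 3.1 and 4: constant-score clause via ^= (Solr 5.1+, SOLR-7218)."""
    return {"q": f"+{filter_clause}^=1 {scored_clause}"}


if __name__ == "__main__":
    for params in (
        as_main_query(FILTER),
        as_uncached_filter(FILTER),
        as_constant_score(FILTER, "text:shoes"),
    ):
        # urlencode escapes the braces and spaces for use in a request URL.
        print(urlencode(params))
```

Only the first two shapes are available on 4.7; the third requires the ^= operator.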

Best regards,
Esther




From: Erick Erickson <erickerick...@gmail.com>
To: solr-user@lucene.apache.org
Date: 25/06/2015 02:38 AM
Subject: Re: fq versus q



Tell us a bit more about your test setup. 1 or 2 tests
don't mean much. For instance, if the fq query has to
load the low-level caches from disk, and then the q-only
query is run and doesn't, that could skew the results.
Or if somehow you're hitting the queryResultCache. Or....

Frankly I'd disable all my caches for running tests like
this, and be sure to mix-n-match the tests so I wasn't
getting bitten by caches.

And please tell us what the actual numbers are. 5-10X
doesn't mean much at all if it's 25ms vs. 5ms. It means
a lot (and something's very wrong) if it means
200ms vs. 1,000ms.

Best,
Erick

On Wed, Jun 24, 2015 at 5:30 PM, Upayavira <u...@odoko.co.uk> wrote:
> Are you wanting to do no scoring at all, or just have a portion of the
> query not contribute to the score?
>
> If you don't want scoring at all, just sort by another field. If you
> don't have a field, I just tried "&sort=1 desc", and it worked! This
> should, if I'm right, pull documents out of the index in index order.
>
> Upayavira
>
> On Wed, Jun 24, 2015, at 08:26 PM, Shai Erera wrote:
>> Ah thanks. I see it was added in 5.1 - is there any other way prior to
>> that (like 4.7)?
>>
>> if not, I guess the only option is to not use fq if we don't intend to
>> cache it, and on 5.1 use the ^= syntax.
>>
>> Shai
>>
>> On Wed, Jun 24, 2015 at 9:21 PM, Jack Krupansky
>> <jack.krupan...@gmail.com>
>> wrote:
>>
>> > Yonik added syntax to request a constant score query in Solr with the
>> > ^= operator.
>> >
>> > For example: +color:blue^=1 text:shoes
>> >
>> > See:
>> > https://issues.apache.org/jira/browse/SOLR-7218
>> >
>> > -- Jack Krupansky
>> >
>> > On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera <ser...@gmail.com> wrote:
>> >
>> > > Thanks Shawn,
>> > >
>> > > What's Solr equivalence to ConstantScoreQuery? I.e., what if you
>> > > want to run a query that does not score, but only filter. The
>> > > rationale behind using a non-cached 'fq' was just that.
>> > >
>> > > Shai
>> > >
>> > > On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey <apa...@elyograg.org>
>> > wrote:
>> > >
>> > > > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
>> > > > > We are comparing the performance of fq versus q for queries
>> > > > > that are actually filters and should not be cached.
>> > > > > In part of queries we see strange behavior where q performs
>> > > > > 5-10x better than fq. The question is why?
>> > > > >
>> > > > > An example1:
>> > > > > q=maildate:{DATE1 to DATE2} COMPARED TO
>> > > > > fq={!cache=false}maildate:{DATE1 to DATE2}
>> > > > > sort=maildate_sort* desc
>> > > >
>> > > > <snip>
>> > > >
>> > > > > <field name="maildate" stored="true" indexed="true"
>> > > > > type="tdate"/>
>> > > > > <field name="maildate_sort" stored="false" indexed="false"
>> > > > > type="tdate" docValues="true"/>
>> > > >
>> > > > For simplicity, I would probably just use one field for that,
>> > > > rather than a separate sort field.  The disk space required would
>> > > > probably be the same either way, but your interaction with the
>> > > > index will not be as complex.  There's nothing wrong with doing it
>> > > > the way you have, though.
>> > > >
>> > > > I'm not at all an expert, but I've been a member of this community
>> > > > for a long time.  Here's my guess about why your query is faster in
>> > > > the q parameter than a non-cached filter:
>> > > >
>> > > > The result of a standard query is the stored fields from the top N
>> > > > documents, where N is the value in the rows parameter.  The default
>> > > > for N is typically set to 10, and for most people will normally be
>> > > > 200 or less.
>> > > >
>> > > > The result of a filter is very different -- it is a bitset of all
>> > > > the documents in your entire index, with binary 0 for documents
>> > > > that don't match the filter and binary 1 for documents that do
>> > > > match.
>> > > >
>> > > > If your index has 100 million documents, every single one of those
>> > > > 100 million documents must be checked against the filter query to
>> > > > produce a filter bitset, but when it's in the q parameter,
>> > > > shortcuts can be taken which will get the top N results quickly.
>> > > >
>> > > > The filterCache levels the playing field when filters are re-used.
>> > > > If a requested filter is already in the cache, it can be retrieved
>> > > > and applied to a result VERY quickly.
>> > > >
>> > > > You have turned off the caching for your filter.  I'm not sure why
>> > > > you did this, but you know your use case a lot better than I do.
>> > > > If it were me, I would use filter queries and do everything
>> > > > possible to re-use the same filters, and I would cache them.
>> > > >
>> > > > Thanks,
>> > > > Shawn
>> > > >
>> > > >
>> > >
>> >


