Re: fq versus q

Erick Erickson Thu, 25 Jun 2015 05:27:39 -0700

Side note on dates and fqs. If you're using NOW in your date
expressions you may be able to re-use fqs by using "date math",
see:
https://lucidworks.com/blog/date-math-now-and-filter-queries/
Of course this may not be applicable in your situation...
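
A minimal sketch of why rounding helps, assuming a 7-day window on the
maildate field mentioned later in this thread. The helper resolved_range is
hypothetical and only a toy model of the bounds Solr would compute; it is
not Solr's date math parser.

```python
from datetime import datetime, timedelta, timezone


def resolved_range(now, round_to_day):
    """Toy model of the bounds computed for a 7-day window.

    round_to_day=False models maildate:[NOW-7DAYS TO NOW]
    round_to_day=True  models maildate:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]
    """
    if round_to_day:
        # NOW/DAY rounds down to the start of the current day.
        day = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return (day - timedelta(days=7), day + timedelta(days=1))
    return (now - timedelta(days=7), now)


first = datetime(2015, 6, 25, 12, 27, 39, tzinfo=timezone.utc)
second = first + timedelta(seconds=1)  # a request one second later

# Unrounded bounds differ between the two requests, so a filter cached
# for the first request cannot serve the second one.
print(resolved_range(first, False) == resolved_range(second, False))

# Rounded bounds are identical, so the same cached filter can be re-used.
print(resolved_range(first, True) == resolved_range(second, True))
```

The rounded form trades precision for re-use: the window boundaries only
move once a day.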


FWIW,
Erick

On Thu, Jun 25, 2015 at 8:03 AM, Shai Erera <ser...@gmail.com> wrote:
> The tables came across corrupt, here they are (times in ms):
>
> Caches enabled:
>
>                   q     fq     delta
> original query    28    295    267
> w/o grouping      58    325    267
> w/o sort on date  28    293    265
>
> Caches disabled:
>
>                   q     fq     delta
> original query    4113  4381   268
> w/o grouping      131   407    276
> w/o sort on date  4217  4400   183
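
The delta column can be re-derived from the q and fq columns. The sketch
below does that and also prints the fq/q ratio, which is what the "5-10x"
figure elsewhere in the thread refers to. The dictionary and function names
are made up for illustration; the numbers are copied from the tables above.

```python
# Median runtimes in ms, copied from the tables above: (q, fq).
caches_enabled = {
    "original query": (28, 295),
    "w/o grouping": (58, 325),
    "w/o sort on date": (28, 293),
}
caches_disabled = {
    "original query": (4113, 4381),
    "w/o grouping": (131, 407),
    "w/o sort on date": (4217, 4400),
}


def deltas(table):
    """Absolute extra time (ms) of the fq variant over the q variant."""
    return {name: fq - q for name, (q, fq) in table.items()}


def ratios(table):
    """Relative slowdown of the fq variant, rounded to one decimal."""
    return {name: round(fq / q, 1) for name, (q, fq) in table.items()}


for label, table in (("enabled", caches_enabled), ("disabled", caches_disabled)):
    print(label, deltas(table), ratios(table))
```

The absolute difference stays in the same range in most rows while the
ratio swings widely, which is why the raw milliseconds are more informative
than the ratio alone.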
>
> Shai
>
> On Thu, Jun 25, 2015 at 2:04 PM, Esther Goldbraich <estherg...@il.ibm.com>
> wrote:
>
>> Thank you all for collaborative thinking!
>>
>> Ran additional benchmarks as proposed. Some results:
>>
>> All solr caches are enabled (queryResultCache hit ratio = 0.02):
>>
>>
>>                   q     fq {!cache=false}  delta
>> original query    28    295                267
>> w/o grouping      58    325                267
>> w/o sort on date  28    293                265
>>
>> All solr caches are disabled (except built in lucene field cache):
>>
>>
>>                   q     fq {!cache=false}  delta
>> original query    4113  4381               268
>> w/o grouping      131   407                276
>> w/o sort on date  4217  4400               183
>>
>> *median runtime in ms
>>
>> As you can see, disabling grouping and/or sorting does not affect the
>> results much. That is, the difference between running with
>> 'fq{!cache=false}' or with 'q' is the same, while 'fq' performs slower in
>> all cases.
>>
>> Is it correct to assume then that the performance difference comes from
>> computing the filter (traversing the posting lists and building the
>> bitset)?
>> Does it also mean that not caching the filter does not affect grouping?
>> I.e. perhaps the second pass of grouping uses the already computed filter,
>> and does not attempt to fetch it from the cache?
>>
>> As a general rule of thumb, at least in our case, would you please comment
>> on the following assumptions/conclusions (note, all assuming that we don't
>> want to cache filters, and the 'fq' part is only used to avoid scoring):
>>
>> 1) If the query sorts by any field other than score (e.g. date), we can
>> put the 'fq' part in 'q'. Scoring won't be done, and we won't pay the cost
>> of building the filter, and then discarding it when the query completes.
>>
>> 2) In fact, if we don't intend to cache the filter, we might as well just
>> use only 'q'. At least, on our dataset (this may definitely *not* be a
>> general statement).
>>
>> 3) If we sort by relevance, but want to avoid scoring of the 'filter'
>> clauses, is there anything we can do on 4.7?
>> 3.1) The ^= operator is only available in 5.1, which seems exactly what we
>> need.
>> 3.2) Adding the filter clauses to the query w/ boost 0 will still compute
>> their score, only they won't affect the overall document score, correct?
>>
>> 4) A more general question -- with the addition of ^= to query clauses in
>> 5.1 (resolved to ConstantScoreQuery downstream), what is the use case for
>> using fq w/ {!cache=false}? As we understand it, users who use this want to
>> compute a filter but not cache it. As we see, there is some added cost to
>> building a filter, so if you pay this cost over and over, would it not be
>> better to just use ^=?
>>
>> Best regards,
>> Esther
>>
>>
>>
>>
>> From:
>> Erick Erickson <erickerick...@gmail.com>
>> To:
>> solr-user@lucene.apache.org
>> Date:
>> 25/06/2015 02:38 AM
>> Subject:
>> Re: fq versus q
>>
>>
>>
>> Tell us a bit more about your test setup. 1 or 2 tests
>> don't mean much. For instance, if the fq query has to
>> load the low-level caches from disk, and then the q-only
>> query is run and doesn't, that could skew the results.
>> Or if somehow you're hitting the queryResultCache. Or....
>>
>> Frankly I'd disable all my caches for running tests like
>> this, and be sure to mix-n-match the tests so I wasn't
>> getting bitten by caches.
>>
>> And please tell us what the actual numbers are. 5-10X
>> doesn't mean much at all if it's 25 ms vs. 5 ms. It means
>> a lot (and something's very wrong) if it means
>> 200 ms vs. 1,000 ms.
>>
>> Best,
>> Erick
>>
>> On Wed, Jun 24, 2015 at 5:30 PM, Upayavira <u...@odoko.co.uk> wrote:
>> > Are you wanting to do no scoring at all, or just have a portion of the
>> > query not contribute to the score?
>> >
>> > If you don't want scoring at all, just sort by another field. If you
>> > don't have a field, I just tried "&sort=1 desc", and it worked! This
>> > should, if I'm right, pull documents out of the index in index order.
>> >
>> > Upayavira
>> >
>> > On Wed, Jun 24, 2015, at 08:26 PM, Shai Erera wrote:
>> >> Ah thanks. I see it was added in 5.1 - is there any other way prior to
>> >> that (like 4.7)?
>> >>
>> >> If not, I guess the only option is to not use fq if we don't intend to
>> >> cache it, and on 5.1 use the ^= syntax.
>> >>
>> >> Shai
>> >>
>> >> On Wed, Jun 24, 2015 at 9:21 PM, Jack Krupansky
>> >> <jack.krupan...@gmail.com>
>> >> wrote:
>> >>
>> >> > Yonik added syntax to request a constant score query in Solr with the
>> >> > ^= operator.
>> >> >
>> >> > For example: +color:blue^=1 text:shoes
>> >> >
>> >> > See:
>> >> > https://issues.apache.org/jira/browse/SOLR-7218
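
A hedged sketch of how the two request styles discussed in this thread
could be encoded as request parameters. The first dictionary is a variation
on the example above with both clauses marked as required (an assumption
made here so that both requests restrict results the same way); the
parameter dictionaries are illustrative and no claim is made that the two
requests produce identical scores.

```python
from urllib.parse import parse_qs, urlencode

# Solr 5.1+: constant-score clause inside q (^= syntax from SOLR-7218).
constant_score_params = {"q": "+color:blue^=1 +text:shoes"}

# Alternative discussed in the thread: a non-cached filter query.
uncached_filter_params = {"q": "text:shoes", "fq": "{!cache=false}color:blue"}

for params in (constant_score_params, uncached_filter_params):
    encoded = urlencode(params)  # percent-encodes +, ^, =, {, } and spaces
    print(encoded)
    # Round-trip to confirm nothing was lost in encoding.
    decoded = {key: values[0] for key, values in parse_qs(encoded).items()}
    print(decoded == params)
```

Encoding matters here because a literal + in a URL is decoded as a space,
which would silently drop the "required" marker from the clause.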
>> >> >
>> >> > -- Jack Krupansky
>> >> >
>> >> > On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera <ser...@gmail.com> wrote:
>> >> >
>> >> > > Thanks Shawn,
>> >> > >
>> >> > > What's Solr's equivalent of ConstantScoreQuery? I.e., what if you
>> >> > > want to run a query that does not score, but only filters? The
>> >> > > rationale behind using a non-cached 'fq' was just that.
>> >> > >
>> >> > > Shai
>> >> > >
>> >> > > On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> >> > >
>> >> > > > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
>> >> > > > > We are comparing the performance of fq versus q for queries that are
>> >> > > > > actually filters and should not be cached.
>> >> > > > > For some queries we see strange behavior where q performs 5-10x better
>> >> > > > > than fq. The question is why?
>> >> > > > >
>> >> > > > > Example 1:
>> >> > > > > q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1 to DATE2}
>> >> > > > > sort=maildate_sort* desc
>> >> > > >
>> >> > > > <snip>
>> >> > > >
>> >> > > > > <field name="maildate" stored="true" indexed="true" type="tdate"/>
>> >> > > > > <field name="maildate_sort" stored="false" indexed="false" type="tdate"
>> >> > > > > docValues="true"/>
>> >> > > >
>> >> > > > For simplicity, I would probably just use one field for that, rather
>> >> > > > than a separate sort field.  The disk space required would probably
>> >> > > > be the same either way, but your interaction with the index will not
>> >> > > > be as complex.  There's nothing wrong with doing it the way you have,
>> >> > > > though.
>> >> > > >
>> >> > > > I'm not at all an expert, but I've been a member of this community
>> >> > > > for a long time.  Here's my guess about why your query is faster in
>> >> > > > the q parameter than a non-cached filter:
>> >> > > >
>> >> > > > The result of a standard query is the stored fields from the top N
>> >> > > > documents, where N is the value in the rows parameter.  The default
>> >> > > > for N is typically set to 10, and for most people will normally be
>> >> > > > 200 or less.
>> >> > > >
>> >> > > > The result of a filter is very different -- it is a bitset of all the
>> >> > > > documents in your entire index, with binary 0 for documents that
>> >> > > > don't match the filter and binary 1 for documents that do match.
>> >> > > >
>> >> > > > If your index has 100 million documents, every single one of those
>> >> > > > 100 million documents must be checked against the filter query to
>> >> > > > produce a filter bitset, but when it's in the q parameter, shortcuts
>> >> > > > can be taken which will get the top N results quickly.
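
A toy model of the two result shapes described above, under the assumption
that each document has a single sortable value. The function names and the
sample data are hypothetical, and this is not how Lucene implements either
path; it only contrasts "one bit per document in the index" with "a bounded
list of the best N matches".

```python
import heapq


def build_filter_bitset(values, matches):
    """One entry per document in the index, whether it matches or not."""
    return [matches(value) for value in values]


def top_n(values, matches, n):
    """Keep only the n best matching doc ids, ordered by their value."""
    matching_docs = (doc for doc, value in enumerate(values) if matches(value))
    return heapq.nlargest(n, matching_docs, key=lambda doc: values[doc])


# Hypothetical index: doc id i has sort value i.
values = list(range(1000))
in_range = lambda value: 100 <= value < 900

bits = build_filter_bitset(values, in_range)
best = top_n(values, in_range, 10)

print(len(bits), sum(bits))  # size of the bitset, number of set bits
print(best)                  # only 10 doc ids are retained
```

The sketch does not model the "shortcuts" mentioned above; it only shows
that the filter path materializes a structure sized to the whole index,
while the top-N path retains N entries.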
>> >> > > >
>> >> > > > The filterCache levels the playing field when filters are re-used.
>> >> > > > If a requested filter is already in the cache, it can be retrieved
>> >> > > > and applied to a result VERY quickly.
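
The re-use argument can be illustrated with a small memoizing cache. The
class below is a hypothetical stand-in written for this example, not Solr's
filterCache (which is configured in solrconfig.xml); the cache flag loosely
models the effect of {!cache=false}.

```python
class ToyFilterCache:
    """Remembers computed filters by their query string."""

    def __init__(self, compute):
        self._compute = compute   # expensive function: fq string -> filter
        self._entries = {}
        self.computations = 0     # how many times the filter was built

    def get(self, fq, cache=True):
        if cache and fq in self._entries:
            return self._entries[fq]
        self.computations += 1
        result = self._compute(fq)
        if cache:
            self._entries[fq] = result
        return result


def expensive_filter(fq):
    # Placeholder for walking the index and building a bitset.
    return frozenset(range(5))


cached = ToyFilterCache(expensive_filter)
uncached = ToyFilterCache(expensive_filter)
for _ in range(3):
    cached.get("color:blue")
    uncached.get("color:blue", cache=False)

print(cached.computations, uncached.computations)
```

With the same filter requested three times, the cached variant builds it
once and the non-cached variant builds it on every request.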
>> >> > > >
>> >> > > > You have turned off the caching for your filter.  I'm not sure why
>> >> > > > you did this, but you know your use case a lot better than I do.  If
>> >> > > > it were me, I would use filter queries and do everything possible to
>> >> > > > re-use the same filters, and I would cache them.
>> >> > > >
>> >> > > > Thanks,
>> >> > > > Shawn
>> >> > > >
>> >> > > >
>> >> > >
>> >> >
>>
>>
>>
>>
