The tables came through corrupted; here they are again (median times in ms):

Caches enabled:

                  q     fq     delta
original query    28    295    267
w/o grouping      58    325    267
w/o sort on date  28    293    265

Caches disabled:

                  q     fq     delta
original query    4113  4381   268
w/o grouping      131   407    276
w/o sort on date  4217  4400   183
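
A quick sanity check of the delta column, as a minimal Python sketch. The dictionary layout and the `deltas` helper are purely illustrative; the numbers are copied from the two tables above.

```python
# Median runtimes in ms, copied from the tables above as (q, fq{!cache=false}).
measurements = {
    "caches enabled": {
        "original query": (28, 295),
        "w/o grouping": (58, 325),
        "w/o sort on date": (28, 293),
    },
    "caches disabled": {
        "original query": (4113, 4381),
        "w/o grouping": (131, 407),
        "w/o sort on date": (4217, 4400),
    },
}


def deltas(table):
    """Return fq - q (in ms) for each query variant in one table."""
    return {name: fq - q for name, (q, fq) in table.items()}


for setup, table in measurements.items():
    for name, delta in deltas(table).items():
        q, fq = table[name]
        # Print the absolute overhead next to the relative slowdown.
        print(f"{setup:16} {name:18} q={q:5} fq={fq:5} "
              f"delta={delta:4} ratio={fq / q:.1f}x")
```

The absolute overhead stays between 183 and 276 ms in every row, while the ratio swings from just over 1x to more than 10x, which is why the raw numbers say more than "5-10x".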

Shai

On Thu, Jun 25, 2015 at 2:04 PM, Esther Goldbraich <estherg...@il.ibm.com>
wrote:

> Thank you all for collaborative thinking!
>
> Ran additional benchmarks as proposed. Some results:
>
> All solr caches are enabled (queryResultCache hit ratio = 0.02):
>
>
>                   q     fq {!cache=false}   delta
> original query    28    295                 267
> w/o grouping      58    325                 267
> w/o sort on date  28    293                 265
>
> All solr caches are disabled (except built in lucene field cache):
>
>
>                   q     fq {!cache=false}   delta
> original query    4113  4381                268
> w/o grouping      131   407                 276
> w/o sort on date  4217  4400                183
>
> *median runtime in ms
>
> As you can see, disabling grouping and/or sorting does not affect the
> results much. That is, the difference between running with
> 'fq{!cache=false}' or with 'q' is the same, while 'fq' performs slower in
> all cases.
>
> Is it correct to assume then that the performance difference comes from
> computing the filter (traversing the posting lists and building the
> bitset)?
> Does it also mean that not caching the filter does not affect grouping?
> I.e. perhaps the second pass of grouping uses the already computed filter,
> and does not attempt to fetch it from the cache?
>
> As a general rule of thumb, at least in our case, would you please comment
> on the following assumptions/conclusions (note, all assuming that we don't
> want to cache filters, and the 'fq' part is only used to avoid scoring):
>
> 1) If the query sorts by any other field than score (e.g. date), we can
> put the 'fq' part in 'q'. Scoring won't be done, and we won't pay the cost
> of building the filter, and then discarding it when the query completes.
>
> 2) In fact, if we don't intend to cache the filter, we might as well just
> use only 'q'. At least, on our dataset (this may definitely *not* be a
> general statement).
>
> 3) If we sort by relevance, but want to avoid scoring of the 'filter'
> clauses, is there anything we can do on 4.7?
> 3.1) The ^= operator is only available in 5.1, which seems exactly what we
> need.
> 3.2) Adding the filter clauses to the query w/ boost 0 will still compute
> their score, only they won't affect the overall document score, correct?
>
> 4) A more general question -- with the addition of ^= to query clauses in
> 5.1 (resolved to ConstantScoreQuery down stream), what is the use case for
> using fq w/ !cache=false? As we understand it, users who use this want to
> compute a filter but not cache it. As we see, there is some added cost to
> building a filter, so if you pay this cost over and over, would it not be
> better to just use ^=?
>
> Best regards,
> Esther
>
>
>
>
> From: Erick Erickson <erickerick...@gmail.com>
> To: solr-user@lucene.apache.org
> Date: 25/06/2015 02:38 AM
> Subject: Re: fq versus q
>
>
>
> Tell us a bit more about your test setup. 1 or 2 tests
> don't mean much. For instance, if the fq query has to
> load the low-level caches from disk and the q-only query,
> run afterwards, doesn't, that could skew the results.
> Or if somehow you're hitting the queryResultCache. Or....
>
> Frankly I'd disable all my caches for running tests like
> this, and be sure to mix-n-match the tests so I wasn't
> getting bitten by caches.
>
> And please tell us what the actual numbers are. 5-10X
> doesn't mean much at all if it's 25ms vs. 5ms. It means
> a lot (and something's very wrong) if it means
> 200ms vs. 1,000ms.
>
> Best,
> Erick
>
> On Wed, Jun 24, 2015 at 5:30 PM, Upayavira <u...@odoko.co.uk> wrote:
> > Are you wanting to do no scoring at all, or just have a portion of the
> > query not contribute to the score?
> >
> > If you don't want scoring at all, just sort by another field. If you
> > don't have a field, I just tried "&sort=1 desc", and it worked! This
> > should, if I'm right, pull documents out of the index in index order.
> >
> > Upayavira
> >
> > On Wed, Jun 24, 2015, at 08:26 PM, Shai Erera wrote:
> >> Ah thanks. I see it was added in 5.1 - is there any other way prior to
> >> that (like 4.7)?
> >>
> >> If not, I guess the only option is to not use fq if we don't intend to
> >> cache it, and on 5.1 use the ^= syntax.
> >>
> >> Shai
> >>
> >> On Wed, Jun 24, 2015 at 9:21 PM, Jack Krupansky
> >> <jack.krupan...@gmail.com>
> >> wrote:
> >>
> >> > Yonik added syntax to request a constant score query in Solr with the
> >> > ^= operator.
> >> >
> >> > For example: +color:blue^=1 text:shoes
> >> >
> >> > See:
> >> > https://issues.apache.org/jira/browse/SOLR-7218
> >> >
> >> > -- Jack Krupansky
> >> >
> >> > On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera <ser...@gmail.com> wrote:
> >> >
> >> > > Thanks Shawn,
> >> > >
> >> > > What's Solr's equivalent of ConstantScoreQuery? I.e., what if you want
> >> > > to run a query that does not score, but only filters? The rationale
> >> > > behind using a non-cached 'fq' was just that.
> >> > >
> >> > > Shai
> >> > >
> >> > > On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey <apa...@elyograg.org>
> >> > > wrote:
> >> > >
> >> > > > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
> >> > > > > We are comparing the performance of fq versus q for queries that
> >> > > > > are actually filters and should not be cached.
> >> > > > > In some of the queries we see strange behavior where q performs
> >> > > > > 5-10x better than fq. The question is why?
> >> > > > >
> >> > > > > An example1:
> >> > > > > q=maildate:{DATE1 to DATE2} COMPARED TO
> >> > > > > fq={!cache=false}maildate:{DATE1 to DATE2}
> >> > > > > sort=maildate_sort* desc
> >> > > >
> >> > > > <snip>
> >> > > >
> >> > > > > <field name="maildate" stored="true" indexed="true" type="tdate"/>
> >> > > > > <field name="maildate_sort" stored="false" indexed="false"
> >> > > > > type="tdate" docValues="true"/>
> >> > > >
> >> > > > For simplicity, I would probably just use one field for that, rather
> >> > > > than a separate sort field.  The disk space required would probably
> >> > > > be the same either way, but your interaction with the index will not
> >> > > > be as complex.  There's nothing wrong with doing it the way you have,
> >> > > > though.
> >> > > >
> >> > > > I'm not at all an expert, but I've been a member of this community
> >> > > > for a long time.  Here's my guess about why your query is faster in
> >> > > > the q parameter than a non-cached filter:
> >> > > >
> >> > > > The result of a standard query is the stored fields from the top N
> >> > > > documents, where N is the value in the rows parameter.  The default
> >> > > > for N is typically set to 10, and for most people will normally be
> >> > > > 200 or less.
> >> > > >
> >> > > > The result of a filter is very different -- it is a bitset of all the
> >> > > > documents in your entire index, with binary 0 for documents that
> >> > > > don't match the filter and binary 1 for documents that do match.
> >> > > >
> >> > > > If your index has 100 million documents, every single one of those
> >> > > > 100 million documents must be checked against the filter query to
> >> > > > produce a filter bitset, but when it's in the q parameter, shortcuts
> >> > > > can be taken which will get the top N results quickly.
> >> > > >
> >> > > > The filterCache levels the playing field when filters are re-used.
> >> > > > If a requested filter is already in the cache, it can be retrieved
> >> > > > and applied to a result VERY quickly.
> >> > > >
> >> > > > You have turned off the caching for your filter.  I'm not sure why
> >> > > > you did this, but you know your use case a lot better than I do.  If
> >> > > > it were me, I would use filter queries and do everything possible to
> >> > > > re-use the same filters, and I would cache them.
> >> > > >
> >> > > > Thanks,
> >> > > > Shawn
> >> > > >
> >> > > >
> >> > >
> >> >
>
>
>
>
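
Shawn's description of the two result shapes (a bitset covering every document in the index versus a short top-N list) can be illustrated with a toy model. This is only a sketch under assumptions: the index size, the fake `dates` values, and the helpers `in_range`, `filter_bitset` and `top_n` are invented for illustration. It contrasts the size and shape of the two results and says nothing about which shortcuts Solr or Lucene actually take.

```python
import heapq

NUM_DOCS = 1_000  # hypothetical index size
# Fake per-document date values; 37 is coprime with 1000, so each value
# in 0..999 occurs exactly once.
dates = [(doc_id * 37) % NUM_DOCS for doc_id in range(NUM_DOCS)]


def in_range(doc_id, lo=100, hi=200):
    """Exclusive range check, in the spirit of maildate:{DATE1 TO DATE2}."""
    return lo < dates[doc_id] < hi


def filter_bitset(num_docs, predicate):
    """Filter-style result: one bit per document in the whole index."""
    bits = bytearray((num_docs + 7) // 8)
    for doc_id in range(num_docs):
        if predicate(doc_id):
            bits[doc_id // 8] |= 1 << (doc_id % 8)
    return bits


def top_n(num_docs, predicate, sort_key, n=10):
    """Query-style result: only the n best matches are kept."""
    hits = (doc_id for doc_id in range(num_docs) if predicate(doc_id))
    return heapq.nlargest(n, hits, key=sort_key)


bits = filter_bitset(NUM_DOCS, in_range)
best = top_n(NUM_DOCS, in_range, sort_key=lambda doc_id: dates[doc_id])
print(len(bits), "bytes of bitset;", len(best), "top documents")
```

The bitset grows with the size of the index regardless of how many rows are requested, whereas the top-N list is bounded by the rows parameter.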

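For reference, the three request shapes discussed in this thread can be written out as parameter sets. A hedged sketch: DATE1/DATE2 are the thread's own placeholders, `q=*:*` in the filter variant is an assumption (the thread does not show the main query used alongside fq), and applying ^= to a range clause is extrapolated from Jack's `+color:blue^=1 text:shoes` example, so it would need Solr 5.1 or later (SOLR-7218). The function names are illustrative only.

```python
from urllib.parse import urlencode

RANGE = "maildate:{DATE1 TO DATE2}"  # DATE1/DATE2 are placeholders
SORT = "maildate_sort desc"


def as_main_query():
    """Range clause in q; sorting by a field instead of by score."""
    return {"q": RANGE, "sort": SORT}


def as_uncached_filter():
    """Range clause as a non-cached filter; q=*:* is an assumption."""
    return {"q": "*:*", "fq": "{!cache=false}" + RANGE, "sort": SORT}


def as_constant_score(text_query):
    """Range clause as a constant-score clause next to a scored clause.

    Assumes the ^= operator (Solr 5.1+) accepts a range clause.
    """
    return {"q": f"+{RANGE}^=1 {text_query}"}


for params in (as_main_query(), as_uncached_filter(),
               as_constant_score("text:shoes")):
    # urlencode percent-escapes the braces, colons and spaces for the URL.
    print(urlencode(params))
```

The first two variants correspond to the q and fq {!cache=false} columns in the benchmark tables; the third is the 5.1 alternative raised in questions 3 and 4.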