The tables came across corrupt, here they are (times in ms):

Caches enabled:

                      q     fq   delta
original query       28    295     267
w/o grouping         58    325     267
w/o sort on date     28    293     265

Caches disabled:

                      q     fq   delta
original query     4113   4381     268
w/o grouping        131    407     276
w/o sort on date   4217   4400     183

Shai

On Thu, Jun 25, 2015 at 2:04 PM, Esther Goldbraich <estherg...@il.ibm.com> wrote:

> Thank you all for collaborative thinking!
>
> Ran additional benchmarks as proposed. Some results:
>
> All solr caches are enabled (queryResultCache hit ratio = 0.02):
>
>
> q
> fq {!cache=false}
> delta
> original query
> 28
> 295
> 267
> w/o grouping
> 58
> 325
> 267
> w/o sort on date
> 28
> 293
> 265
>
> All solr caches are disabled (except built in lucene field cache):
>
>
> q
> fq {!cache=false}
> delta
> original query
> 4113
> 4381
> 268
> w/o grouping
> 131
> 407
> 276
> w/o sort on date
> 4217
> 4400
> 183
>
> *median runtime in ms
>
> As you can see, disabling grouping and/or sorting does not affect the
> results much. That is, the difference between running with
> 'fq{!cache=false}' or with 'q' is the same, while 'fq' performs slower in
> all cases.
>
> Is it correct to assume then that the performance difference comes from
> computing the filter (traversing the posting lists and building the
> bitset)?
> Does it also mean that not caching the filter does not affect grouping?
> I.e. perhaps the second pass of grouping uses the already computed filter,
> and does not attempt to fetch it from the cache?
>
> As a general rule of thumb, at least in our case, would you please comment
> on the following assumptions/conclusions (note, all assuming that we don't
> want to cache filters, and the 'fq' part is only used to avoid scoring):
>
> 1) If the query sorts by any other field than score (e.g. date), we can
> put the 'fq' part in 'q'. Scoring won't be done, and we won't pay the cost
> of building the filter, and then discarding it when the query completes.
>
> 2) In fact, if we don't intend to cache the filter, we might as well just
> use only 'q'.
> At least, on our dataset (this may definitely *not* be a
> general statement).
>
> 3) If we sort by relevance, but want to avoid scoring of the 'filter'
> clauses, is there anything we can do on 4.7?
> 3.1) The ^= operator is only available in 5.1, which seems exactly what we
> need.
> 3.2) Adding the filter clauses to the query w/ boost 0 will still compute
> their score, only they won't affect the overall document score correct?
>
> 4) A more general question -- with the addition of ^= to query clauses in
> 5.1 (resolved to ConstantScoreQuery down stream), what is the use case for
> using fq w/ !cache=false? As we understand it, users who use this want to
> compute a filter but not cache it. As we see, there is some added cost to
> building a filter, so if you pay this cost over and over, would it not be
> better to just use ^=?
>
> Best regards,
> Esther
>
>
> From: Erick Erickson <erickerick...@gmail.com>
> To: solr-user@lucene.apache.org
> Date: 25/06/2015 02:38 AM
> Subject: Re: fq versus q
>
>
> Tell us a bit more about your test setup. 1 or 2 tests
> don't mean much. For instance, if the fq query has to
> load the low-level caches from disk then the q-only
> query is run and doesn't that could skew the results.
> Or if somehow you're hitting the queryResultCache. Or....
>
> Frankly I'd disable all my caches for running tests like
> this, and be sure to mix-n-match the tests so I wasn't
> getting bitten by caches.
>
> And please tell us what the actual numbers are. 5-10X
> doesn't mean much at all if it's 25ms .vs. 5 ms. It means
> a lot (and something's very wrong) if it means
> 200ms .vs. 1,000ms.
>
> Best,
> Erick
>
> On Wed, Jun 24, 2015 at 5:30 PM, Upayavira <u...@odoko.co.uk> wrote:
> > Are you wanting to do no scoring at all, or just have a portion of the
> > query not contribute to the score?
> >
> > If you don't want scoring at all, just sort by another field. If you
> > don't have a field, I just tried "&sort=1 desc", and it worked!
> > This should, if I'm right, pull documents out of the index in index
> > order.
> >
> > Upayavira
> >
> > On Wed, Jun 24, 2015, at 08:26 PM, Shai Erera wrote:
> >> Ah thanks. I see it was added in 5.1 - is there any other way prior to
> >> that (like 4.7)?
> >>
> >> if not, I guess the only option is to not use fq if we don't intend to
> >> cache it, and on 5.1 use the ^= syntax.
> >>
> >> Shai
> >>
> >> On Wed, Jun 24, 2015 at 9:21 PM, Jack Krupansky
> >> <jack.krupan...@gmail.com> wrote:
> >>
> >> > Yonik added syntax to request a constant score query in Solr with
> >> > the ^= operator.
> >> >
> >> > For example: +color:blue^=1 text:shoes
> >> >
> >> > See:
> >> > https://issues.apache.org/jira/browse/SOLR-7218
> >> >
> >> > -- Jack Krupansky
> >> >
> >> > On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera <ser...@gmail.com> wrote:
> >> >
> >> > > Thanks Shawn,
> >> > >
> >> > > What's Solr equivalence to ConstantScoreQuery? I.e., what if you
> >> > > want to run a query that does not score, but only filter. The
> >> > > rationale behind using a non-cached 'fq' was just that.
> >> > >
> >> > > Shai
> >> > >
> >> > > On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey <apa...@elyograg.org>
> >> > > wrote:
> >> > >
> >> > > > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
> >> > > > > We are comparing the performance of fq versus q for queries
> >> > > > > that are actually filters and should not be cached.
> >> > > > > In part of queries we see strange behavior where q performs
> >> > > > > 5-10x better than fq. The question is why?
> >> > > > >
> >> > > > > An example1:
> >> > > > > q=maildate:{DATE1 to DATE2} COMPARED TO
> >> > > > > fq={!cache=false}maildate:{DATE1 to DATE2}
> >> > > > > sort=maildate_sort* desc
> >> > > >
> >> > > > <snip>
> >> > > >
> >> > > > > <field name="maildate" stored="true" indexed="true" type="tdate"/>
> >> > > > > <field name="maildate_sort" stored="false" indexed="false" type="tdate" docValues="true"/>
> >> > > >
> >> > > > For simplicity, I would probably just use one field for that,
> >> > > > rather than a separate sort field. The disk space required would
> >> > > > probably be the same either way, but your interaction with the
> >> > > > index will not be as complex. There's nothing wrong with doing it
> >> > > > the way you have, though.
> >> > > >
> >> > > > I'm not at all an expert, but I've been a member of this
> >> > > > community for a long time. Here's my guess about why your query
> >> > > > is faster in the q parameter than a non-cached filter:
> >> > > >
> >> > > > The result of a standard query is the stored fields from the top
> >> > > > N documents, where N is the value in the rows parameter. The
> >> > > > default for N is typically set to 10, and for most people will
> >> > > > normally be 200 or less.
> >> > > >
> >> > > > The result of a filter is very different -- it is a bitset of all
> >> > > > the documents in your entire index, with binary 0 for documents
> >> > > > that don't match the filter and binary 1 for documents that do
> >> > > > match.
> >> > > >
> >> > > > If your index has 100 million documents, every single one of
> >> > > > those 100 million documents must be checked against the filter
> >> > > > query to produce a filter bitset, but when it's in the q
> >> > > > parameter, shortcuts can be taken which will get the top N
> >> > > > results quickly.
> >> > > >
> >> > > > The filterCache levels the playing field when filters are re-used.
> >> > > > If a requested filter is already in the cache, it can be
> >> > > > retrieved and applied to a result VERY quickly.
> >> > > >
> >> > > > You have turned off the caching for your filter. I'm not sure
> >> > > > why you did this, but you know your use case a lot better than I
> >> > > > do. If it were me, I would use filter queries and do everything
> >> > > > possible to re-use the same filters, and I would cache them.
> >> > > >
> >> > > > Thanks,
> >> > > > Shawn
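
For reference, the request variants compared in this thread can be written out as plain parameter sets. This is only a rough sketch, not the benchmark code anyone in the thread ran. The helper names are hypothetical, DATE1/DATE2 are the thread's own placeholders, and using *:* as the main query next to the non-cached fq is an assumption (the thread does not say what q was sent alongside the filter). The ^= form needs Solr 5.1 or later (SOLR-7218).

```python
from urllib.parse import urlencode

# Placeholder range from the thread; the real dates are not given.
# Solr's range syntax expects an uppercase TO.
DATE_RANGE = "maildate:{DATE1 TO DATE2}"


def as_main_query(clause, sort="maildate_sort desc"):
    """Variant 1: the filtering clause is sent in 'q' and sorted by a field."""
    return {"q": clause, "sort": sort}


def as_uncached_filter(clause, sort="maildate_sort desc", main_query="*:*"):
    """Variant 2: the clause is sent as a non-cached 'fq'.

    main_query="*:*" is an assumption, not something stated in the thread.
    """
    return {"q": main_query, "fq": "{!cache=false}" + clause, "sort": sort}


def as_constant_score(filter_clause, scored_clause):
    """Variant 3 (Solr 5.1+): '^=1' gives the filter clause a constant score."""
    return {"q": "+" + filter_clause + "^=1 " + scored_clause}


if __name__ == "__main__":
    variants = (
        as_main_query(DATE_RANGE),
        as_uncached_filter(DATE_RANGE),
        as_constant_score("color:blue", "text:shoes"),  # Jack's example
    )
    for params in variants:
        # urlencode produces the query string to append to a select URL.
        print(urlencode(params))
```

Timing each variant several times with the caches disabled, and interleaving the runs as Erick suggests, keeps one variant from warming the low-level caches for the next.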