Side note on dates and fqs. If you're using NOW in your date expressions
you may be able to re-use fqs by using "date math", see:
https://lucidworks.com/blog/date-math-now-and-filter-queries/

Of course this may not be applicable in your situation...
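A minimal sketch of what I mean, assuming the maildate field from earlier in the thread. The helper name date_range_fq and the seven-day window are made up for illustration. A bare NOW resolves to the current millisecond, so the filter differs on every request and can never be re-used; rounding with /DAY keeps it identical for the whole day.

```python
def date_range_fq(field, start="NOW/DAY-7DAYS", end="NOW/DAY+1DAY"):
    """Build an fq range clause whose endpoints are rounded with Solr date math.

    Hypothetical helper: the default window is only an example. Rounding to
    /DAY means the clause resolves to the same range for the whole day.
    """
    return f"{field}:[{start} TO {end}]"


# Re-usable form: endpoints rounded to day boundaries.
reusable_fq = date_range_fq("maildate")

# Non-re-usable form: bare NOW changes every millisecond.
one_off_fq = date_range_fq("maildate", start="NOW-7DAYS", end="NOW")
```

Whether day granularity is acceptable depends on how fresh the results have to be; /HOUR or /MINUTE trade cache re-use for precision.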
FWIW,
Erick

On Thu, Jun 25, 2015 at 8:03 AM, Shai Erera <ser...@gmail.com> wrote:
> The tables came across corrupt, here they are (times in ms):
>
> Caches enabled:
>
>                     q      fq     delta
> original query      28     295    267
> w/o grouping        58     325    267
> w/o sort on date    28     293    265
>
> Caches disabled:
>
>                     q      fq     delta
> original query      4113   4381   268
> w/o grouping        131    407    276
> w/o sort on date    4217   4400   183
>
> Shai
>
> On Thu, Jun 25, 2015 at 2:04 PM, Esther Goldbraich <estherg...@il.ibm.com>
> wrote:
>
>> Thank you all for collaborative thinking!
>>
>> Ran additional benchmarks as proposed. Some results:
>>
>> All solr caches are enabled (queryResultCache hit ratio = 0.02):
>>
>>                     q      fq {!cache=false}   delta
>> original query      28     295                 267
>> w/o grouping        58     325                 267
>> w/o sort on date    28     293                 265
>>
>> All solr caches are disabled (except built in lucene field cache):
>>
>>                     q      fq {!cache=false}   delta
>> original query      4113   4381                268
>> w/o grouping        131    407                 276
>> w/o sort on date    4217   4400                183
>>
>> *median runtime in ms
>>
>> As you can see, disabling grouping and/or sorting does not affect the
>> results much. That is, the difference between running with
>> 'fq{!cache=false}' or with 'q' is the same, while 'fq' performs slower in
>> all cases.
>>
>> Is it correct to assume then that the performance difference comes from
>> computing the filter (traversing the posting lists and building the
>> bitset)?
>> Does it also mean that not caching the filter does not affect grouping?
>> I.e. perhaps the second pass of grouping uses the already computed filter,
>> and does not attempt to fetch it from the cache?
>>
>> As a general rule of thumb, at least in our case, would you please comment
>> on the following assumptions/conclusions (note, all assuming that we don't
>> want to cache filters, and the 'fq' part is only used to avoid scoring):
>>
>> 1) If the query sorts by any other field than score (e.g. date), we can
>> put the 'fq' part in 'q'. Scoring won't be done, and we won't pay the cost
>> of building the filter, and then discarding it when the query completes.
>>
>> 2) In fact, if we don't intend to cache the filter, we might as well just
>> use only 'q'. At least, on our dataset (this may definitely *not* be a
>> general statement).
>>
>> 3) If we sort by relevance, but want to avoid scoring of the 'filter'
>> clauses, is there anything we can do on 4.7?
>> 3.1) The ^= operator is only available in 5.1, which seems exactly what we
>> need.
>> 3.2) Adding the filter clauses to the query w/ boost 0 will still compute
>> their score, only they won't affect the overall document score correct?
>>
>> 4) A more general question -- with the addition of ^= to query clauses in
>> 5.1 (resolved to ConstantScoreQuery down stream), what is the use case for
>> using fq w/ !cache=false? As we understand it, users who use this want to
>> compute a filter but not cache it. As we see, there is some added cost to
>> building a filter, so if you pay this cost over and over, would it not be
>> better to just use ^=?
>>
>> Best regards,
>> Esther
>>
>>
>> From: Erick Erickson <erickerick...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Date: 25/06/2015 02:38 AM
>> Subject: Re: fq versus q
>>
>>
>> Tell us a bit more about your test setup. 1 or 2 tests
>> don't mean much. For instance, if the fq query has to
>> load the low-level caches from disk then the q-only
>> query is run and doesn't that could skew the results.
>> Or if somehow you're hitting the queryResultCache. Or....
>>
>> Frankly I'd disable all my caches for running tests like
>> this, and be sure to mix-n-match the tests so I wasn't
>> getting bitten by caches.
>>
>> And please tell us what the actual numbers are. 5-10X
>> doesn't mean much at all if it's 25ms .vs. 5 ms. It means
>> a lot (and something's very wrong) if it means
>> 200ms .vs. 1,000ms.
>>
>> Best,
>> Erick
>>
>> On Wed, Jun 24, 2015 at 5:30 PM, Upayavira <u...@odoko.co.uk> wrote:
>> > Are you wanting to do no scoring at all, or just have a portion of the
>> > query not contribute to the score?
>> >
>> > If you don't want scoring at all, just sort by another field. If you
>> > don't have a field, I just tried "&sort=1 desc", and it worked! This
>> > should, if I'm right, pull documents out of the index in index order.
>> >
>> > Upayavira
>> >
>> > On Wed, Jun 24, 2015, at 08:26 PM, Shai Erera wrote:
>> >> Ah thanks. I see it was added in 5.1 - is there any other way prior to
>> >> that (like 4.7)?
>> >>
>> >> if not, I guess the only option is to not use fq if we don't intend to
>> >> cache it, and on 5.1 use the ^= syntax.
>> >>
>> >> Shai
>> >>
>> >> On Wed, Jun 24, 2015 at 9:21 PM, Jack Krupansky
>> >> <jack.krupan...@gmail.com> wrote:
>> >>
>> >> > Yonik added syntax to request a constant score query in Solr with the ^=
>> >> > operator.
>> >> >
>> >> > For example: +color:blue^=1 text:shoes
>> >> >
>> >> > See:
>> >> > https://issues.apache.org/jira/browse/SOLR-7218
>> >> >
>> >> > -- Jack Krupansky
>> >> >
>> >> > On Wed, Jun 24, 2015 at 1:41 PM, Shai Erera <ser...@gmail.com> wrote:
>> >> >
>> >> > > Thanks Shawn,
>> >> > >
>> >> > > What's Solr equivalence to ConstantScoreQuery? I.e., what if you want to
>> >> > > run a query that does not score, but only filter. The rationale behind
>> >> > > using a non-cached 'fq' was just that.
>> >> > >
>> >> > > Shai
>> >> > >
>> >> > > On Wed, Jun 24, 2015 at 4:29 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> >> > >
>> >> > > > On 6/24/2015 5:28 AM, Esther Goldbraich wrote:
>> >> > > > > We are comparing the performance of fq versus q for queries that are
>> >> > > > > actually filters and should not be cached.
>> >> > > > > In part of queries we see strange behavior where q performs 5-10x better
>> >> > > > > than fq. The question is why?
>> >> > > > >
>> >> > > > > An example1:
>> >> > > > > q=maildate:{DATE1 to DATE2} COMPARED TO fq={!cache=false}maildate:{DATE1 to DATE2}
>> >> > > > > sort=maildate_sort* desc
>> >> > > >
>> >> > > > <snip>
>> >> > > >
>> >> > > > > <field name="maildate" stored="true" indexed="true" type="tdate"/>
>> >> > > > > <field name="maildate_sort" stored="false" indexed="false" type="tdate" docValues="true"/>
>> >> > > >
>> >> > > > For simplicity, I would probably just use one field for that, rather
>> >> > > > than a separate sort field. The disk space required would probably be
>> >> > > > the same either way, but your interaction with the index will not be as
>> >> > > > complex. There's nothing wrong with doing it the way you have, though.
>> >> > > >
>> >> > > > I'm not at all an expert, but I've been a member of this community for a
>> >> > > > long time. Here's my guess about why your query is faster in the q
>> >> > > > parameter than a non-cached filter:
>> >> > > >
>> >> > > > The result of a standard query is the stored fields from the top N
>> >> > > > documents, where N is the value in the rows parameter. The default for
>> >> > > > N is typically set to 10, and for most people will normally be 200 or less.
>> >> > > >
>> >> > > > The result of a filter is very different -- it is a bitset of all the
>> >> > > > documents in your entire index, with binary 0 for documents that don't
>> >> > > > match the filter and binary 1 for documents that do match.
>> >> > > >
>> >> > > > If your index has 100 million documents, every single one of those 100
>> >> > > > million documents must be checked against the filter query to produce a
>> >> > > > filter bitset, but when it's in the q parameter, shortcuts can be taken
>> >> > > > which will get the top N results quickly.
>> >> > > >
>> >> > > > The filterCache levels the playing field when filters are re-used. If a
>> >> > > > requested filter is already in the cache, it can be retrieved and
>> >> > > > applied to a result VERY quickly.
>> >> > > >
>> >> > > > You have turned off the caching for your filter. I'm not sure why you
>> >> > > > did this, but you know your use case a lot better than I do. If it were
>> >> > > > me, I would use filter queries and do everything possible to re-use the
>> >> > > > same filters, and I would cache them.
>> >> > > >
>> >> > > > Thanks,
>> >> > > > Shawn
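
The contrast in Shawn's guess (a filter materializes a bitset sized for the whole index, while a plain query only has to keep its top N hits) can be pictured with a toy model. This is only an illustration under stated assumptions, not how Lucene or Solr actually implement either path: the function names, the byte-array bitset, and the use of the doc id as the sort value are all invented for the example.

```python
import heapq


def build_filter_bitset(matches, max_doc):
    """Toy filter: allocate one bit per document in the index, then set the
    bit for each matching doc id. The allocation scales with max_doc even
    when few documents match."""
    bits = bytearray((max_doc + 7) // 8)
    for doc_id in matches:
        bits[doc_id >> 3] |= 1 << (doc_id & 7)
    return bits


def top_n(matches, sort_value, n):
    """Toy query collector: walk the matches but retain only the n best,
    so memory stays proportional to n rather than to the index size."""
    return heapq.nlargest(n, matches, key=sort_value)


# Hypothetical index of 1000 docs where every third doc matches.
max_doc = 1000
matching = range(0, max_doc, 3)

bits = build_filter_bitset(matching, max_doc)  # index-sized structure
best = top_n(matching, lambda doc_id: doc_id, 3)  # only 3 entries kept
```

Both functions still visit every matching document. The difference the toy shows is the extra index-sized structure that the filter path builds and, with cache=false, throws away after each request.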