Re: Cached fq decreases performance
Yes, please: http://www.amazon.com/Solr-Troubleshooting-Maintenance-Alexandre-Rafalovitch/dp/1491920149/ :-)

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 4 September 2015 at 10:30, Yonik Seeley wrote:
> On Fri, Sep 4, 2015 at 10:18 AM, Alexandre Rafalovitch wrote:
>> Yonik,
>>
>> Is this all visible on query debug level?
>
> Nope, unfortunately not.
>
> This is part of a bigger issue we should do better at for Solr 6:
> debuggability / supportability.
> For a specific request: what took up the memory, what cache misses or
> cache instantiations were there, how much request-specific memory was
> allocated, how much shared memory was needed to satisfy the request,
> etc.
>
> -Yonik
Re: Cached fq decreases performance
>> This is part of a bigger issue we should do better at for Solr 6:
>> debuggability / supportability.
>> For a specific request: what took up the memory, what cache misses or
>> cache instantiations were there, how much request-specific memory was
>> allocated, how much shared memory was needed to satisfy the request,
>> etc.

Oh, and if we have the ability to *tell* when a request is going to
allocate a big chunk of memory, then we should also be able to either
prevent it from happening or terminate the request shortly after.

So one could say, only allow this request to:
 - cause 500MB more of shared memory to be allocated (like field cache)
 - use 5GB of shared memory total (so successive queries don't keep
   upping the total amount allocated)
 - allocate 100MB of request-specific memory

-Yonik
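Nothing like these per-request budgets exists in Solr today; the proposal above is speculative. As a rough illustration only, a minimal sketch of the "fail the request instead of the node" idea in plain Java, with all class and method names invented for this example:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-request memory budget, as sketched in the proposal above:
// each allocation is recorded against a limit, and exceeding the limit
// terminates only the offending request, not the whole server.
class RequestMemoryBudget {
    private final long requestLimitBytes;            // e.g. 100MB request-specific cap
    private final AtomicLong requestUsed = new AtomicLong();

    RequestMemoryBudget(long requestLimitBytes) {
        this.requestLimitBytes = requestLimitBytes;
    }

    /** Record an allocation; throw if this request goes over its budget. */
    void allocate(long bytes) {
        long used = requestUsed.addAndGet(bytes);
        if (used > requestLimitBytes) {
            throw new IllegalStateException(
                "request exceeded its memory budget: " + used + " > " + requestLimitBytes);
        }
    }

    long used() {
        return requestUsed.get();
    }
}
```

The same pattern would extend to the shared-memory limits in the list above by checking a second, searcher-wide counter before admitting the allocation.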
Re: Cached fq decreases performance
On Fri, Sep 4, 2015 at 10:18 AM, Alexandre Rafalovitch wrote:
> Yonik,
>
> Is this all visible on query debug level?

Nope, unfortunately not.

This is part of a bigger issue we should do better at for Solr 6:
debuggability / supportability.
For a specific request: what took up the memory, what cache misses or
cache instantiations were there, how much request-specific memory was
allocated, how much shared memory was needed to satisfy the request,
etc.

-Yonik
Re: Cached fq decreases performance
Yonik,

Is this all visible on query debug level? Would it be effective to ask
for both queries to be run with debug enabled and for the expanded query
value to be shared? Would that show up the differences between the
Lucene implementations you described? (Looking for troubleshooting tips
to reuse.)

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 4 September 2015 at 10:06, Yonik Seeley wrote:
> On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes wrote:
>>
>> I have a query like:
>>
>> q=<complicated stuff>&fq=enabled:true
>>
>> For purposes of this conversation, "fq=enabled:true" is set for every query,
>> I never open a new searcher, and this is the only fq I ever use, so the
>> filter cache size is 1, and the hit ratio is 1.
>> The fq=enabled:true clause matches about 15% of my documents. I have some
>> 20M documents per shard, in a 5.3 SolrCloud cluster.
>>
>> Under these circumstances, this alternate version of the query averages
>> about 1/3 faster, consumes less CPU, and generates less garbage:
>>
>> q=<complicated stuff> +enabled:true
>>
>> So it appears I have a case where using the cached fq result is more
>> expensive than just putting the same restriction in the query.
>> Does someone have a clear mental model of how "q" and "fq" interact?
>
> Lucene seems to always be changing its execution model, so it can be
> difficult to keep up. What version of Solr are you using?
> Lucene also changed how filters work, so now a filter is
> incorporated with the query like so:
>
>   query = new BooleanQuery.Builder()
>       .add(query, Occur.MUST)
>       .add(pf.filter, Occur.FILTER)
>       .build();
>
> It may be that term queries are no longer worth caching... if this is
> the case, we could automatically not cache them.
>
> It also may be the structure of the query that is making the
> difference. Solr is creating
>
>   (complicated stuff) +(filter(enabled:true))
>
> If you added +enabled:true directly to an existing boolean query, that
> may be more efficient for Lucene to process (flatter structure).
>
> If you haven't already, could you try putting parens around your
> (complicated stuff) to see if it makes any difference?
>
> -Yonik
Re: Cached fq decreases performance
On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes wrote:
>
> I have a query like:
>
> q=<complicated stuff>&fq=enabled:true
>
> For purposes of this conversation, "fq=enabled:true" is set for every query,
> I never open a new searcher, and this is the only fq I ever use, so the
> filter cache size is 1, and the hit ratio is 1.
> The fq=enabled:true clause matches about 15% of my documents. I have some
> 20M documents per shard, in a 5.3 SolrCloud cluster.
>
> Under these circumstances, this alternate version of the query averages
> about 1/3 faster, consumes less CPU, and generates less garbage:
>
> q=<complicated stuff> +enabled:true
>
> So it appears I have a case where using the cached fq result is more
> expensive than just putting the same restriction in the query.
> Does someone have a clear mental model of how "q" and "fq" interact?

Lucene seems to always be changing its execution model, so it can be
difficult to keep up. What version of Solr are you using?
Lucene also changed how filters work, so now a filter is incorporated
with the query like so:

  query = new BooleanQuery.Builder()
      .add(query, Occur.MUST)
      .add(pf.filter, Occur.FILTER)
      .build();

It may be that term queries are no longer worth caching... if this is
the case, we could automatically not cache them.

It also may be the structure of the query that is making the difference.
Solr is creating

  (complicated stuff) +(filter(enabled:true))

If you added +enabled:true directly to an existing boolean query, that
may be more efficient for Lucene to process (flatter structure).

If you haven't already, could you try putting parens around your
(complicated stuff) to see if it makes any difference?

-Yonik
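For readers without the Lucene sources at hand, the semantics of the Occur.FILTER clause in the snippet above can be modeled in plain Java. This is a toy sketch, not Lucene code (all names here are invented): a FILTER clause restricts the match set exactly like MUST, but contributes nothing to the score.

```java
import java.util.*;

// Toy model of BooleanQuery clause semantics: MUST clauses both restrict
// matches and add to the score; FILTER clauses only restrict matches.
class ToyBooleanQuery {
    record Clause(Set<Integer> matches, double scorePerDoc, boolean scoring) {}

    private final List<Clause> clauses = new ArrayList<>();

    ToyBooleanQuery must(Set<Integer> matches, double scorePerDoc) {
        clauses.add(new Clause(matches, scorePerDoc, true));
        return this;
    }

    ToyBooleanQuery filter(Set<Integer> matches) {
        clauses.add(new Clause(matches, 0.0, false));   // never scores
        return this;
    }

    /** docId -> score for documents matching every clause. */
    Map<Integer, Double> search(Set<Integer> allDocs) {
        Map<Integer, Double> result = new TreeMap<>();
        outer:
        for (int doc : allDocs) {
            double score = 0;
            for (Clause c : clauses) {
                if (!c.matches().contains(doc)) continue outer;  // restriction
                if (c.scoring()) score += c.scorePerDoc();       // MUST only
            }
            result.put(doc, score);
        }
        return result;
    }
}
```

In this model an fq such as enabled:true becomes a filter(...) clause: it narrows the result set without perturbing ranking, which is also what makes its result safe to cache and reuse across queries.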
Re: Cached fq decreases performance
On 9/4/15, 7:06 AM, "Yonik Seeley" wrote:
>
> Lucene seems to always be changing its execution model, so it can be
> difficult to keep up. What version of Solr are you using?
> Lucene also changed how filters work, so now a filter is
> incorporated with the query like so:
>
>   query = new BooleanQuery.Builder()
>       .add(query, Occur.MUST)
>       .add(pf.filter, Occur.FILTER)
>       .build();
>
> It may be that term queries are no longer worth caching... if this is
> the case, we could automatically not cache them.
>
> It also may be the structure of the query that is making the
> difference. Solr is creating
>
>   (complicated stuff) +(filter(enabled:true))
>
> If you added +enabled:true directly to an existing boolean query, that
> may be more efficient for Lucene to process (flatter structure).
>
> If you haven't already, could you try putting parens around your
> (complicated stuff) to see if it makes any difference?
>
> -Yonik

I'll reply at this point in the thread, since it's addressed to me, but
I strongly agree with some of the later comments in the thread about
knowing what's going on. The whole point of this post is that this
situation violated my mental heuristics about how to craft a query.

In answer to the question, this is a SolrCloud 5.3 cluster. I can
provide a little more detail on (complicated stuff) too if that's
helpful. I have not tried putting everything else in parens, but it's a
couple of distinct paren clauses anyway:

q=+(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several fields)}") +enabled:true

So to be clear, that query template outperforms this one:

q=+(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several fields)}")&fq=enabled:true

Your comments remind me that I migrated from 5.2.1 to 5.3 while I've
been doing my performance testing, and I thought I noticed a performance
degradation in that transition, but I never followed through to confirm
that. I hadn't tested moving that fq clause into the q on 5.2.1, only
on 5.3.
Re: Cached fq decreases performance
I'm measuring performance in the aggregate, over several minutes and
tens of thousands of distinct queries that all use this specific fq. The
cache hit count reported is roughly identical to the number of queries
I've sent, so no, this isn't a first-query cache-miss situation.

The fq result will be large: 15% of my documents qualify. So if Solr is
intelligent enough to ignore that restriction in the main query until
it's found a much smaller set to scan for that criteria, I could see how
simply processing the intersection with the full fq cache value could be
time consuming. Is that the kind of thing you're talking about with
intersection hopping?

On 9/3/15, 2:00 PM, "Alexandre Rafalovitch" wrote:
>
> FQ has to calculate the result bit set for every document to be able
> to cache it. Q will only calculate it for the documents it matches on,
> and there is some intersection hopping going on.
>
> Are you seeing this performance hit on first query only or on every
> one? I would expect on first query only, unless your filter cache size
> assumptions are somehow wrong.
>
> Regards,
>    Alex.
>
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
> On 3 September 2015 at 16:45, Jeff Wartes wrote:
>>
>> I have a query like:
>>
>> q=<complicated stuff>&fq=enabled:true
>>
>> For purposes of this conversation, "fq=enabled:true" is set for every
>> query, I never open a new searcher, and this is the only fq I ever use,
>> so the filter cache size is 1, and the hit ratio is 1.
>> The fq=enabled:true clause matches about 15% of my documents. I have
>> some 20M documents per shard, in a 5.3 SolrCloud cluster.
>>
>> Under these circumstances, this alternate version of the query averages
>> about 1/3 faster, consumes less CPU, and generates less garbage:
>>
>> q=<complicated stuff> +enabled:true
>>
>> So it appears I have a case where using the cached fq result is more
>> expensive than just putting the same restriction in the query.
>> Does someone have a clear mental model of how "q" and "fq" interact?
>> Naively, I'd expect that either the "q" operates within the set matched
>> by the fq (in which case it's doing "complicated stuff" on only a subset
>> and should be faster) or that Solr takes the intersection of the q & fq
>> sets (in which case putting the restriction in the "q" means that set
>> needs to be generated instead of retrieved from cache, and should be
>> slower).
>> This has me wondering: if you want fq cache speed boosts, but also want
>> ranking involved, can you do that? Would something like
>> q=<complicated stuff> AND <the fq clause> help, or just be more work?
>>
>> Thanks.
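Jeff's intuition above, that intersecting with a large cached bitset can cost more than checking the restriction per candidate document, can be put into a back-of-the-envelope cost model. This is an invented sketch with made-up cost units, not Solr's actual execution logic:

```java
// Rough cost model for the trade-off discussed above: ANDing a cached
// filter bitset into the result touches a word per 64 docs in the whole
// index, while an inline +enabled:true clause is only consulted for the
// candidate docs the rest of the query produces.
class FilterCostSketch {
    /** 64-bit words touched when intersecting a full cached bitset. */
    static long bitsetIntersectCost(long totalDocs) {
        return totalDocs / 64;
    }

    /** Per-document checks when the restriction is evaluated inline. */
    static long inlineCheckCost(long candidateDocs) {
        return candidateDocs;
    }
}
```

With 20M docs per shard, the bitset path touches on the order of 312,500 words regardless of how selective the main query is, so a query yielding only a few thousand candidates can plausibly be cheaper with the inline clause, which matches the observed result.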
Cached fq decreases performance
I have a query like:

q=<complicated stuff>&fq=enabled:true

For purposes of this conversation, "fq=enabled:true" is set for every
query, I never open a new searcher, and this is the only fq I ever use,
so the filter cache size is 1, and the hit ratio is 1.
The fq=enabled:true clause matches about 15% of my documents. I have
some 20M documents per shard, in a 5.3 SolrCloud cluster.

Under these circumstances, this alternate version of the query averages
about 1/3 faster, consumes less CPU, and generates less garbage:

q=<complicated stuff> +enabled:true

So it appears I have a case where using the cached fq result is more
expensive than just putting the same restriction in the query.
Does someone have a clear mental model of how "q" and "fq" interact?
Naively, I'd expect that either the "q" operates within the set matched
by the fq (in which case it's doing "complicated stuff" on only a subset
and should be faster) or that Solr takes the intersection of the q & fq
sets (in which case putting the restriction in the "q" means that set
needs to be generated instead of retrieved from cache, and should be
slower).
This has me wondering: if you want fq cache speed boosts, but also want
ranking involved, can you do that? Would something like
q=<complicated stuff> AND <the fq clause> help, or just be more work?

Thanks.
Re: Cached fq decreases performance
FQ has to calculate the result bit set for every document to be able to
cache it. Q will only calculate it for the documents it matches on, and
there is some intersection hopping going on.

Are you seeing this performance hit on first query only or on every one?
I would expect on first query only, unless your filter cache size
assumptions are somehow wrong.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 3 September 2015 at 16:45, Jeff Wartes wrote:
>
> I have a query like:
>
> q=<complicated stuff>&fq=enabled:true
>
> For purposes of this conversation, "fq=enabled:true" is set for every
> query, I never open a new searcher, and this is the only fq I ever use,
> so the filter cache size is 1, and the hit ratio is 1.
> The fq=enabled:true clause matches about 15% of my documents. I have
> some 20M documents per shard, in a 5.3 SolrCloud cluster.
>
> Under these circumstances, this alternate version of the query averages
> about 1/3 faster, consumes less CPU, and generates less garbage:
>
> q=<complicated stuff> +enabled:true
>
> So it appears I have a case where using the cached fq result is more
> expensive than just putting the same restriction in the query.
> Does someone have a clear mental model of how "q" and "fq" interact?
> Naively, I'd expect that either the "q" operates within the set matched
> by the fq (in which case it's doing "complicated stuff" on only a subset
> and should be faster) or that Solr takes the intersection of the q & fq
> sets (in which case putting the restriction in the "q" means that set
> needs to be generated instead of retrieved from cache, and should be
> slower).
> This has me wondering: if you want fq cache speed boosts, but also want
> ranking involved, can you do that? Would something like
> q=<complicated stuff> AND <the fq clause> help, or just be more work?
>
> Thanks.
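The "intersection hopping" mentioned above is essentially a leapfrog merge of sorted doc-id streams: each side repeatedly advances past the other's current position, so work is bounded by how often the sparser clause matches rather than by the total document count. A minimal standalone sketch of the idea (not Lucene's actual iterator API):

```java
import java.util.*;

// Leapfrog intersection of two sorted doc-id lists: advance whichever
// iterator is behind until both land on the same doc. Cost is driven by
// the sparser list, unlike ANDing a full per-document bitset.
class Leapfrog {
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {            // both clauses match this doc
                out.add(a[i]);
                i++; j++;
            } else if (a[i] < b[j]) {
                i++;                       // a is behind; hop it forward
            } else {
                j++;                       // b is behind; hop it forward
            }
        }
        return out;
    }
}
```

Real Lucene skips in larger strides using per-clause advance operations over skip lists, but the asymmetry is the same one described above: an fq must materialize its full bit set once to be cacheable, while a q clause only ever visits the docs the intersection reaches.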