Re: Cached fq decreases performance

2015-09-04 Thread Alexandre Rafalovitch
Yes please:
http://www.amazon.com/Solr-Troubleshooting-Maintenance-Alexandre-Rafalovitch/dp/1491920149/

:-)

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 4 September 2015 at 10:30, Yonik Seeley wrote:
> On Fri, Sep 4, 2015 at 10:18 AM, Alexandre Rafalovitch wrote:
>> Yonik,
>>
>> Is this all visible on query debug level?
>
> Nope, unfortunately not.
>
> This is part of a bigger issue we should work at doing better at for
> Solr 6: debuggability / supportability.
> For a specific request, what took up the memory, what cache misses or
> cache instantiations were there, how much request-specific memory was
> allocated, how much shared memory was needed to satisfy the request,
> etc.
>
> -Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
>> This is part of a bigger issue we should work at doing better at for
>> Solr 6: debuggability / supportability.
>> For a specific request, what took up the memory, what cache misses or
>> cache instantiations were there, how much request-specific memory was
>> allocated, how much shared memory was needed to satisfy the request,
>> etc.

Oh, and if we have the ability to *tell* when a request is going to
allocate a big chunk of memory,
then we should also be able to either prevent it from happening or
terminate the request shortly after.

So one could say, only allow this request to:
- cause 500MB more of shared memory to be allocated (like field cache)
- only allow it to use 5GB of shared memory total (so successive
queries don't keep upping the total amount allocated)
- only allow 100MB of request-specific memory to be allocated
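Nothing like this existed in Solr at the time; the list above is a feature idea. The following is a minimal sketch of how such per-request limits might be enforced. It assumes a hypothetical `RequestMemoryBudget` class (not a Solr API) that is charged before each allocation, with all sizes in bytes.

```java
// Hypothetical per-request memory budget illustrating the three proposed limits.
// Not a Solr API: class name, methods, and units (bytes) are assumptions.
final class RequestMemoryBudget {
    static final long MB = 1024L * 1024L;

    private final long maxNewSharedBytes;   // e.g. 500MB of newly allocated shared memory (field cache)
    private final long maxTotalSharedBytes; // e.g. 5GB of shared memory in total
    private final long maxRequestBytes;     // e.g. 100MB of request-specific memory

    private long newSharedBytes;            // shared memory this request has caused to be allocated
    private long requestBytes;              // request-specific memory allocated so far

    RequestMemoryBudget(long maxNewShared, long maxTotalShared, long maxRequest) {
        this.maxNewSharedBytes = maxNewShared;
        this.maxTotalSharedBytes = maxTotalShared;
        this.maxRequestBytes = maxRequest;
    }

    /** Charge a shared allocation; currentSharedBytes is what is already allocated node-wide. */
    void chargeShared(long bytes, long currentSharedBytes) {
        if (newSharedBytes + bytes > maxNewSharedBytes) {
            throw new IllegalStateException("request would allocate too much new shared memory");
        }
        if (currentSharedBytes + bytes > maxTotalSharedBytes) {
            throw new IllegalStateException("total shared memory would exceed the limit");
        }
        newSharedBytes += bytes; // only recorded once both checks pass
    }

    /** Charge memory that lives only as long as this request. */
    void chargeRequest(long bytes) {
        if (requestBytes + bytes > maxRequestBytes) {
            throw new IllegalStateException("request-specific memory limit exceeded");
        }
        requestBytes += bytes;
    }
}
```

Each limit is checked before the charge is recorded. A rejected allocation therefore leaves the counters unchanged, so the request can be refused up front instead of being terminated after the memory is already in use.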

-Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
On Fri, Sep 4, 2015 at 10:18 AM, Alexandre Rafalovitch wrote:
> Yonik,
>
> Is this all visible on query debug level?

Nope, unfortunately not.

This is part of a bigger issue we should work at doing better at for
Solr 6: debuggability / supportability.
For a specific request, what took up the memory, what cache misses or
cache instantiations were there, how much request-specific memory was
allocated, how much shared memory was needed to satisfy the request,
etc.
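The following is a rough illustration of that per-request bookkeeping. It assumes a hypothetical `RequestStats` object threaded through a single request; no such class exists in Solr 5.x. Its summary is the kind of thing that could be attached to debug output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-request accounting; not an existing Solr API.
final class RequestStats {
    private long cacheHits;
    private long cacheMisses;
    private long cacheInstantiations; // cache entries built and stored because of this request
    private long requestBytes;        // memory allocated only for this request
    private long sharedBytes;         // shared (cached) memory this request needed

    /** A cache hit: the request relied on an existing shared entry of the given size. */
    void cacheHit(long entryBytes) {
        cacheHits++;
        sharedBytes += entryBytes;
    }

    /** A cache miss: the result was built, and either stored in the cache or used only here. */
    void cacheMiss(long builtBytes, boolean insertedIntoCache) {
        cacheMisses++;
        if (insertedIntoCache) {
            cacheInstantiations++;
            sharedBytes += builtBytes;
        } else {
            requestBytes += builtBytes;
        }
    }

    void allocateForRequest(long bytes) {
        requestBytes += bytes;
    }

    /** Flat summary in insertion order, suitable for a debug section. */
    Map<String, Long> summary() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("cacheHits", cacheHits);
        m.put("cacheMisses", cacheMisses);
        m.put("cacheInstantiations", cacheInstantiations);
        m.put("requestBytes", requestBytes);
        m.put("sharedBytes", sharedBytes);
        return m;
    }
}
```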

-Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Alexandre Rafalovitch
Yonik,

Is this all visible on query debug level? Would it be effective to ask
to run both queries with debug enabled and to share the expanded query
value? Would that show up the differences between Lucene
implementations you described?

(Looking for troubleshooting tips to reuse).

Regards,
   Alex.


On 4 September 2015 at 10:06, Yonik Seeley wrote:
> Lucene seems to always be changing its execution model, so it can be
> difficult to keep up.  What version of Solr are you using?
> Lucene also changed how filters work, so now, a filter is
> incorporated with the query like so:
>
> query = new BooleanQuery.Builder()
> .add(query, Occur.MUST)
> .add(pf.filter, Occur.FILTER)
> .build();
>
> It may be that term queries are no longer worth caching... if this is
> the case, we could automatically not cache them.
>
> It also may be the structure of the query that is making the
> difference.  Solr is creating
>
> (complicated stuff) +(filter(enabled:true))
>
> If you added +enabled:true directly to an existing boolean query, that
> may be more efficient for lucene to process (flatter structure).
>
> If you haven't already, could you try putting parens around your
> (complicated stuff) to see if it makes any difference?
>
> -Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Yonik Seeley
On Thu, Sep 3, 2015 at 4:45 PM, Jeff Wartes wrote:
>
> I have a query like:
>
> q=<complicated stuff>&fq=enabled:true
>
> For purposes of this conversation, "fq=enabled:true" is set for every query, 
> I never open a new searcher, and this is the only fq I ever use, so the 
> filter cache size is 1, and the hit ratio is 1.
> The fq=enabled:true clause matches about 15% of my documents. I have some 20M 
> documents per shard, in a 5.3 SolrCloud cluster.
>
> Under these circumstances, this alternate version of the query averages about 
> 1/3 faster, consumes less CPU, and generates less garbage:
>
> q=<complicated stuff> +enabled:true
>
> So it appears I have a case where using the cached fq result is more 
> expensive than just putting the same restriction in the query.
> Does someone have a clear mental model of how “q” and “fq” interact?

Lucene seems to always be changing its execution model, so it can be
difficult to keep up.  What version of Solr are you using?
Lucene also changed how filters work, so now, a filter is
incorporated with the query like so:

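query = new BooleanQuery.Builder()
    .add(query, Occur.MUST)        // main query: must match and supplies the score
    .add(pf.filter, Occur.FILTER)  // filter built from the fq params: must match, not scored
    .build();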
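In Lucene, an `Occur.FILTER` clause must match but does not contribute to the score, whereas a `MUST` clause both restricts and scores. The plain-Java model below contrasts the fq form with moving `+enabled:true` into `q`. It has no Lucene dependency, `FilterSemantics` and its methods are hypothetical, and it simply sums clause scores, which simplifies real Lucene scoring.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of a BooleanQuery with two required clauses.
final class FilterSemantics {
    /** MUST + FILTER: the filter restricts the matches but adds nothing to the score. */
    static Map<Integer, Float> mustPlusFilter(Map<Integer, Float> main, BitSet filter) {
        Map<Integer, Float> out = new TreeMap<>();
        for (Map.Entry<Integer, Float> e : main.entrySet()) {
            if (filter.get(e.getKey())) {
                out.put(e.getKey(), e.getValue()); // score comes only from the main query
            }
        }
        return out;
    }

    /** MUST + MUST: the second clause also restricts, and its score is added (summed here). */
    static Map<Integer, Float> mustPlusMust(Map<Integer, Float> main, Map<Integer, Float> other) {
        Map<Integer, Float> out = new TreeMap<>();
        for (Map.Entry<Integer, Float> e : main.entrySet()) {
            Float o = other.get(e.getKey());
            if (o != null) {
                out.put(e.getKey(), e.getValue() + o);
            }
        }
        return out;
    }
}
```

Both forms match the same documents. The difference is that the `q` version also scores the extra clause, while the fq version can reuse a cached set.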

It may be that term queries are no longer worth caching... if this is
the case, we could automatically not cache them.
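Below is a sketch of the "don't cache term queries" idea. It assumes a hypothetical `SelectiveFilterCache` sitting in front of the filter cache, which recognizes a plain `field:value` filter with a crude regex. A real implementation would inspect the parsed Query object rather than the raw string. Per request, Solr already lets you opt a single filter out of caching with the `cache=false` local parameter, for example `fq={!cache=false}enabled:true`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical filter-cache wrapper that never stores single-term filters.
final class SelectiveFilterCache<R> {
    private final Map<String, R> cache = new HashMap<>();
    private int computations;

    /** Crude stand-in for "is this a plain term query": field:value, no spaces, brackets or parens. */
    static boolean isSingleTerm(String fq) {
        return fq.matches("[A-Za-z_][A-Za-z0-9_]*:[^\\s()\\[\\]{}]+");
    }

    R get(String fq, Function<String, R> compute) {
        if (isSingleTerm(fq)) {
            computations++;
            return compute.apply(fq); // evaluated for this request only, never stored
        }
        R cached = cache.get(fq);
        if (cached == null) {
            computations++;
            cached = compute.apply(fq);
            cache.put(fq, cached);
        }
        return cached;
    }

    int computations() { return computations; }

    int size() { return cache.size(); }
}
```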

It also may be the structure of the query that is making the
difference.  Solr is creating

(complicated stuff) +(filter(enabled:true))

If you added +enabled:true directly to an existing boolean query, that
may be more efficient for Lucene to process (flatter structure).
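To make "flatter structure" concrete, the toy query tree below merges a required (MUST) boolean sub-query whose clauses are all MUST or FILTER into its parent. For pure conjunctions this does not change which documents match. `Flatten` is a hypothetical class (Java 16+ records and pattern matching), and whether Lucene 5.3 performs any such rewrite itself is not assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy query tree: leaves are terms; boolean nodes hold only required clauses.
final class Flatten {
    enum Occur { MUST, FILTER }

    interface Node {}
    record Term(String text) implements Node {}
    record Bool(List<Clause> clauses) implements Node {}
    record Clause(Occur occur, Node node) {}

    /** Hoist the clauses of nested MUST booleans into the parent: +(+a +b) +c becomes +a +b +c. */
    static Bool flatten(Bool q) {
        List<Clause> out = new ArrayList<>();
        for (Clause c : q.clauses()) {
            if (c.node() instanceof Bool inner) {
                Bool flatInner = flatten(inner);
                if (c.occur() == Occur.MUST) {
                    out.addAll(flatInner.clauses()); // inner MUST/FILTER clauses keep their occur
                } else {
                    out.add(new Clause(c.occur(), flatInner)); // FILTER sub-trees kept nested for simplicity
                }
            } else {
                out.add(c);
            }
        }
        return new Bool(out);
    }

    /** Nesting depth: a term is 0, a boolean node is 1 + its deepest child. */
    static int depth(Node n) {
        if (n instanceof Bool b) {
            int max = 0;
            for (Clause c : b.clauses()) {
                max = Math.max(max, depth(c.node()));
            }
            return 1 + max;
        }
        return 0;
    }
}
```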

If you haven't already, could you try putting parens around your
(complicated stuff) to see if it makes any difference?

-Yonik


Re: Cached fq decreases performance

2015-09-04 Thread Jeff Wartes


On 9/4/15, 7:06 AM, "Yonik Seeley" wrote:
>
>Lucene seems to always be changing its execution model, so it can be
>difficult to keep up.  What version of Solr are you using?
>Lucene also changed how filters work, so now, a filter is
>incorporated with the query like so:
>
>query = new BooleanQuery.Builder()
>.add(query, Occur.MUST)
>.add(pf.filter, Occur.FILTER)
>.build();
>
>It may be that term queries are no longer worth caching... if this is
>the case, we could automatically not cache them.
>
>It also may be the structure of the query that is making the
>difference.  Solr is creating
>
>(complicated stuff) +(filter(enabled:true))
>
>If you added +enabled:true directly to an existing boolean query, that
>may be more efficient for lucene to process (flatter structure).
>
>If you haven't already, could you try putting parens around your
>(complicated stuff) to see if it makes any difference?
>
>-Yonik


I’ll reply at this point in the thread, since it’s addressed to me, but I
strongly agree with some of the later comments in the thread about knowing
what’s going on. The whole point of this post is that this situation
violated my mental heuristics about how to craft a query.

In answer to the question, this is a SolrCloud 5.3 cluster. I can provide
a little more detail on (complicated stuff) too if that’s helpful. I have
not tried putting everything else in parens, but it’s a couple of distinct
paren clauses anyway:

q=+(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several fields)}") +enabled:true

So to be clear, that query template outperforms this one:
q=+(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several fields)}") +(_query_:"{!dismax (several fields)}")&fq=enabled:true
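If you want to reproduce the comparison, here is a sketch that builds the two request parameter strings with proper URL encoding. `QueryVariants` is a hypothetical helper, and the main query used with it below is a stand-in, not the dismax clauses above. It needs Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds the two request variants being compared.
final class QueryVariants {
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // ':' -> %3A, '+' -> %2B, ' ' -> '+'
    }

    /** Restriction inside q: it becomes another required, scored clause of the main query. */
    static String inQuery(String mainQuery, String restriction) {
        return "q=" + enc(mainQuery + " +" + restriction);
    }

    /** Restriction as fq: looked up in (or added to) the filterCache and not scored. */
    static String asFilter(String mainQuery, String restriction) {
        return "q=" + enc(mainQuery) + "&fq=" + enc(restriction);
    }
}
```

For example, `asFilter("title:solr", "enabled:true")` yields `q=title%3Asolr&fq=enabled%3Atrue`. Alexandre suggests in the thread appending `&debugQuery=true` to each request. That returns Solr's parsed form of the query, though per Yonik's reply it does not show memory or cache behavior.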


Your comments remind me that I migrated from 5.2.1 to 5.3 while I’ve been
doing my performance testing, and I thought I noticed a performance
degradation in that transition, but I never followed through to confirm
that. I hadn’t tested moving that FQ clause into the Q on 5.2.1, only 5.3.






Re: Cached fq decreases performance

2015-09-03 Thread Jeff Wartes

I’m measuring performance in the aggregate, over several minutes and tens
of thousands of distinct queries that all use this specific fq.
The cache hit count reported is roughly identical to the number of queries
I’ve sent, so no, this isn’t a first-query cache-miss situation.

The fq result will be large (15% of my documents qualify), so if Solr is
intelligent enough to ignore that restriction in the main query until it’s
found a much smaller set to scan for that criterion, I could see how simply
processing the intersection with the full cached fq value could be
time-consuming. Is that the kind of thing you’re talking about with
intersection hopping?
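One way to picture "intersection hopping" is a leapfrog conjunction. Each clause's iterator is advanced to the other's current candidate, so a dense clause is only probed where the sparse one lands. The sketch below is a simplified model of that strategy over sorted doc ID arrays. `Leapfrog` is a hypothetical class, and Lucene's real conjunction code is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified leapfrog intersection; advance(target) moves to the first doc >= target.
final class Leapfrog {
    static final class Postings {
        private final int[] docs; // sorted doc IDs
        private int pos = -1;
        int advances;             // how many times this iterator was asked to move

        Postings(int[] sortedDocs) { this.docs = sortedDocs; }

        int doc() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : Integer.MAX_VALUE; // MAX_VALUE = exhausted
        }

        int advance(int target) {
            advances++;
            int from = Math.max(pos, 0);
            int i = Arrays.binarySearch(docs, from, docs.length, target);
            pos = i >= 0 ? i : -i - 1; // insertion point = first doc >= target
            return doc();
        }
    }

    /** Let the sparse lead clause drive; the other clause only jumps to the lead's candidates. */
    static List<Integer> intersect(Postings lead, Postings other) {
        List<Integer> out = new ArrayList<>();
        int doc = lead.advance(0);
        while (doc != Integer.MAX_VALUE) {
            int o = other.advance(doc);
            if (o == doc) {
                out.add(doc);
                doc = lead.advance(doc + 1);
            } else {
                doc = lead.advance(o); // hop the lead past the gap
            }
        }
        return out;
    }
}
```

Consider a three-document lead clause (7, 500, 9002) and a dense clause matching every seventh document. The dense iterator is advanced only three times instead of being walked across all of its entries, and the intersection is 7 and 9002.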


On 9/3/15, 2:00 PM, "Alexandre Rafalovitch" wrote:

>FQ has to calculate the result bit set for every document to be able
>to cache it. Q will only calculate it for the documents it matches on
>and there is some intersection hopping going on.
>
>Are you seeing this performance hit on the first query only, or on every
>one? I would expect on first query only unless your filter cache size
>assumptions are somehow wrong.
>
>Regards,
>   Alex.



Cached fq decreases performance

2015-09-03 Thread Jeff Wartes

I have a query like:

q=<complicated stuff>&fq=enabled:true

For purposes of this conversation, "fq=enabled:true" is set for every query, I 
never open a new searcher, and this is the only fq I ever use, so the filter 
cache size is 1, and the hit ratio is 1.
The fq=enabled:true clause matches about 15% of my documents. I have some 20M 
documents per shard, in a 5.3 SolrCloud cluster.

Under these circumstances, this alternate version of the query averages about 
1/3 faster, consumes less CPU, and generates less garbage:

q=<complicated stuff> +enabled:true

So it appears I have a case where using the cached fq result is more expensive 
than just putting the same restriction in the query.
Does someone have a clear mental model of how “q” and “fq” interact?
Naively, I’d expect that either the “q” operates within the set matched by the 
fq (in which case it’s doing "complicated stuff" on only a subset and should be 
faster) or that Solr takes the intersection of the q & fq sets (in which case 
putting the restriction in the “q” means that set needs to be generated instead 
of retrieved from cache, and should be slower).
This has me wondering, if you want fq cache speed boosts, but also want ranking 
involved, can you do that? Would something like q= AND 
= help, or just be more work?

Thanks.


Re: Cached fq decreases performance

2015-09-03 Thread Alexandre Rafalovitch
FQ has to calculate the result bit set for every document to be able
to cache it. Q will only calculate it for the documents it matches on
and there is some intersection hopping going on.
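Alexandre's point can be modeled by counting how often the restriction is evaluated. This is a plain-Java sketch: `FilterCost` is a hypothetical class, and an `IntPredicate` stands in for a real query. Building a cacheable `BitSet` touches every document once, while testing only the main query's candidates touches just those. Once the bits are built, reusing them from the cache costs no further evaluations.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Compares materializing a restriction for caching with testing it only on candidates.
final class FilterCost {
    /** Wraps a restriction and counts how many documents it is evaluated on. */
    static final class Counting implements IntPredicate {
        private final IntPredicate inner;
        int calls;

        Counting(IntPredicate inner) { this.inner = inner; }

        @Override
        public boolean test(int doc) {
            calls++;
            return inner.test(doc);
        }
    }

    /** fq-style: evaluate the restriction for every document so the result can be cached. */
    static BitSet materialize(int maxDoc, IntPredicate restriction) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (restriction.test(doc)) bits.set(doc);
        }
        return bits;
    }

    /** q-style: evaluate the restriction only on documents the main query matched. */
    static int countLazily(int[] candidates, IntPredicate restriction) {
        int hits = 0;
        for (int doc : candidates) {
            if (restriction.test(doc)) hits++;
        }
        return hits;
    }

    /** Cache hit: reuse previously built bits without evaluating the restriction again. */
    static int countWithBits(int[] candidates, BitSet bits) {
        int hits = 0;
        for (int doc : candidates) {
            if (bits.get(doc)) hits++;
        }
        return hits;
    }
}
```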

Are you seeing this performance hit on the first query only, or on every
one? I would expect on first query only unless your filter cache size
assumptions are somehow wrong.

Regards,
   Alex.


On 3 September 2015 at 16:45, Jeff Wartes wrote:
>
> I have a query like:
>
> q=<complicated stuff>&fq=enabled:true
>
> For purposes of this conversation, "fq=enabled:true" is set for every query, 
> I never open a new searcher, and this is the only fq I ever use, so the 
> filter cache size is 1, and the hit ratio is 1.
> The fq=enabled:true clause matches about 15% of my documents. I have some 20M 
> documents per shard, in a 5.3 SolrCloud cluster.
>
> Under these circumstances, this alternate version of the query averages about 
> 1/3 faster, consumes less CPU, and generates less garbage:
>
> q=<complicated stuff> +enabled:true
>
> So it appears I have a case where using the cached fq result is more 
> expensive than just putting the same restriction in the query.
> Does someone have a clear mental model of how “q” and “fq” interact?
> Naively, I’d expect that either the “q” operates within the set matched by 
> the fq (in which case it’s doing "complicated stuff" on only a subset and 
> should be faster) or that Solr takes the intersection of the q & fq sets (in 
> which case putting the restriction in the “q” means that set needs to be 
> generated instead of retrieved from cache, and should be slower).
> This has me wondering, if you want fq cache speed boosts, but also want 
> ranking involved, can you do that? Would something like q= 
> AND = help, or just be more work?
>
> Thanks.