Re: Is there any performance cost of using lots of OR in the solr query

2012-04-06 Thread Shawn Heisey

On 4/5/2012 3:49 PM, Erick Erickson wrote:

> Of course putting more clauses in an OR query will
> have a performance cost; there's more work to do.
>
> OK, smart-alec remarks aside, you will probably
> be fine with a few hundred clauses. The question
> is simply whether the performance hit is acceptable.
> I'm afraid that question can't be answered in the
> abstract; you'll have to test...
>
> Since you're putting them in an fq, there's also some chance
> that they'll be re-used from the cache, at least if there
> are common patterns.


Roz,

I have a similar situation going on in my index.  Because employees have
access to far more than real users, they get filter queries constructed
that have a HUGE number of clauses in them.  We have implemented a new
field for a feature that we call search groups, but it has not
penetrated all aspects of the application yet.  Also, until we can make
those groups use a hierarchy, which is not a trivial undertaking, we may
be stuck with large filter queries.


These complex filters have led to a problem that you have probably not
considered: really long filterCache autowarm times.  I have reduced the
autowarmCount on my filterCache to FOUR, and there are still times that
the autowarm takes up to 60 seconds.  Most of the time it takes only a few
seconds, with up to 30 seconds being relatively common.


I just thought of a new localparam feature for this situation and filed 
SOLR-.  I will talk to our developers about using the existing 
localparam that skips filterCache entirely.
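
For illustration only: the existing localparam is presumably cache=false,
which is written in front of the filter as {!cache=false}. A minimal
Python sketch of building such an fq follows. The field name group_id,
the values, and the helper name uncached_filter are hypothetical and do
not come from the index described above.

```python
from urllib.parse import parse_qs, urlencode


def uncached_filter(field, values):
    """Build an fq value that asks Solr not to keep the filter in filterCache."""
    if not values:
        raise ValueError("need at least one value")
    # Join the values into a single parenthesised OR group.
    clause = " OR ".join(str(v) for v in values)
    # {!cache=false} is the local param; the rest is an ordinary filter query.
    return f"{{!cache=false}}{field}:({clause})"


# Hypothetical field and values, for illustration only.
fq = uncached_filter("group_id", [101, 102, 103])
query_string = urlencode([("q", "*:*"), ("fq", fq)])

print(fq)
print(query_string)
```

A filter sent this way is never stored in filterCache, so it cannot be
one of the entries replayed during autowarming. The cost is that it is
recomputed on every request.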


Thanks,
Shawn



Re: Is there any performance cost of using lots of OR in the solr query

2012-04-06 Thread Erick Erickson
Shawn:

Ahhh, so *that* was what your JIRA was about.

Consider https://issues.apache.org/jira/browse/SOLR-2429
for your ACL calculations; that's what it was developed
for.

The basic idea is that you can write a custom filter that decides
whether each document should be included in the result set, and
that is only called _after_ all other clauses (search and FQs) have
been satisfied.

Here's the issue. Normally, fqs are calculated across the entire
document set. That's what allows them to be cached and
re-used. But, as you've found, doing ACL calculations
for the entire document set is expensive. So this is an attempt
to make a lower-cost alternative. The downside is that it is NOT
cached, so it must be calculated anew each time. But it's only
calculated for a subset of documents.
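
To make the trade-off concrete, here is a toy Python simulation. It is
not Solr code: the documents, the acl_allows check, and both search
functions are invented for illustration. It only counts how many ACL
checks each approach performs for the same query.

```python
def acl_allows(doc, user_groups):
    """Hypothetical, deliberately simple ACL check."""
    return doc["group"] in user_groups


def search_with_full_filter(docs, matches_query, user_groups):
    """Regular fq style: run the ACL check against every document."""
    checks = 0
    allowed = set()
    for doc in docs:
        checks += 1
        if acl_allows(doc, user_groups):
            allowed.add(doc["id"])
    # The allowed set could be cached and intersected with later queries.
    hits = [d["id"] for d in docs if matches_query(d) and d["id"] in allowed]
    return hits, checks


def search_with_post_filter(docs, matches_query, user_groups):
    """Post-filter style: run the ACL check only on documents that matched."""
    checks = 0
    hits = []
    for doc in docs:
        if not matches_query(doc):
            continue
        checks += 1
        if acl_allows(doc, user_groups):
            hits.append(doc["id"])
    return hits, checks


# Toy index: 1000 documents spread over 10 groups.
docs = [{"id": i, "group": i % 10, "rare": i % 7 == 0} for i in range(1000)]


def is_rare(doc):
    return doc["rare"]


groups = {0, 1, 2}
full_hits, full_checks = search_with_full_filter(docs, is_rare, groups)
post_hits, post_checks = search_with_post_filter(docs, is_rare, groups)

print(full_hits == post_hits)
print(full_checks, post_checks)
```

Both functions return the same hits. The first pays for one ACL check
per document in the index but produces a reusable set; the second pays
only for documents that matched the query, and has to pay again on
every request.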

Best
Erick




Re: Is there any performance cost of using lots of OR in the solr query

2012-04-05 Thread Erick Erickson
Of course putting more clauses in an OR query will
have a performance cost; there's more work to do.

OK, smart-alec remarks aside, you will probably
be fine with a few hundred clauses. The question
is simply whether the performance hit is acceptable.
I'm afraid that question can't be answered in the
abstract; you'll have to test...

Since you're putting them in an fq, there's also some chance
that they'll be re-used from the cache, at least if there
are common patterns.

Best
Erick



Is there any performance cost of using lots of OR in the solr query

2012-04-04 Thread roz dev
Hi All,

I am working on an application which makes a few Solr calls to get its data.

At a high level, we have a requirement like this:


   - Make first call to Solr, to get the list of products which are
   children of a given category
   - Make 2nd solr call to get product documents based on a list of product
   ids

2nd query will look like

q=document_type:SKU&fq=product_id:(34 OR 45 OR 56 OR 77)

We can have close to 100 product ids in fq.
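
As a rough sketch of how the second call could be assembled from the
ids returned by the first call, the Python below builds and URL-encodes
the two parameters. The helper name build_sku_query is made up for this
example; the field names are the ones used in the query above.

```python
from urllib.parse import parse_qs, urlencode


def build_sku_query(product_ids):
    """Return a URL-encoded query string for SKU documents limited to product_ids."""
    if not product_ids:
        raise ValueError("need at least one product id")
    # Join the ids into one parenthesised OR group for the fq parameter.
    id_clause = " OR ".join(str(pid) for pid in product_ids)
    params = [
        ("q", "document_type:SKU"),
        ("fq", f"product_id:({id_clause})"),
    ]
    # urlencode escapes ':', spaces and parentheses and joins with '&'.
    return urlencode(params)


query_string = build_sku_query([34, 45, 56, 77])
print(query_string)

# Decoding the string again shows the two separate parameters.
print(parse_qs(query_string))
```

The same function handles a list of 100 ids; the fq value simply gets
longer.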

Is there a performance cost to Solr calls that have lots of ORs?

According to slide 41 of the presentation "The Seven Deadly Sins of Solr",
it is a bad idea to use these kinds of queries.

http://www.slideshare.net/lucenerevolution/hill-jay-7-sins-of-solrpdf

But the slides do not make clear why it is bad.

Any inputs will be welcome.

Thanks

Saroj