I'm pretty sure Solr/Lucene doesn't already do any such "optimization". That said, it's not clear to me that it would result in much of a performance benefit: given the way Lucene works, it's not obvious that the second version of your query would be noticeably faster than the first.

It might help in cases with many, many clauses, rather than the few clauses in your example. You'd definitely want to performance-test it to verify that there are any gains before embarking on writing the 'optimization'. You can test it just by sending the different versions of your real-world queries to Solr and comparing the response times, working out the hypothetically 'optimized' version yourself by hand if need be, right?
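
For what it's worth, here is a rough sketch of that kind of comparison in Python. It is only an illustration under some assumptions: the URL (host, port, path) is a guess at a default install and will need adjusting, the helper names (build_url, fetch_qtime, median_qtime, compare) are made up for this example, and it relies on the QTime value Solr reports in the responseHeader of a wt=json response.

```python
import json
import statistics
import urllib.parse
import urllib.request

# Assumed endpoint: adjust host, port, and path/core for your own install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_url(fq, base=SOLR_SELECT_URL):
    """Build a select URL that matches all docs and applies fq as a filter."""
    params = {"q": "*:*", "fq": fq, "rows": 0, "wt": "json"}
    return base + "?" + urllib.parse.urlencode(params)


def fetch_qtime(url):
    """Send one request and return the QTime (milliseconds) Solr reports."""
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    return body["responseHeader"]["QTime"]


def median_qtime(fq, runs=5, fetch=fetch_qtime):
    """Send the same filter several times and return the median QTime."""
    url = build_url(fq)
    return statistics.median(fetch(url) for _ in range(runs))


def compare(filters, runs=5, fetch=fetch_qtime):
    """Map each label in filters (label -> fq string) to its median QTime."""
    return {label: median_qtime(fq, runs, fetch) for label, fq in filters.items()}
```

Calling compare({"original": '("cat" AND "dog") OR ("cat" AND "horse")', "factored": '"cat" AND ("dog" OR "horse")'}) against a live Solr would give one median per version. Keep in mind that Solr caches filter results, so repeated identical filters may be answered from the cache; you'd want to measure with your real-world filters and account for that when reading the numbers.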



On 7/27/2011 5:05 PM, Scott Smith wrote:
We have a Solr application which ends up creating queries with very complicated 
filters (literally hundreds and sometimes thousands of terms, typically a large 
number of terms OR'ed together, where each of these terms might have half a 
dozen keywords ANDed/ORed together).  In looking at the filters, I realized 
that there are often a lot of common sub-filters.

A simple example of what I mean is:

                 ("cat" AND "dog") OR ("cat" AND "horse")

This could clearly be simplified by saying:

                 "cat" AND ("dog" OR "horse")

It turns out that finding and combining common sub-filters isn't trivial for our 
application.  So, before I start a project to attempt some kind of 
"optimization", my question is whether it's likely that I will see significant 
decreases in query times to justify the development effort it takes to optimize the 
filters.  Certainly, if I thought I might get a 20%+ decrease in time, I would say it's 
probably a good project.  If it's just a few percentage points of improvement, then I'm 
less excited about doing it.

Does Solr already go through some kind of optimization which effectively 
combines common sub-filters and possibly duplicated terms?  Does anyone have 
any thoughts on this subject?

Thanks

Scott
