Re: OR-FilterQuery

Erick Erickson Tue, 14 Feb 2012 17:28:03 -0800

Ah, OK, I misread your post apparently. And yes, what you suggest
would result in some efficiencies, but at present I don't think there's any
syntax that allows one to combine filter queries as you suggest. There
was some discussion about it in the JIRA I referenced, but no action that
I could see.
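
For illustration only, here is a minimal sketch of the idea of caching one
bitset per clause and recombining the cached bitsets on demand. It uses plain
java.util.BitSet rather than any Solr or Lucene API. ClauseFilterCache, its
methods and the toy evaluator are hypothetical names invented for this
example; nothing here is existing Solr behaviour.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in Solr or Lucene.
final class ClauseFilterCache {
    // One cached bitset per single clause (e.g. "id:1"), keyed by clause text.
    private final Map<String, BitSet> cache = new HashMap<>();
    // Stand-in for actually running a clause against the index.
    private final Function<String, BitSet> evaluator;
    private int evaluations = 0;

    ClauseFilterCache(Function<String, BitSet> evaluator) {
        this.evaluator = evaluator;
    }

    private BitSet bitsFor(String clause) {
        return cache.computeIfAbsent(clause, c -> {
            evaluations++; // counts cache misses only
            return evaluator.apply(c);
        });
    }

    // Union of the cached clause bitsets; clause order does not matter.
    BitSet or(String... clauses) {
        BitSet result = new BitSet();
        for (String clause : clauses) {
            result.or(bitsFor(clause));
        }
        return result;
    }

    // Intersection of the cached clause bitsets.
    BitSet and(String... clauses) {
        BitSet result = null;
        for (String clause : clauses) {
            BitSet bits = bitsFor(clause);
            if (result == null) {
                result = (BitSet) bits.clone(); // never mutate the cached copy
            } else {
                result.and(bits);
            }
        }
        return result == null ? new BitSet() : result;
    }

    int evaluations() { return evaluations; }

    int size() { return cache.size(); }

    public static void main(String[] args) {
        // Toy index: document n has id n, so "id:n" matches exactly doc n.
        ClauseFilterCache filters = new ClauseFilterCache(clause -> {
            BitSet bits = new BitSet();
            bits.set(Integer.parseInt(clause.substring("id:".length())));
            return bits;
        });
        System.out.println(filters.or("id:1", "id:2"));
        System.out.println(filters.or("id:2", "id:1"));
        System.out.println(filters.and("id:2", "id:1"));
        System.out.println(filters.size() + " cached entries");
    }
}
```

In this toy version the three combinations share two cached entries, at the
price of an extra bitset operation per request. Whether that trade pays off
in a real index is exactly the open question.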


That is, efficiencies in some circumstances, though I think it would be
hard to predict. For instance, imagine a set of 100 entries in an FQ. And
no, I'm not making things up, I've seen applications where this makes
sense. Splitting that out into 100 separate entries in the filterCache would
use up a lot of space. Likewise, I suspect that the actual process of
creating the heuristics that were able to analyze an incoming filter
query and "do the right thing" in terms of splitting it up and recombining
it would be pretty hairy. Local parameters for instance, and let's throw in
dereferencing too <G>...
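
As a rough back-of-the-envelope illustration of the space concern, the
sketch below assumes that every cached entry is a full bitset with one bit
per document. That is a worst-case assumption: sparse result sets may well
be stored more compactly. FilterCacheEstimate and the index size of ten
million documents are hypothetical.

```java
// Hypothetical worst-case estimate; not taken from Solr's code.
final class FilterCacheEstimate {
    // Bytes for one filter entry stored as a plain bitset: one bit per document.
    static long bitsetBytes(long maxDoc) {
        return (maxDoc + 7) / 8; // round up to whole bytes
    }

    // Bytes for a number of such entries.
    static long totalBytes(long maxDoc, int entries) {
        return bitsetBytes(maxDoc) * entries;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L; // hypothetical index size
        System.out.println("one combined entry:   " + totalBytes(maxDoc, 1) + " bytes");
        System.out.println("100 separate entries: " + totalBytes(maxDoc, 100) + " bytes");
    }
}
```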

So I suspect that this is one of those features that is quite easy to see
the benefits of in the simple case, but pretty quickly becomes a
nightmare to actually implement correctly, but that's mostly
a guess.

And before putting the work into it, I think modeling the actual
benefits would be wise, as well as convincing myself that there
are enough cases where this *would* be beneficial. I mean Solr
does a pretty reasonable job of caching these anyway, and with the
"non-cached" filters it's not clear to me that the benefits are
sufficient...

Good luck, though, if you want to tackle it!
Erick



On Tue, Feb 14, 2012 at 4:54 PM, Em <mailformailingli...@yahoo.de> wrote:
> Hi Erick,
>
>> Whoa!
>>
>> fq=id:(1 OR 2)
>> is not the same thing at all as
>> fq=id:1&fq=id:2
> Ahm, who said they would be the same? :)
> I mean, you are completely right in what you are saying but it seems to
> me that we are talking about two different things.
>
> I was talking about caching each filter criterion instead of the whole
> filter query, and then recombining the cached criteria based on the
> boolean operators the client sends.
>
> In other words:
> currently
> fq=id:1 OR id:2
> results in ONE cached filter entry.
>
> fq=id:2 OR id:1
> results in ANOTHER cached filter entry.
>
> fq=id:2 AND id:1
> results in (surprise, surprise) a third filter entry (although this
> example does not make sense).
>
> My idea was to cache each filter criterion, that is, to cache the bitset
> for id:1 and the bitset for id:2 and recombine both bitsets via AND, OR,
> NOT etc. whenever necessary.
>
> This way one could save memory (and maybe computing time as well). That
> definitely makes sense when you have a much smaller set of filter
> criteria than possible (and used) combinations of them, with only a
> small number of repetitions per combination (which would destroy the
> benefit of caching whole filter queries).
>
> Don't you agree?
>
> Kind regards,
> Em
>
>
> On 14.02.2012 22:33, Erick Erickson wrote:
>> Whoa!
>>
>> fq=id:(1 OR 2)
>> is not the same thing at all as
>> fq=id:1&fq=id:2
>>
>> Assuming that any document has one and only one ID, the second form
>> would return exactly 0 documents, each and every time.
>>
>> Multiple fq clauses are essentially set intersections. So the first query is
>> the set of all documents where id is 1 or 2; the second is the intersection
>> of two sets of documents, one set with an id of 1 and one with an id of 2.
>> Not the same thing at all.
>>
>> There's no support for the concept of
>> (fq=id:1 OR fq=id:2)
>>
>> Best
>> Erick
>>
>> On Tue, Feb 14, 2012 at 2:13 PM, Em <mailformailingli...@yahoo.de> wrote:
>>> Hi Mikhail,
>>>
>>> thanks for kicking in some brainstorming-code!
>>> The given thread is almost a year old, and I was working with Solr in my
>>> free time to see where it fails to behave/perform as I expect/wish.
>>>
>>> I found out that if you have a lot of different access patterns for a
>>> filter query, you might end up with either a big cache to make things
>>> fast or with lower performance (impact depends on use case and
>>> circumstances).
>>>
>>> Scenario:
>>> You have a permission field and the client is able to filter by one to
>>> three permission values.
>>> That is:
>>> fq=foo:user
>>> fq=foo:moderator
>>> fq=foo:manager
>>>
>>> If you cannot control/guarantee the order of the fq's values, you could
>>> end up with many variants that all return the same result.
>>>
>>> Example:
>>> fq=permission:user OR permission:moderator OR permission:manager
>>> fq=permission:user OR permission:manager OR permission:moderator
>>> fq=permission:moderator OR permission:user OR permission:manager
>>> ...
>>> They all return the same result but were cached separately, which means
>>> you are wasting a lot of memory.
>>>
>>> Furthermore, if your access pattern leads to a lot of different fq's
>>> on a small set of distinct values, it may make more sense to cache each
>>> filter criterion on its own from a memory-consumption point of view (it
>>> may cost a little performance).
>>>
>>> That being said, if you cache a filter for foo:user, foo:moderator and
>>> foo:manager, you can combine those filters with AND, OR, NOT or whatever
>>> without recomputing every filter over and over again, which would be the
>>> case if your filter cache is not large enough.
>>>
>>> However, I never compared the performance differences (in terms of
>>> speed) of a cached filter-query like
>>> foo:bar OR foo:baz
>>> With a combination of two cached filter-queries like
>>> foo:bar
>>> foo:baz
>>> combined by a logical OR.
>>>
>>> That's the background.
>>> Unfortunately I didn't have the time to implement this in the past.
>>>
>>> Back to your post:
>>> Looks like a cool idea and is almost what I had in mind!
>>>
>>> I would formulate an easier syntax so that one is able to "parse" each
>>> fq clause on its own, cache its CachingWrapperFilter and reuse it.
>>>
>>>> it will use a per-segment bitset, in contrast to Solr's fq, which caches for
>>>> the top-level reader.
>>> Could you explain why this bitset would be per-segment based, please?
>>> I don't see a reason why this *has* to be so.
>>> What is the benefit you are seeing?
>>>
>>> Kind regards,
>>> Em
>>>
>>> On 14.02.2012 19:33, Mikhail Khludnev wrote:
>>>> Hi Em,
>>>>
>>>> I briefly read the thread. Are you talking about combining cached clauses
>>>> of a BooleanQuery, instead of evaluating the whole BQ as a filter?
>>>>
>>>> I found something like that in API (but only in API)
>>>> http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
>>>>
>>>> Am I getting you right? Why do you need it, btw? If I'm ..
>>>> I have an idea how to do it in two mins:
>>>>
>>>> q=+f:text
>>>> +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 
>>>> _query_:{!fq}id:4)...
>>>>
>>>> The right leg will be a BooleanQuery with SHOULD clauses backed by cached
>>>> queries (see below).
>>>>
>>>> If you are not scared by the syntax yet, you can implement a trivial
>>>> "fq" QParserPlugin, which will be just
>>>>
>>>> // lazily through User/Generic Cache
>>>> q = new FilteredQuery(new MatchAllDocsQuery(),
>>>>     new CachingWrapperFilter(
>>>>         new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
>>>> return q;
>>>>
>>>> It will use a per-segment bitset, in contrast to Solr's fq, which caches for
>>>> the top-level reader.
>>>>
>>>> WDYT?
>>>>
>>>> On Mon, Feb 13, 2012 at 11:34 PM, Em <mailformailingli...@yahoo.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> have a look at:
>>>>> http://search-lucene.com/m/Z8lWGEiKoI
>>>>>
>>>>> I think not much has changed since then.
>>>>>
>>>>> Regards,
>>>>> Em
>>>>>
>>>>> On 13.02.2012 20:17, spr...@gmx.eu wrote:
>>>>>> Hi,
>>>>>>
>>>>>> how efficient is such a query:
>>>>>>
>>>>>> q=some text
>>>>>> fq=id:(1 OR 2 OR 3...)
>>>>>>
>>>>>> Would it be better to use q=some text AND id:(1 OR 2 OR 3...)?
>>>>>>
>>>>>> Is the Filter Cache used for the OR'ed fq?
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>
