Hi Erick,

> Whoa!
>
> fq=id:(1 OR 2)
> is not the same thing at all as
> fq=id:1&fq=id:2
Ahm, who said they would be the same? :)
I mean, you are completely right in what you are saying but it seems to
me that we are talking about two different things.

I was talking about caching each filter criterion instead of the whole
filter query, and then recombining the cached criteria according to the
boolean operators the client sends.

In other words, currently
fq=id:1 OR id:2
results in ONE cached filter entry.

fq=id:2 OR id:1
results in ANOTHER cached filter entry.

fq=id:2 AND id:1
results in (surprise, surprise) a third filter entry (although this
example does not make sense).

My idea was to cache each filter criterion, that is, to cache the bitset
for id:1 and the bitset for id:2, and to recombine both bitsets via AND,
OR, NOT etc. whenever necessary.

This way one could save memory (and maybe computing time as well). That
definitely makes sense when you have a small set of filter criteria but
a much larger set of possible (and used) combinations of them, with only
a few repetitions per combination (which would destroy the benefit of
caching whole filter queries).
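
A minimal sketch of this idea, under stated assumptions: plain
java.util.BitSet stands in for whatever document-set structure the
search engine really uses, and the class CriterionCache, its methods and
the "compute" function are hypothetical names made up for illustration.
This is not Solr's filterCache and not a Solr API.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: one cached bitset per filter criterion,
// recombined on demand instead of caching every boolean combination.
class CriterionCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    // Stands in for "run this single criterion against the index".
    private final Function<String, BitSet> compute;

    CriterionCache(Function<String, BitSet> compute) {
        this.compute = compute;
    }

    // Returns the cached bitset for one criterion, computing it on first use.
    private BitSet get(String criterion) {
        return cache.computeIfAbsent(criterion, compute);
    }

    // Each combinator works on a copy so the cached bitsets stay untouched.
    BitSet or(String a, String b) {
        BitSet result = (BitSet) get(a).clone();
        result.or(get(b));
        return result;
    }

    BitSet and(String a, String b) {
        BitSet result = (BitSet) get(a).clone();
        result.and(get(b));
        return result;
    }

    BitSet andNot(String a, String b) {
        BitSet result = (BitSet) get(a).clone();
        result.andNot(get(b));
        return result;
    }

    // Number of cached entries: one per distinct criterion.
    int size() {
        return cache.size();
    }

    private static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        // Toy "index": which document numbers match which criterion.
        Map<String, BitSet> index = new HashMap<>();
        index.put("permission:user", docs(0, 1));
        index.put("permission:moderator", docs(1, 2));

        CriterionCache filters = new CriterionCache(index::get);
        System.out.println(filters.or("permission:user", "permission:moderator"));
        System.out.println(filters.or("permission:moderator", "permission:user"));
        System.out.println(filters.and("permission:user", "permission:moderator"));
        System.out.println("cached entries: " + filters.size());
    }
}
```

In this sketch the clause order no longer matters for the cache: both OR
variants and the AND variant reuse the same two cached entries. The price
is one bitset copy plus one bitwise operation per request.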

Don't you agree?

Kind regards,
Em


On 14.02.2012 22:33, Erick Erickson wrote:
> Whoa!
> 
> fq=id:(1 OR 2)
> is not the same thing at all as
> fq=id:1&fq=id:2
> 
> Assuming that any document had one and only one ID,  the second clause
> would return exactly 0 documents, each and every time.
> 
> Multiple fq clauses are essentially set intersections. So the first query
> is the set of all documents where id is 1 or 2;
> the second is the intersection of two sets of documents, one set
> with an id of 1 and one with an id of 2. Not the same thing at all.
> 
> There's no support for the concept of
> (fq=id:1 OR fq=id:2)
> 
> Best
> Erick
> 
> On Tue, Feb 14, 2012 at 2:13 PM, Em <mailformailingli...@yahoo.de> wrote:
>> Hi Mikhail,
>>
>> thanks for chipping in some brainstorming code!
>> The linked thread is almost a year old; back then I was working with Solr
>> in my free time to see where it fails to behave or perform as I expect.
>>
>> I found out that if you have many different access patterns for a
>> filter query, you end up either with a big cache to keep things fast
>> or with lower performance (the impact depends on use case and
>> circumstances).
>>
>> Scenario:
>> You have a permission field and the client is able to filter by one to
>> three permission values.
>> That is:
>> fq=foo:user
>> fq=foo:moderator
>> fq=foo:manager
>>
>> If you cannot control or guarantee the order of the fq's values, you can
>> end up with many differently ordered filter queries that all return the
>> same result.
>>
>> Example:
>> fq=permission:user OR permission:moderator OR permission:manager
>> fq=permission:user OR permission:manager OR permission:moderator
>> fq=permission:moderator OR permission:user OR permission:manager
>> ...
>> They all return the same result but were cached separately, which
>> wastes a lot of memory.
>>
>> Furthermore, if your access pattern leads to many different fq's over
>> a small set of distinct values, it may make more sense, from a
>> memory-consumption point of view, to cache each filter clause on its
>> own (this may cost a little performance).
>>
>> That being said, if you cache a filter for each of foo:user,
>> foo:moderator and foo:manager, you can combine those filters with AND,
>> OR, NOT or whatever without recomputing every filter over and over
>> again, which is what happens when your filter cache is not large enough.
>>
>> However, I never compared the performance differences (in terms of
>> speed) of a cached filter-query like
>> foo:bar OR foo:baz
>> with a combination of two cached filter-queries like
>> foo:bar
>> foo:baz
>> combined by a logical OR.
>>
>> That's what the background looks like.
>> Unfortunately I didn't have the time to implement this in the past.
>>
>> Back to your post:
>> Looks like a cool idea and is almost what I had in mind!
>>
>> I would design a simpler syntax, so that each fq clause can be parsed
>> on its own and its CachingWrapperFilter cached for reuse.
>>
>>> It will use per-segment bitsets, in contrast to Solr's fq, which caches
>>> for the top-level reader.
>> Could you explain why this bitset would be per-segment based, please?
>> I don't see a reason why this *has* to be so.
>> What is the benefit you are seeing?
>>
>> Kind regards,
>> Em
>>
>> On 14.02.2012 19:33, Mikhail Khludnev wrote:
>>> Hi Em,
>>>
>>> I briefly read the thread. Are you talking about combining cached clauses
>>> of a BooleanQuery, instead of evaluating the whole BQ as a filter?
>>>
>>> I found something like that in the API (but only in the API):
>>> http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
>>>
>>> Did I get you right? Why do you need it, btw? If I'm ..
>>> I have an idea how to do it in two minutes:
>>>
>>> q=+f:text
>>> +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 
>>> _query_:{!fq}id:4)...
>>>
>>> The right-hand part will be a BooleanQuery with SHOULD clauses backed by
>>> cached queries (see below).
>>>
>>> If you are not scared by the syntax yet, you can implement a trivial
>>> "fq" QParserPlugin, which will be just
>>>
>>> // lazily through User/Generic Cache
>>> q = new FilteredQuery(
>>>     new MatchAllDocsQuery(),
>>>     new CachingWrapperFilter(
>>>         new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
>>> return q;
>>>
>>> It will use per-segment bitsets, in contrast to Solr's fq, which caches
>>> for the top-level reader.
>>>
>>> WDYT?
>>>
>>> On Mon, Feb 13, 2012 at 11:34 PM, Em <mailformailingli...@yahoo.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> have a look at:
>>>> http://search-lucene.com/m/Z8lWGEiKoI
>>>>
>>>> I think not much has changed since then.
>>>>
>>>> Regards,
>>>> Em
>>>>
>>>> On 13.02.2012 20:17, spr...@gmx.eu wrote:
>>>>> Hi,
>>>>>
>>>>> how efficient is a query like this:
>>>>>
>>>>> q=some text
>>>>> fq=id:(1 OR 2 OR 3...)
>>>>>
>>>>> Should I rather use q:some text AND id:(1 OR 2 OR 3...)?
>>>>>
>>>>> Is the Filter Cache used for the OR'ed fq?
>>>>>
>>>>> Thank you
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
> 
