If you're assembling an fq clause, this is all done for you, although
you need to take some care to form the fq clause _exactly_
the same way each time. Think of the filterCache as a key/value
map where the key is the raw fq text and the value is the docs
satisfying that query.

So fq=acl:(a OR b) will not, for instance, match
     fq=acl:(b OR a)
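
One way to make that reliable is to canonicalize on the client before
building the string. A minimal sketch in plain Java (hypothetical helper,
assuming your ACL entries are just group names):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class AclFq {
        // Sort the group names so the same set always produces the same
        // fq text, and therefore hits the same filterCache entry.
        public static String build(List<String> groups) {
            List<String> sorted = new ArrayList<String>(groups);
            Collections.sort(sorted);
            return "acl:(" + String.join(" OR ", sorted) + ")";
        }
    }

With that, Arrays.asList("b", "a") and Arrays.asList("a", "b") both come
out as acl:(a OR b), so they land on a single cache entry.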

FWIW
Erick

2012/4/24 Mindaugas Žakšauskas <min...@gmail.com>:
> Hi Erick,
>
> Thanks for looking into this and for the tips you've sent.
>
> I am leaning towards a custom query component at the moment; the primary
> reason would be to reduce the amount of data that is sent over to Solr.
> A single round trip within the same datacenter costs around 0.5 ms [1],
> and if the query doesn't fit into a single Ethernet packet, this number
> effectively has to double/triple/etc.
>
> Regarding cache filters - I was actually thinking the opposite:
> caching ACL queries (filter queries) would be beneficial as those tend
> to be the same across multiple search requests.
>
> [1] 
> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//people/jeff/stanford-295-talk.pdf
> , slide 13
>
> m.
>
> On Tue, Apr 24, 2012 at 4:43 PM, Erick Erickson <erickerick...@gmail.com> 
> wrote:
>> In general, query parsing is such a small fraction of the total time that,
>> almost no matter how complex the query, it's not worth worrying about. To see
>> this, attach &debugQuery=on to your query and look at the timings
>> in the "prepare" and "process" portions of the response. I'd be
>> very sure that it was a problem before spending any time trying to make
>> the transmission of the data across the wire more efficient; my first
>> reaction is that this is premature optimization.
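>>
>> For example (hypothetical host and core name):
>>
>>    http://localhost:8983/solr/collection1/select?q=*:*&fq=acl:(a OR b)&debugQuery=on
>>
>> and then look under the "timing" section of the debug output, which
>> lists prepare and process times per component.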
>>
>> Second, you could do this on the server side with a custom query
>> component if you chose. You can freely modify the query
>> over there and it may make sense in your situation.
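>>
>> A rough sketch of such a component (not drop-in code; the compact "acl"
>> parameter is made up, and the exact abstract methods vary a bit between
>> Solr versions):
>>
>>    import java.io.IOException;
>>    import org.apache.solr.common.params.ModifiableSolrParams;
>>    import org.apache.solr.handler.component.ResponseBuilder;
>>    import org.apache.solr.handler.component.SearchComponent;
>>
>>    public class AclExpandComponent extends SearchComponent {
>>        @Override
>>        public void prepare(ResponseBuilder rb) throws IOException {
>>            // Hypothetical compact parameter, e.g. acl=a,b,c
>>            String acl = rb.req.getParams().get("acl");
>>            if (acl != null) {
>>                ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
>>                // Expand it into a normal fq before the query component reads the params
>>                params.add("fq", "acl:(" + acl.replace(",", " OR ") + ")");
>>                rb.req.setParams(params);
>>            }
>>        }
>>
>>        @Override
>>        public void process(ResponseBuilder rb) throws IOException {
>>            // nothing to do here; the standard components run the query
>>        }
>>
>>        @Override
>>        public String getDescription() {
>>            return "Expands a compact acl parameter into an fq clause";
>>        }
>>    }
>>
>> It would need to be registered ahead of the query component (e.g. via
>> first-components in solrconfig.xml) so its prepare() runs before the
>> fq parameters are read.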
>>
>> Third, consider "no cache filters", which were developed for
>> expensive filter queries, ACL being one of them. See:
>> https://issues.apache.org/jira/browse/SOLR-2429
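>>
>> The syntax there is a local param on the fq, something like:
>>
>>    fq={!cache=false cost=200}acl:(a OR b OR c)
>>
>> cache=false keeps it out of the filterCache, and a cost of 100 or more
>> asks Solr to run it as a post filter (for query types that support it),
>> i.e. only against docs that already matched the main query.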
>>
>> Fourth, I'd ask if there's a way to reduce the size of the FQ
>> clause. Is this on a per-user basis or a per-group basis?
>> If you can get this down to a few groups, that would help, although
>> there's often some outlier who is a member of thousands of
>> groups :(.
>>
>> Best
>> Erick
>>
>>
>> 2012/4/24 Mindaugas Žakšauskas <min...@gmail.com>:
>>> On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies <bimargul...@gmail.com> 
>>> wrote:
>>>> I'm about to try out a contribution for serializing queries in
>>>> JSON using Jackson. I've previously done this by serializing my
>>>> own data structure and putting the JSON into a custom query parameter.
>>>
>>> Thanks for your reply. Appreciate your effort, but I'm not sure if I
>>> fully understand the gain.
>>>
>>> Having data in JSON would still require it to be converted into a Lucene
>>> Query at the end, which takes space & CPU effort, right? Or are you
>>> saying that having the query serialized into a structured data blob (JSON
>>> in this case) makes it somehow easier to convert it into a Lucene Query?
>>>
>>> I only thought about Java serialization because:
>>> - it's rather close to the in-object format
>>> - the mechanism is rather stable and is an established standard in Java/JVM
>>> - Lucene Queries seem to implement java.io.Serializable (haven't done
>>> a thorough check but looks good on the surface)
>>> - other conversions (e.g. using XStream) are either slow or require
>>> custom annotations. I personally don't see how Lucene/Solr would
>>> include them in their core classes.
>>>
>>> Anyway, it would still be interesting to hear if anyone could
>>> elaborate on query parsing complexity.
>>>
>>> m.
