Re: Intuition check

Mike Klaas Tue, 13 Nov 2007 21:43:30 -0800


On 8-Nov-07, at 4:34 PM, Chris Hostetter wrote:

: First, how to determine whether the filter-embedding would be effective? We
        ...
: really available. It can be estimated assuming the filter and query are
: independent, but this definitely isn't always true.  If the filter

I was assuming we could use a simple heuristic...
    if( configOption < docSet.size()/numDocs() )

Another case that comes to mind is if the matching query is a MatchAllDocsQuery, in which case the filter should probably be used directly.
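
A minimal sketch of the heuristic discussed above, assuming the intent is to embed the filter in the main query only when it matches a small fraction of the index. The class `FilterEmbedHeuristic`, the method `shouldEmbed`, and the `maxSelectivity` threshold are hypothetical names for illustration, not Solr APIs, and the direction of the comparison is an assumption, since the quoted pseudocode does not say which branch embeds. The ratio is computed in floating point because dividing two ints in Java would truncate it to 0.

```java
public class FilterEmbedHeuristic {
    /**
     * Hypothetical helper: decides whether a filter should be embedded in the
     * main query instead of being applied as a separate DocSet intersection.
     *
     * @param filterSize     number of documents matched by the filter
     * @param numDocs        number of documents in the index
     * @param maxSelectivity assumed config option: largest fraction of the
     *                       index a filter may match and still be embedded
     * @param matchAll       true if the main query is a match-all query
     */
    public static boolean shouldEmbed(int filterSize, int numDocs,
                                      double maxSelectivity, boolean matchAll) {
        if (matchAll) {
            return false; // the query matches everything: use the filter directly
        }
        if (numDocs <= 0) {
            return false; // empty index: nothing to gain from embedding
        }
        // Cast first so the division is not integer division.
        double selectivity = (double) filterSize / numDocs;
        return selectivity < maxSelectivity;
    }

    public static void main(String[] args) {
        System.out.println(shouldEmbed(50, 1_000_000, 0.01, false));      // true
        System.out.println(shouldEmbed(900_000, 1_000_000, 0.01, false)); // false
    }
}
```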

: Second, embedding the filter itself. This is much more nettlesome within
: SolrIndexSearcher than within one of the request handlers. One problem is the

really?  why should it be?

Sorry, that sentence was the product of thinking-while-responding, which is always a recipe for being wrong <g>. I had a particular query structure in mind, one that had the matching clauses embedded in the inner "core" of the query with several layers of score modification queries wrapped on top of this (e.g. dismax's various boost queries; yonik's multiplicative boost queries). I was imagining that it was necessary to embed the filter clauses in the "core" to produce an effective implementation. By the time I finished my response, I had read enough of the relevant Lucene scorer code (in particular, ReqOptScorer) to realize that the benefits would be had using an outer-layer ConjunctionQuery as well.

anything the request handler can do to muck with the Query object
SolrIndexSearcher can do as well .. and by the time
getDocListNC/getDocListAndSetNC are called the "pure negative" issues are
already resolved.

The only difference is that in those methods we already have a DocSet
(instead of a Query) but it should be easy to wrap a DocSet in a Query to
add to the main query.
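
To illustrate why adding a small DocSet as a required clause can pay off, here is a rough sketch of a leapfrog intersection over two sorted doc-id lists. This is not Lucene's scorer code; `LeapfrogIntersect`, `advance`, and `intersect` are hypothetical names, and plain `int[]` arrays stand in for the query's postings and the filter's DocSet. The point is that each side can skip ahead to the other's current doc id, so a very selective filter lets the conjunction jump over most of the query's matches rather than visiting and scoring them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeapfrogIntersect {
    /** Returns the first index i >= from with docs[i] >= target, or docs.length. */
    static int advance(int[] docs, int from, int target) {
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        // A negative result encodes the insertion point as -(insertionPoint + 1).
        return pos >= 0 ? pos : -(pos + 1);
    }

    /** Intersects two sorted, duplicate-free doc-id arrays. */
    public static List<Integer> intersect(int[] query, int[] filter) {
        List<Integer> hits = new ArrayList<>();
        int q = 0;
        int f = 0;
        while (q < query.length && f < filter.length) {
            if (query[q] == filter[f]) {
                hits.add(query[q]); // both sides agree: this doc matches
                q++;
                f++;
            } else if (query[q] < filter[f]) {
                q = advance(query, q, filter[f]); // query skips to the filter's doc
            } else {
                f = advance(filter, f, query[q]); // filter skips to the query's doc
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] query = {1, 3, 5, 7, 9, 11, 13, 15};
        int[] filter = {7, 15, 40};
        System.out.println(intersect(query, filter)); // [7, 15]
    }
}
```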

: ISTM then that the main challenge is in determining when the filter
: intersection should be embedded. Also, the ability to control filter caching
: is still difficult with this implementation, but perhaps that's less
: important.

yeah ... it seems like there are two orthogonal use cases...
  1) "here is an 'fq', i know it's not worth caching" ... in which we
     don't put it in the filterCache.
  2) "here is an 'fq'" ... in which we get the DocSet and add it to the
     main query if it's small.

for any given input, 1 and 2 might both apply, or just 1, or just 2

True. I'm tempted to implement a <!nocache> directive via embedding (without advertising the fact), and work on the fq optimization separately.


Thanks,
-Mike
