On 8-Nov-07, at 4:34 PM, Chris Hostetter wrote:


: First, how to determine whether the filter-embedding would be effective? We
        ...
: really available. It can be estimated assuming the filter and query are
: independent, but this definitely isn't always true.  If the filter

I was assuming we could use a simple heuristic...
    if( configOption < docSet.size()/numDocs() )

Another case that comes to mind is if the matching query is a MatchAllDocsQuery, in which case the filter should probably be used directly.
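
For illustration, here is a minimal sketch of how that decision might look. It assumes the intent is to embed the filter when it is small relative to the index (as described further down the thread), and it uses floating-point division so the ratio is not truncated to zero. The class, enum, and parameter names are hypothetical and are not part of Solr.

```java
// Hypothetical sketch, not Solr code: decide how to apply a filter DocSet.
class FilterEmbeddingHeuristic {

    enum Strategy { USE_FILTER_DIRECTLY, EMBED_IN_QUERY, APPLY_AS_FILTER }

    /**
     * @param filterSize     number of documents matched by the filter
     * @param numDocs        number of documents in the index
     * @param maxSelectivity assumed config option: embed only below this fraction
     * @param mainIsMatchAll true if the main query matches every document
     */
    static Strategy choose(int filterSize, int numDocs,
                           double maxSelectivity, boolean mainIsMatchAll) {
        if (mainIsMatchAll) {
            // The filter alone determines the result set.
            return Strategy.USE_FILTER_DIRECTLY;
        }
        if (numDocs <= 0) {
            // Empty index: nothing to gain from embedding.
            return Strategy.APPLY_AS_FILTER;
        }
        // Cast before dividing; int/int would truncate to 0 for any partial match.
        double selectivity = (double) filterSize / numDocs;
        return selectivity < maxSelectivity
                ? Strategy.EMBED_IN_QUERY
                : Strategy.APPLY_AS_FILTER;
    }
}
```

With a threshold of 0.1, a filter matching 50 of 1000 documents would be embedded, while one matching 500 of 1000 would be applied the usual way.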

: Second, embedding the filter itself. This is much more nettlesome within
: SolrIndexSearcher than within one of the request handlers. One problem is the

really?  why should it be?

Sorry, that sentence was the product of thinking-while-responding, which is always a recipe for being wrong <g>. I had a particular query structure in mind, one that had the matching clauses embedded in the inner "core" of the query with several layers of score modification queries wrapped on top of this (e.g. dismax's various boost queries; yonik's multiplicative boost queries). I was imagining that it was necessary to embed the filter clauses in the "core" to produce an effective implementation. By the time I finished my response, I had read enough of the relevant Lucene scorer code (in particular, ReqOptScorer) to realize that the benefits would be had using an outer-layer ConjunctionQuery as well.

anything the request handler can do to muck with the Query object
SolrIndexSearcher can do as well .. and by the time
getDocListNC/getDocListAndSetNC are called the "pure negative" issues are
already resolved.

The only difference is that in those methods we already have a DocSet
(instead of a Query) but it should be easy to wrap a DocSet in a Query to
add to the main query.
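
To make the expected benefit concrete, here is a toy model of what a conjunction over the main query and a wrapped DocSet would do. It is not the Lucene scorer API: both sides are represented as sorted arrays of distinct doc ids, and the hypothetical `advance` helper stands in for a scorer's skip operation. The point of the sketch is that a small filter lets the larger side skip ahead instead of visiting every match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: sorted int arrays stand in for the query's matches and the DocSet.
class LeapfrogIntersection {

    /** Index of the first element >= target in sorted[from..], or sorted.length if none. */
    static int advance(int[] sorted, int from, int target) {
        int idx = Arrays.binarySearch(sorted, from, sorted.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /** Doc ids present in both arrays; each array must be sorted with no duplicates. */
    static List<Integer> intersect(int[] queryDocs, int[] filterDocs) {
        List<Integer> hits = new ArrayList<>();
        int q = 0;
        int f = 0;
        while (q < queryDocs.length && f < filterDocs.length) {
            if (queryDocs[q] == filterDocs[f]) {
                hits.add(queryDocs[q]);
                q++;
                f++;
            } else if (queryDocs[q] < filterDocs[f]) {
                // Query side is behind: skip it forward to the filter's current doc.
                q = advance(queryDocs, q, filterDocs[f]);
            } else {
                // Filter side is behind: skip it forward to the query's current doc.
                f = advance(filterDocs, f, queryDocs[q]);
            }
        }
        return hits;
    }
}
```

When the filter is much smaller than the query's match set, most query-side documents are skipped rather than scored, which is where the savings would come from.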

: ISTM then that the main challenge is in determining when the filter
: intersection should be embedded. Also, the ability to control filter caching
: is still difficult with this implementation, but perhaps that's less
: important.

yeah ... it seems like there are two orthogonal use cases...
  1) "here is an 'fq', i know it's not worth caching" ... in which case we
     don't put it in the filterCache.
  2) "here is an 'fq'" ... in which case we get the DocSet and add it to the
     main query if it's small.

for any given input, 1 and 2 might both apply, or just 1, or just 2

True. I'm tempted to implement a <!nocache> directive via embedding (without advertising the fact), and work on the fq optimization separately.

Thanks,
-Mike
