Re: Intuition check

Mike Klaas Thu, 08 Nov 2007 12:16:58 -0800

On 8-Nov-07, at 8:59 AM, Chris Hostetter wrote:

Let's back up a second...
the theory is that while it's frequently handy to cache fq's independent
of the main query (because they are probably used over and over), in some
cases it may be advantageous to use an FQ directly in the body of the
main query to get better skipTo behavior. -- the fundamental issue is
orthogonal to whether or not a DocSet for the FQ is cached, the question
is how should that FQ be used when computing the final DocList.
So what if instead of letting the client say "this argument is an fq which
should be used to generate a BitSet and cached as a filter, this argument
is an fq.nocache which should be added to the main query" we instead make
SolrIndexSearcher smart enough to say "i've been asked to filter the
main query using some DocSets, the intersection of those DocSets is small
enough, that instead of filtering the query on it, i'm going to add a
query that only matches docs in it to the main query to improve skipTo
behavior." ... so now clients don't have to know, they just pass in a
bunch of fq params. we still cache a DocSet for each one, but when it
comes time to do the search, we get the skipTo benefit anytime the
intersection of all fqs is really small (whether the individual fqs are
small enough individually or not)


I agree that this would be awesome if it can be pulled off.

that should just be a simple change to getDocListNC right?

Let's think about this: To effectively do what you suggest, the query handling needs to

1. determine whether a given (set of) filter(s) would be effective in a skipTo context

2. embed the filter in the query as a scorer

I see difficulties with both, but perhaps they are not unsurmountable.

First, how to determine whether the filter-embedding would be effective? We have at our disposal the size of the filter intersection, assuming the filters are cached. The most important criterion here is probably the relative size difference of the result set with the filter applied or not, which isn't really available. It can be estimated by assuming the filter and query are independent, but this definitely isn't always true. And what if the filter isn't, or shouldn't be, cached? Then you have to compute it separately just for this check, and avoiding that computation is part of the goal.
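As a rough sketch of the independence estimate mentioned above: if the filter and query are independent, the filtered result set is about |Q| * |F| / N docs, so the ratio of filtered to unfiltered results is just |F| / N and does not depend on the query at all. The function names and the 1% threshold below are hypothetical and chosen only for illustration; nothing here is Solr or Lucene API.

```python
def estimated_selectivity(filter_size: int, max_doc: int) -> float:
    """Fraction of the index matched by the filter intersection.

    Under the (often false) assumption that filter and query are
    independent, this is also the estimated ratio of the filtered
    result size to the unfiltered result size.
    """
    if max_doc <= 0:
        raise ValueError("max_doc must be positive")
    return filter_size / max_doc


def should_embed_filter(filter_size: int, max_doc: int,
                        threshold: float = 0.01) -> bool:
    """Hypothetical heuristic: embed the filter in the main query only
    when the filter intersection is a small fraction of the index.
    The default threshold is an arbitrary placeholder, not a tuned value.
    """
    return estimated_selectivity(filter_size, max_doc) <= threshold
```

For example, a 500-doc intersection in a 1,000,000-doc index would be embedded under this placeholder threshold, while a 200,000-doc intersection would not. Note that the heuristic needs the intersection size up front, which is exactly the problem for uncached filters.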

Second, embedding the filter itself. This is much more nettlesome within SolrIndexSearcher than within one of the request handlers. One problem is the use of BooleanScorer--I suppose we could detect that by walking the query tree looking for it. Another is the embedding location: if filters are embedded in SIS, then the only reasonable option is to wrap everything in another top-level BooleanScorer with the original and filter query as required clauses (perhaps the filter would be inserted as prohibited if the inverse bitset was sufficiently sparse). This means that the next()'s that happen to occur on the original query will pull in lots of extra scoring that might not be needed: bq's, bf's, pf's, whatever else is layered on the scoring (in my case, there are 1-2 layers of multiplicative boosts as well). It would be nicer to insert the filters directly into the "matching" part of the query.

Actually, never mind: ReqOptSumScorer does not pull ahead its optional scorers until .score() is called, so the effects should be largely the same.
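To make the skipTo benefit concrete, here is a minimal sketch of what happens when the original query and a small filter are combined as two required clauses: the sparse side drives the iteration and the dense side is advanced with skip_to instead of being walked doc by doc. The DocIdIterator class and conjunction function are illustrative stand-ins, not Lucene's Scorer API, and the skip is done with a binary search over an in-memory list rather than real skip lists.

```python
from bisect import bisect_left


class DocIdIterator:
    """Iterator over sorted doc ids that counts how often it is advanced."""

    def __init__(self, doc_ids):
        self.doc_ids = sorted(doc_ids)
        self.pos = -1
        self.advances = 0

    def doc(self):
        """Current doc id, or None if unpositioned or exhausted."""
        if 0 <= self.pos < len(self.doc_ids):
            return self.doc_ids[self.pos]
        return None

    def next(self):
        """Move to the next doc; return False when exhausted."""
        self.pos += 1
        self.advances += 1
        return self.pos < len(self.doc_ids)

    def skip_to(self, target):
        """Move to the first doc >= target; return False when exhausted."""
        self.pos = bisect_left(self.doc_ids, target, max(self.pos, 0))
        self.advances += 1
        return self.pos < len(self.doc_ids)


def conjunction(query_it, filter_it):
    """Leapfrog intersection of two required clauses."""
    matches = []
    if not filter_it.next():
        return matches
    while True:
        target = filter_it.doc()
        if not query_it.skip_to(target):
            break
        if query_it.doc() == target:
            matches.append(target)
            if not filter_it.next():
                break
        elif not filter_it.skip_to(query_it.doc()):
            break
    return matches


# A dense "query" (500 docs) intersected with a tiny "filter" (4 docs).
query_it = DocIdIterator(range(0, 1000, 2))
filter_it = DocIdIterator([10, 11, 500, 998])
matches = conjunction(query_it, filter_it)
```

In this toy case the dense iterator is advanced only a handful of times rather than once per matching doc, which is the behavior a sufficiently small filter intersection is meant to buy.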

ISTM then that the main challenge is in determining when the filter intersection should be embedded. Also, the ability to control filter caching is still difficult with this implementation, but perhaps that's less important.


Thanks for the feedback,
-Mike
