I implemented the PostFilter approach described by Joel. Just iterating over the OpenBitSet, even without the scaling or the HashMap lookup, added 30ms to a query time, which kinda surprised me. There were about 150K hits out of a total of 500K. Is OpenBitSet the best way to do this?
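For anyone following along, the two-pass shape being timed is roughly the following. This is only a sketch in plain Java, not the actual filter: java.util.BitSet, java.util.PriorityQueue and HashMap stand in for OpenBitSet, the Lucene PriorityQueue and the hppc IntFloatOpenHashMap, and the names TwoPassScaler, Hit and Sink are made up for illustration. Scaling to the range [0, 1] is also an assumption.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a two-pass "scale" post filter; not Solr API. */
class TwoPassScaler {
    /** Receives (doc, score) pairs in the second pass, like a lower collector would. */
    interface Sink { void accept(int doc, float score); }

    /** One collected hit. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final BitSet hits = new BitSet();
    // Min-heap on score, so the weakest of the current top X is evicted first.
    private final PriorityQueue<Hit> topX =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    private final int windowSize;
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    TwoPassScaler(int windowSize) { this.windowSize = windowSize; }

    /** Pass 1: remember the hit and its score; nothing is delegated yet. */
    void collect(int doc, float score) {
        hits.set(doc);
        min = Math.min(min, score);
        max = Math.max(max, score);
        topX.add(new Hit(doc, score));
        if (topX.size() > windowSize) topX.poll();
    }

    /** Pass 2: replay every hit in doc order; docs outside the top X get 0. */
    void finish(Sink sink) {
        Map<Integer, Float> scoreByDoc = new HashMap<>();
        for (Hit h : topX) scoreByDoc.put(h.doc, h.score);
        float range = max - min;
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            Float raw = scoreByDoc.get(doc);
            // Assumed scaling: map [min, max] onto [0, 1]; a zero range yields 0.
            float scaled = (raw == null || range == 0f) ? 0f : (raw - min) / range;
            sink.accept(doc, scaled);
        }
    }
}
```

In this sketch the second pass touches every hit once, so its cost grows with the number of hits (about 150K here) rather than with the size of the result window.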
Thanks,
Peter

On Thu, Dec 19, 2013 at 9:51 AM, Peter Keegan <peterlkee...@gmail.com> wrote:

> In order to size the PriorityQueue, the result window size for the query is needed. This has been computed in the SolrIndexSearcher and is available in QueryCommand.getSupersetMaxDoc(), but it doesn't seem to be available to the PostFilter in either the SolrParams or the SolrQueryRequest. Is there a way to get this precomputed value, or do I have to duplicate the logic from SolrIndexSearcher?
>
> Thanks,
> Peter

On Thu, Dec 12, 2013 at 1:53 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Thanks, I agree this is powerful stuff. One of the reasons that I haven't gotten back to pluggable collectors is that I've been using PostFilters instead.
>
> When you start doing stuff with scores in postfilters you'll run into the bug in SOLR-5416. This will affect you when you use facets in combination with the QueryResultCache or tag and exclude faceting.
>
> The patch in SOLR-5416 resolves this issue. You'll just need your PostFilter to implement ScoreFilter and the SolrIndexSearcher will know how to handle things.
>
> The DelegatingCollector.finish() method is so new that these kinds of bugs are still being cleaned out of the system. SOLR-5416 should be in Solr 4.7.

On Thu, Dec 12, 2013 at 12:54 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> This is pretty cool, and worthy of adding to Solr in Action (v2) and the other books. With function queries, flexible filter processing and caching, custom collectors, and post filters, there's a lot of flexibility here.
>
> Btw, the query times using a custom collector to scale/recompute scores are excellent (will have to see how it compares to your outlined solution).
>
> Thanks,
> Peter

On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein <joels...@gmail.com> wrote:

> The sorting is going to happen in the lower level collectors. You need a value source that returns the score of the document being collected.
>
> Here is how you can make this happen:
>
> 1) Create an object in your PostFilter that simply holds the current score. Place this object in the SearchRequest context map. Update object.score as you pass the docs and scores to the lower collectors.
>
> 2) Create a value source that checks the SearchRequest context for the object that's holding the current score. Use this object to return the current score when called. For example, if you give the value source a handle called "score", a compound function call will look like this: sum(score(), field(x))
>
> Joel

On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan <peterlkee...@gmail.com> wrote:

> Regarding my original goal, which is to perform a math function using the scaled score and a field value, and sort on the result, how does this fit in? Must I implement another custom PostFilter with a higher cost than the scale PostFilter?
>
> Thanks,
> Peter

On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> Thanks very much for the guidance. I'd be happy to donate a working solution.
>
> Peter

On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein <joels...@gmail.com> wrote:

> SOLR-5020 has the commit info; it's mainly changes to SolrIndexSearcher, I believe.
> They might apply to 4.3. I think as long as you have the finish method, that's all you'll need. If you can get this working, it would be excellent if you could donate back the Scale PostFilter.

On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> This is what I was looking for, but the DelegatingCollector 'finish' method doesn't exist in 4.3.0 :( Can this be patched in, and are there any other PostFilter dependencies on 4.5?
>
> Thanks,
> Peter

On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Here is one approach to use in a postfilter:
>
> 1) In the collect() method, call score for each doc. Use the scores to create your scaleInfo.
> 2) Keep a bitset of the hits and a priorityQueue of your top X ScoreDocs.
> 3) Don't delegate any documents to lower collectors in the collect() method.
> 4) In the finish method, create a score mapping (use the hppc IntFloatOpenHashMap) with your top X docIds pointing to their score, using the priorityQueue created in step 2. Then iterate the bitset (also created in step 2), sending down each doc to the lower collectors, retrieving and scaling the score from the score map. If the document is not in the score map, then send down 0.
>
> You'll have to set up a dummy scorer to feed to the lower collectors.
> The CollapsingQParserPlugin has an example of how to do this.

On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> Hi Joel,
>
> I thought about using a PostFilter, but the problem is that the 'scale' function must be done after all matching docs have been scored but before adding them to the PriorityQueue that sorts just the rows to be returned. Doing the 'scale' function wrapped in a 'query' is proving to be too slow when it visits every document in the index.
>
> In the Collector, I can see how to get the field values like this:
>
> indexSearcher.getSchema().getField("field(myfield").getType().getValueSource(SchemaField, QParser).getValues()
>
> But 'getValueSource' needs a QParser, which isn't available. And I can't create a QParser without a SolrQueryRequest, which isn't available.
>
> Thanks,
> Peter

On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Peter,
>
> It sounds like you could achieve what you want to do in a PostFilter rather than extending the TopDocsCollector.
> Is there a reason why a PostFilter won't work for you?
>
> Joel

On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> Quick question:
> In the context of a custom collector, how does one get the values of a field of type 'ExternalFileField'?
>
> Thanks,
> Peter

On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> Hi Joel,
>
> This is related to another thread on function query matching (http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513). The patch in SOLR-4465 will allow me to extend TopDocsCollector and perform the 'scale' function on only the documents matching the main dismax query. As you mention, it is a slightly intrusive design and requires that I manage my own PriorityQueue (and a local duplicate of HitQueue), but should work. I think a better design would hide the PQ from the plugin.
>
> Thanks,
> Peter

On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Hi Peter,
>
> I've been meaning to revisit configurable ranking collectors, but I haven't yet had a chance. It's on the shortlist of things I'd like to tackle though.

On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> I looked at SOLR-4465 and SOLR-5045, where it appears that there is a goal to be able to do custom sorting and ranking in a PostFilter. So far, it looks like only custom aggregation can be implemented in PostFilter (5045). Custom sorting/ranking can be done in a pluggable collector (4465), but this patch is no longer in dev.
>
> Is there any other dev. being done on adding custom sorting (after collection) via a plugin?
>
> Thanks,
> Peter

> --
> Joel Bernstein
> Search Engineer at Heliosearch
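
On the score-holder idea quoted above (Joel's Dec 12 message, steps 1 and 2), here is a minimal sketch of the mechanics, assuming a plain Map as the request context. CurrentScore, ScoreFunction, SumOfScoreAndField and the "currentScore" key are hypothetical names for illustration; none of this is Solr API, and a real version would be a ValueSource registered under the "score" handle.

```java
import java.util.HashMap;
import java.util.Map;

/** Mutable holder for the score of the document currently being collected. */
class CurrentScore {
    float score;
}

/** Hypothetical stand-in for a "score" value source that reads the holder from a context map. */
class ScoreFunction {
    static final String KEY = "currentScore"; // hypothetical context key

    private final Map<Object, Object> context;

    ScoreFunction(Map<Object, Object> context) { this.context = context; }

    /** Returns whatever score the post filter last stored, or 0 if no holder was registered. */
    float value() {
        CurrentScore holder = (CurrentScore) context.get(KEY);
        return holder == null ? 0f : holder.score;
    }
}

/** Plays the role of the compound function sum(score(), field(x)). */
class SumOfScoreAndField {
    static float eval(ScoreFunction score, float[] fieldX, int doc) {
        return score.value() + fieldX[doc];
    }
}
```

The post filter would set holder.score just before passing each doc to the lower collectors, so the function sees the scaled score at the moment the sort value is computed.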