Re: Configurable collectors for custom ranking

Joel Bernstein Mon, 23 Dec 2013 15:38:53 -0800

Peter,

You actually only need the current score being collected to be in the
request context. So you don't need a map, you just need an object wrapper
around a mutable float.
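
A minimal sketch of that wrapper, in plain Java with no Solr classes. The
names CurrentScore, CurrentScoreDemo and the "current_score" key are made up
for illustration, and the HashMap only stands in for the request context map:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder: a single mutable float shared through the context.
final class CurrentScore {
    float value;
}

final class CurrentScoreDemo {
    // Hypothetical context key; any key both sides agree on would do.
    static final String KEY = "current_score";

    // Stand-in for the value source: returns whatever score is current now.
    static float readCurrentScore(Map<Object, Object> context) {
        return ((CurrentScore) context.get(KEY)).value;
    }

    public static void main(String[] args) {
        // Stand-in for the request context map.
        Map<Object, Object> context = new HashMap<>();
        CurrentScore holder = new CurrentScore();
        context.put(KEY, holder);

        holder.value = 0.25f; // set before the first doc is sent down
        System.out.println(readCurrentScore(context));

        holder.value = 0.75f; // overwritten before the next doc
        System.out.println(readCurrentScore(context));
    }
}
```

The holder is put into the map once and then mutated in place, so nothing
per-document is ever stored in the context.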


If you have a page size of X, only the top X scores need to be held onto,
because all the other scores wouldn't have made it into that page anyway so
they might as well be 0. Because the QueryResultCache caches a larger
window than the page size, you should keep enough scores so the cached
docList is correct. But if you're only dealing with 150K results you
could just keep all the scores in a FloatArrayList and not worry about
keeping the top X scores in a priority queue.
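
If you do go the priority queue route, here is a rough sketch of keeping
only the top k scores with a size-bounded min-heap. It uses
java.util.PriorityQueue rather than anything from Lucene, TopScores and topK
are hypothetical names, and k is assumed to be at least the cache window
mentioned above rather than just the page size:

```java
import java.util.PriorityQueue;

final class TopScores {
    // Returns the k largest scores in descending order.
    // A min-heap of size k keeps the smallest retained score on top,
    // so each new score only has to beat that one to get in.
    static float[] topK(float[] scores, int k) {
        if (k <= 0) {
            return new float[0];
        }
        PriorityQueue<Float> heap = new PriorityQueue<>();
        for (float s : scores) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the current smallest
                heap.add(s);
            }
        }
        float[] out = new float[heap.size()];
        // poll() yields the smallest first, so fill the array from the back.
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll();
        }
        return out;
    }
}
```

Any score that never makes it into the heap can be treated as 0 when the
docs are sent down later.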

During the collect phase, hang onto the docIds and scores and build your
scaling info.

During the finish phase, iterate your docIds and scale the scores as you go.

Before you send each document down to the delegate collectors, set its
scaled score into the object wrapper that is in the request context.

When you call collect on the delegate collectors they will call the custom
value source for each document to perform the sort. Your custom value
source will return whatever the float value is in the request context at
that time.
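
The collect/finish flow described above can be sketched roughly like this.
It is plain Java, not the real DelegatingCollector API: ScoreHolder and
ScalingBuffer are hypothetical names, the IntConsumer stands in for the
delegate collectors, and the min-max scaling to [0,1] is only an assumption
about what the scaling info is used for:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical mutable wrapper that would live in the request context.
final class ScoreHolder {
    float value;
}

final class ScalingBuffer {
    private final List<Integer> docIds = new ArrayList<>();
    private final List<Float> scores = new ArrayList<>();
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;
    private final ScoreHolder holder;
    private final IntConsumer delegate; // stand-in for the lower collectors

    ScalingBuffer(ScoreHolder holder, IntConsumer delegate) {
        this.holder = holder;
        this.delegate = delegate;
    }

    // Collect phase: buffer the doc and score, track min/max, delegate nothing.
    void collect(int docId, float score) {
        docIds.add(docId);
        scores.add(score);
        min = Math.min(min, score);
        max = Math.max(max, score);
    }

    // Finish phase: scale each score, publish it, then send the doc down.
    void finish() {
        float range = max - min;
        for (int i = 0; i < docIds.size(); i++) {
            // Assumed choice: all-equal scores scale to 0.
            holder.value = (range == 0f) ? 0f : (scores.get(i) - min) / range;
            delegate.accept(docIds.get(i)); // delegate reads holder.value here
        }
    }
}
```

The ordering inside finish() is the important part: the holder is updated
first, so whatever the delegate asks for while handling that doc sees the
scaled score for that doc and no other.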

If you're also going to run this postfilter when you're doing a standard
rank by score you'll also need to send down a dummy scorer to the delegate
collectors. Spend some time with the CollapsingQParserPlugin in trunk to
see how the dummy scorer works.

I'll be adding value source collapse criteria to the
CollapsingQParserPlugin this week and it will have a similar interaction
between a PostFilter and value source. So you may want to watch SOLR-5536
to see an example of this.

Joel

Joel Bernstein
Search Engineer at Heliosearch


On Mon, Dec 23, 2013 at 4:03 PM, Peter Keegan <peterlkee...@gmail.com> wrote:

> Hi Joel,
>
> Could you clarify what would be in the key,value Map added to the
> SearchRequest context? It seems that all the docId/score tuples need to be
> there, including the ones not in the 'top N ScoreDocs' PriorityQueue
> (score=0). If so would the Map be something like:
> "scaled_scores",Map<Integer,Float> ?
>
> Also, what is the reason for passing score=0 for documents that aren't in
> the PriorityQueue? Will these docs get filtered out before a normal sort by
> score?
>
> Thanks,
> Peter
>
>
> On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein <joels...@gmail.com>
> wrote:
>
> > The sorting is going to happen in the lower level collectors. You need a
> > value source that returns the score of the document being collected.
> >
> > Here is how you can make this happen:
> >
> > 1) Create an object in your PostFilter that simply holds the current
> > score. Place this object in the SearchRequest context map. Update
> > object.score as you pass the docs and scores to the lower collectors.
> >
> > 2) Create a value source that checks the SearchRequest context for the
> > object that's holding the current score. Use this object to return the
> > current score when called. For example, if you give the value source a
> > handle called "score", a compound function call will look like this:
> > sum(score(), field(x))
> >
> > Joel
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan <peterlkee...@gmail.com
> > >wrote:
> >
> > > Regarding my original goal, which is to perform a math function using
> > > the scaled score and a field value, and sort on the result, how does
> > > this fit in? Must I implement another custom PostFilter with a higher
> > > cost than the scale PostFilter?
> > >
> > > Thanks,
> > > Peter
> > >
> > >
> > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan <peterlkee...@gmail.com
> > > >wrote:
> > >
> > > > Thanks very much for the guidance. I'd be happy to donate a working
> > > > solution.
> > > >
> > > > Peter
> > > >
> > > >
> > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein <joels...@gmail.com
> > > >wrote:
> > > >
> > > >> SOLR-5020 has the commit info; it's mainly changes to
> > > >> SolrIndexSearcher, I believe. They might apply to 4.3.
> > > >> I think as long as you have the finish method that's all you'll need.
> > > >> If you can get this working it would be excellent if you could donate
> > > >> back the Scale PostFilter.
> > > >>
> > > >>
> > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan <
> peterlkee...@gmail.com
> > > >> >wrote:
> > > >>
> > > >> > This is what I was looking for, but the DelegatingCollector
> > > >> > 'finish' method doesn't exist in 4.3.0 :(   Can this be patched in,
> > > >> > and are there any other PostFilter dependencies on 4.5?
> > > >> >
> > > >> > Thanks,
> > > >> > Peter
> > > >> >
> > > >> >
> > > >> > On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein <
> joels...@gmail.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > Here is one approach to use in a postfilter:
> > > >> > >
> > > >> > > 1) In the collect() method call score for each doc. Use the
> > > >> > > scores to create your scaleInfo.
> > > >> > > 2) Keep a bitset of the hits and a priorityQueue of your top X
> > > >> > > ScoreDocs.
> > > >> > > 3) Don't delegate any documents to lower collectors in the
> > > >> > > collect() method.
> > > >> > > 4) In the finish method create a score mapping (use the hppc
> > > >> > > IntFloatOpenHashMap) with your top X docIds pointing to their
> > > >> > > score, using the priorityQueue created in step 2. Then iterate
> > > >> > > the bitset (also created in step 2) sending down each doc to the
> > > >> > > lower collectors, retrieving and scaling the score from the score
> > > >> > > map. If the document is not in the score map then send down 0.
> > > >> > >
> > > >> > > You'll have to set up a dummy scorer to feed to lower collectors.
> > > >> > > The CollapsingQParserPlugin has an example of how to do this.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan <
> > > peterlkee...@gmail.com
> > > >> > > >wrote:
> > > >> > >
> > > >> > > > Hi Joel,
> > > >> > > >
> > > >> > > > I thought about using a PostFilter, but the problem is that
> > > >> > > > the 'scale' function must be done after all matching docs have
> > > >> > > > been scored but before adding them to the PriorityQueue that
> > > >> > > > sorts just the rows to be returned. Doing the 'scale' function
> > > >> > > > wrapped in a 'query' is proving to be too slow when it visits
> > > >> > > > every document in the index.
> > > >> > > >
> > > >> > > > In the Collector, I can see how to get the field values like
> > > >> > > > this:
> > > >> > > >
> > > >> > > > indexSearcher.getSchema().getField("field(myfield").getType().getValueSource(SchemaField,
> > > >> > > > QParser).getValues()
> > > >> > > >
> > > >> > > > But, 'getValueSource' needs a QParser, which isn't available.
> > > >> > > > And I can't create a QParser without a SolrQueryRequest, which
> > > >> > > > isn't available.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Peter
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein <
> > > joels...@gmail.com
> > > >> >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Peter,
> > > >> > > > >
> > > >> > > > > It sounds like you could achieve what you want to do in a
> > > >> > > > > PostFilter rather than extending the TopDocsCollector. Is
> > > >> > > > > there a reason why a PostFilter won't work for you?
> > > >> > > > >
> > > >> > > > > Joel
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan <
> > > >> > peterlkee...@gmail.com
> > > >> > > > > >wrote:
> > > >> > > > >
> > > >> > > > > > Quick question:
> > > >> > > > > > In the context of a custom collector, how does one get the
> > > >> > > > > > values of a field of type 'ExternalFileField'?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Peter
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan <
> > > >> > > peterlkee...@gmail.com
> > > >> > > > > > >wrote:
> > > >> > > > > >
> > > >> > > > > > > Hi Joel,
> > > >> > > > > > >
> > > >> > > > > > > This is related to another thread on function query
> > > >> > > > > > > matching
> > > >> > > > > > > (http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513).
> > > >> > > > > > > The patch in SOLR-4465 will allow me to extend
> > > >> > > > > > > TopDocsCollector and perform the 'scale' function on
> > > >> > > > > > > only the documents matching the main dismax query. As
> > > >> > > > > > > you mention, it is a slightly intrusive design and
> > > >> > > > > > > requires that I manage my own PriorityQueue (and a
> > > >> > > > > > > local duplicate of HitQueue), but should work. I think
> > > >> > > > > > > a better design would hide the PQ from the plugin.
> > > >> > > > > > >
> > > >> > > > > > > Thanks,
> > > >> > > > > > > Peter
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein <
> > > >> > joels...@gmail.com
> > > >> > > >
> > > >> > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > >> Hi Peter,
> > > >> > > > > > >>
> > > >> > > > > > >> I've been meaning to revisit configurable ranking
> > > >> > > > > > >> collectors, but I haven't yet had a chance. It's on the
> > > >> > > > > > >> shortlist of things I'd like to tackle though.
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >> On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan <
> > > >> > > > peterlkee...@gmail.com>
> > > >> > > > > > >> wrote:
> > > >> > > > > > >>
> > > >> > > > > > >> > I looked at SOLR-4465 and SOLR-5045, where it appears
> > > >> > > > > > >> > that there is a goal to be able to do custom sorting
> > > >> > > > > > >> > and ranking in a PostFilter. So far, it looks like only
> > > >> > > > > > >> > custom aggregation can be implemented in PostFilter
> > > >> > > > > > >> > (5045). Custom sorting/ranking can be done in a
> > > >> > > > > > >> > pluggable collector (4465), but this patch is no longer
> > > >> > > > > > >> > in dev.
> > > >> > > > > > >> >
> > > >> > > > > > >> > Is there any other dev. being done on adding custom
> > > >> > > > > > >> > sorting (after collection) via a plugin?
> > > >> > > > > > >> >
> > > >> > > > > > >> > Thanks,
> > > >> > > > > > >> > Peter
> > > >> > > > > > >> >
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >>
> > > >> > > > > > >> --
> > > >> > > > > > >> Joel Bernstein
> > > >> > > > > > >> Search Engineer at Heliosearch
> > > >> > > > > > >>
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Joel Bernstein
> > > >> > > > > Search Engineer at Heliosearch
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Joel Bernstein
> > > >> > > Search Engineer at Heliosearch
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Joel Bernstein
> > > >> Search Engineer at Heliosearch
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
>
