I implemented the PostFilter approach described by Joel. Just iterating
over the OpenBitSet, even without the scaling or the HashMap lookup, added
30ms to the query time, which kinda surprised me. There were about 150K hits
out of a total of 500K. Is OpenBitSet the best way to do this?
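
For reference, the flow I mean can be sketched roughly as below. This is only a self-contained illustration, not the actual Solr code: it assumes java.util.BitSet and HashMap as stand-ins for Lucene's OpenBitSet and the hppc IntFloatOpenHashMap, and the names ScalePostFilterSketch, Hit and Sink, as well as the min-max scaling to [0, 1], are hypothetical choices made for the example.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative stand-in for a scaling PostFilter collector (not a Solr API). */
public class ScalePostFilterSketch {

    /** Hypothetical receiver standing in for the lower collectors. */
    public interface Sink {
        void collect(int doc, float score);
    }

    /** A (doc, score) pair kept in the top-X queue. */
    private static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int x;                       // number of top hits to keep
    private final BitSet hits = new BitSet();  // every matching doc
    private final PriorityQueue<Hit> topX =    // min-heap ordered by score
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    public ScalePostFilterSketch(int x) {
        this.x = x;
    }

    /** Records the hit and its score; nothing is delegated at this stage. */
    public void collect(int doc, float score) {
        hits.set(doc);
        min = Math.min(min, score);
        max = Math.max(max, score);
        topX.offer(new Hit(doc, score));
        if (topX.size() > x) {
            topX.poll(); // drop the lowest-scoring hit
        }
    }

    /** Replays all hits in doc order, sending a scaled score or 0. */
    public void finish(Sink lower) {
        Map<Integer, Float> scoreMap = new HashMap<>();
        float range = max - min;
        for (Hit h : topX) {
            // Assumed scaling: min-max to [0, 1]; 0 when all scores are equal.
            float scaled = range == 0f ? 0f : (h.score - min) / range;
            scoreMap.put(h.doc, scaled);
        }
        // nextSetBit skips directly to the next hit instead of testing every bit.
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            lower.collect(doc, scoreMap.getOrDefault(doc, 0f));
        }
    }
}
```

The replay loop in finish() is the part that corresponds to the bitset iteration I timed; the per-doc map lookup is on top of that.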

Thanks,
Peter


On Thu, Dec 19, 2013 at 9:51 AM, Peter Keegan <peterlkee...@gmail.com> wrote:

> In order to size the PriorityQueue, the result window size for the query
> is needed. This is computed in SolrIndexSearcher and is available via
> QueryCommand.getSupersetMaxDoc(), but doesn't seem to be available to
> the PostFilter in either the SolrParams or the SolrQueryRequest. Is there a way
> to get this precomputed value or do I have to duplicate the logic from
> SolrIndexSearcher?
>
> Thanks,
> Peter
>
>
> On Thu, Dec 12, 2013 at 1:53 PM, Joel Bernstein <joels...@gmail.com> wrote:
>
>> Thanks, I agree this is powerful stuff. One of the reasons that I haven't
>> gotten back to pluggable collectors is that I've been using PostFilters
>> instead.
>>
>> When you start doing stuff with scores in postfilters you'll run into the
>> bug in SOLR-5416. This will affect you when you use facets in combination
>> with the QueryResultCache or tag and exclude faceting.
>>
>> The patch in SOLR-5416 resolves this issue. You'll just need your
>> PostFilter to implement ScoreFilter and the SolrIndexSearcher will know
>> how to handle things.
>>
>> The DelegatingCollector.finish() method is so new, these kinds of bugs are
>> still being cleaned out of the system. SOLR-5416 should be in Solr 4.7.
>>
>>
>> On Thu, Dec 12, 2013 at 12:54 PM, Peter Keegan <peterlkee...@gmail.com
>> >wrote:
>>
>> > This is pretty cool, and worthy of adding to Solr in Action (v2) and the
>> > other books. With function queries, flexible filter processing and
>> caching,
>> > custom collectors, and post filters, there's a lot of flexibility here.
>> >
>> > Btw, the query times using a custom collector to scale/recompute scores are
>> > excellent (will have to see how they compare to your outlined solution).
>> >
>> > Thanks,
>> > Peter
>> >
>> >
>> > On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein <joels...@gmail.com>
>> > wrote:
>> >
>> > > The sorting is going to happen in the lower level collectors. You
>> need a
>> > > value source that returns the score of the document being collected.
>> > >
>> > > Here is how you can make this happen:
>> > >
>> > > 1) Create an object in your PostFilter that simply holds the current score.
>> > > Place this object in the SearchRequest context map. Update object.score as
>> > > you pass the docs and scores to the lower collectors.
>> > >
>> > > 2) Create a value source that checks the SearchRequest context for the
>> > > object that's holding the current score. Use this object to return the
>> > > current score when called. For example, if you give the value source a
>> > > handle called "score", a compound function call will look like this:
>> > > sum(score(), field(x))
>> > >
>> > > Joel
>> > >
>> > >
>> > >
>> > > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan <peterlkee...@gmail.com
>> > > >wrote:
>> > >
>> > > > Regarding my original goal, which is to perform a math function
>> using
>> > the
>> > > > scaled score and a field value, and sort on the result, how does
>> this
>> > fit
>> > > > in? Must I implement another custom PostFilter with a higher cost
>> than
>> > > the
>> > > > scale PostFilter?
>> > > >
>> > > > Thanks,
>> > > > Peter
>> > > >
>> > > >
>> > > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan <
>> peterlkee...@gmail.com
>> > > > >wrote:
>> > > >
>> > > > > Thanks very much for the guidance. I'd be happy to donate a
>> working
>> > > > > solution.
>> > > > >
>> > > > > Peter
>> > > > >
>> > > > >
>> > > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein <
>> joels...@gmail.com
>> > > > >wrote:
>> > > > >
>> > > > >> SOLR-5020 has the commit info; it's mainly changes to SolrIndexSearcher, I
>> > > > >> believe. They might apply to 4.3.
>> > > > >> I think as long as you have the finish method, that's all you'll need. If you
>> > > > >> can get this working it would be excellent if you could donate back the
>> > > > >> Scale PostFilter.
>> > > > >>
>> > > > >>
>> > > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan <
>> > peterlkee...@gmail.com
>> > > > >> >wrote:
>> > > > >>
>> > > > >> > This is what I was looking for, but the DelegatingCollector
>> > 'finish'
>> > > > >> method
>> > > > >> > doesn't exist in 4.3.0 :(   Can this be patched in and are
>> there
>> > any
>> > > > >> other
>> > > > >> > PostFilter dependencies on 4.5?
>> > > > >> >
>> > > > >> > Thanks,
>> > > > >> > Peter
>> > > > >> >
>> > > > >> >
>> > > > >> > On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein <
>> > joels...@gmail.com
>> > > >
>> > > > >> > wrote:
>> > > > >> >
>> > > > >> > > Here is one approach to use in a postfilter
>> > > > >> > >
>> > > > >> > > 1) In the collect() method, call score for each doc. Use the scores to
>> > > > >> > > create your scaleInfo.
>> > > > >> > > 2) Keep a bitset of the hits and a priorityQueue of your top X ScoreDocs.
>> > > > >> > > 3) Don't delegate any documents to lower collectors in the collect()
>> > > > >> > > method.
>> > > > >> > > 4) In the finish method, create a score mapping (use the hppc
>> > > > >> > > IntFloatOpenHashMap) with your top X docIds pointing to their score, using
>> > > > >> > > the priorityQueue created in step 2. Then iterate the bitset (also created
>> > > > >> > > in step 2), sending down each doc to the lower collectors, retrieving and
>> > > > >> > > scaling the score from the score map. If the document is not in the score
>> > > > >> > > map then send down 0.
>> > > > >> > >
>> > > > >> > > You'll have to set up a dummy scorer to feed to lower collectors. The
>> > > > >> > > CollapsingQParserPlugin has an example of how to do this.
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan <
>> > > > peterlkee...@gmail.com
>> > > > >> > > >wrote:
>> > > > >> > >
>> > > > >> > > > Hi Joel,
>> > > > >> > > >
>> > > > >> > > > I thought about using a PostFilter, but the problem is that
>> > the
>> > > > >> 'scale'
>> > > > >> > > > function must be done after all matching docs have been
>> scored
>> > > but
>> > > > >> > before
>> > > > >> > > > adding them to the PriorityQueue that sorts just the rows
>> to
>> > be
>> > > > >> > returned.
>> > > > >> > > > Doing the 'scale' function wrapped in a 'query' is proving
>> to
>> > be
>> > > > too
>> > > > >> > slow
>> > > > >> > > > when it visits every document in the index.
>> > > > >> > > >
>> > > > >> > > > In the Collector, I can see how to get the field values
>> like
>> > > this:
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> indexSearcher.getSchema().getField("field(myfield").getType().getValueSource(SchemaField,
>> > > > >> > > > QParser).getValues()
>> > > > >> > > >
>> > > > >> > > > But, 'getValueSource' needs a QParser, which isn't
>> available.
>> > > > >> > > > And I can't create a QParser without a SolrQueryRequest,
>> which
>> > > > isn't
>> > > > >> > > > available.
>> > > > >> > > >
>> > > > >> > > > Thanks,
>> > > > >> > > > Peter
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein <
>> > > > joels...@gmail.com
>> > > > >> >
>> > > > >> > > > wrote:
>> > > > >> > > >
>> > > > >> > > > > Peter,
>> > > > >> > > > >
>> > > > >> > > > > It sounds like you could achieve what you want to do in a PostFilter
>> > > > >> > > > > rather than extending the TopDocsCollector. Is there a reason why a
>> > > > >> > > > > PostFilter won't work for you?
>> > > > >> > > > >
>> > > > >> > > > > Joel
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > > On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan <
>> > > > >> > peterlkee...@gmail.com
>> > > > >> > > > > >wrote:
>> > > > >> > > > >
>> > > > >> > > > > > Quick question:
>> > > > >> > > > > > In the context of a custom collector, how does one get
>> the
>> > > > >> values
>> > > > >> > of
>> > > > >> > > a
>> > > > >> > > > > > field of type 'ExternalFileField'?
>> > > > >> > > > > >
>> > > > >> > > > > > Thanks,
>> > > > >> > > > > > Peter
>> > > > >> > > > > >
>> > > > >> > > > > >
>> > > > >> > > > > > On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan <
>> > > > >> > > peterlkee...@gmail.com
>> > > > >> > > > > > >wrote:
>> > > > >> > > > > >
>> > > > >> > > > > > > Hi Joel,
>> > > > >> > > > > > >
>> > > > >> > > > > > > This is related to another thread on function query
>> > > > matching (
>> > > > >> > > > > > >
>> > > > >> > > > > >
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513
>> > > > >> > > > > > ).
>> > > > >> > > > > > > The patch in SOLR-4465 will allow me to extend
>> > > > >> TopDocsCollector
>> > > > >> > and
>> > > > >> > > > > > perform
>> > > > >> > > > > > > the 'scale' function on only the documents matching
>> the
>> > > main
>> > > > >> > dismax
>> > > > >> > > > > > query.
>> > > > >> > > > > > > As you mention, it is a slightly intrusive design and
>> > > > requires
>> > > > >> > > that I
>> > > > >> > > > > > > manage my own PriorityQueue (and a local duplicate of
>> > > > >> HitQueue),
>> > > > >> > > but
>> > > > >> > > > > > should
>> > > > >> > > > > > > work. I think a better design would hide the PQ from
>> the
>> > > > >> plugin.
>> > > > >> > > > > > >
>> > > > >> > > > > > > Thanks,
>> > > > >> > > > > > > Peter
>> > > > >> > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > > > On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein <
>> > > > >> > joels...@gmail.com
>> > > > >> > > >
>> > > > >> > > > > > wrote:
>> > > > >> > > > > > >
>> > > > >> > > > > > >> Hi Peter,
>> > > > >> > > > > > >>
>> > > > >> > > > > > >> I've been meaning to revisit configurable ranking
>> > > > collectors,
>> > > > >> > but
>> > > > >> > > I
>> > > > >> > > > > > >> haven't
>> > > > >> > > > > > >> yet had a chance. It's on the shortlist of things
>> I'd
>> > > like
>> > > > to
>> > > > >> > > tackle
>> > > > >> > > > > > >> though.
>> > > > >> > > > > > >>
>> > > > >> > > > > > >>
>> > > > >> > > > > > >>
>> > > > >> > > > > > >> On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan <
>> > > > >> > > > peterlkee...@gmail.com>
>> > > > >> > > > > > >> wrote:
>> > > > >> > > > > > >>
>> > > > >> > > > > > >> > I looked at SOLR-4465 and SOLR-5045, where it
>> appears
>> > > > that
>> > > > >> > there
>> > > > >> > > > is
>> > > > >> > > > > a
>> > > > >> > > > > > >> goal
>> > > > >> > > > > > >> > to be able to do custom sorting and ranking in a
>> > > > >> PostFilter.
>> > > > >> > So
>> > > > >> > > > far,
>> > > > >> > > > > > it
>> > > > >> > > > > > >> > looks like only custom aggregation can be
>> implemented
>> > > in
>> > > > >> > > > PostFilter
>> > > > >> > > > > > >> (5045).
>> > > > >> > > > > > >> > Custom sorting/ranking can be done in a pluggable
>> > > > collector
>> > > > >> > > > (4465),
>> > > > >> > > > > > but
>> > > > >> > > > > > >> > this patch is no longer in dev.
>> > > > >> > > > > > >> >
>> > > > >> > > > > > >> > Is there any other dev. being done on adding
>> custom
>> > > > sorting
>> > > > >> > > (after
>> > > > >> > > > > > >> > collection) via a plugin?
>> > > > >> > > > > > >> >
>> > > > >> > > > > > >> > Thanks,
>> > > > >> > > > > > >> > Peter
>> > > > >> > > > > > >> >
>> > > > >> > > > > > >>
>> > > > >> > > > > > >>
>> > > > >> > > > > > >>
>> > > > >> > > > > > >> --
>> > > > >> > > > > > >> Joel Bernstein
>> > > > >> > > > > > >> Search Engineer at Heliosearch
>> > > > >> > > > > > >>
>> > > > >> > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > >
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > > --
>> > > > >> > > > > Joel Bernstein
>> > > > >> > > > > Search Engineer at Heliosearch
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > --
>> > > > >> > > Joel Bernstein
>> > > > >> > > Search Engineer at Heliosearch
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> Joel Bernstein
>> > > > >> Search Engineer at Heliosearch
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Joel Bernstein
>> > > Search Engineer at Heliosearch
>> > >
>> >
>>
>>
>>
>> --
>> Joel Bernstein
>> Search Engineer at Heliosearch
>>
>
>
