Re: Getting the distribution information of scores from query

2012-09-27 Thread Amit Nithian
Thanks! That did the trick, although it required some more work at the
component level to generate the same query key as the index searcher.
Otherwise, when I tried to fetch scores for a cached query result, I
got a lot of NPEs, since the stats are computed at the collector level,
and for me they are not set when a cache hit bypasses the Lucene level.
I'll write up what I did and will probably try to open source the work for
others to see. The PostFilter support is nice but needs some
examples and documentation; hopefully mine will help the cause.

Thanks again
Amit


Re: Getting the distribution information of scores from query

2012-09-26 Thread Mikhail Khludnev
I suggest creating a component and putting it after QueryComponent. In prepare, it
should add its own PostFilter to the list of request filters. Your post filter
will be able to inject its own DelegatingCollector; then you can just add
the collected histogram to the result named list.
 http://searchhub.org/dev/2012/02/10/advanced-filter-caching-in-solr/
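
To make the collecting step concrete, here is a minimal sketch of the kind of accumulator such a DelegatingCollector could feed with each score it sees. It is plain Java with no Solr dependency; the class name ScoreStats and its methods are hypothetical, and the use of Welford's online update and of the population standard deviation are assumptions made for illustration, not something prescribed in this thread.

```java
import java.util.Locale;

/** Hypothetical helper: running statistics over the scores a collector sees. */
final class ScoreStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    /** Welford's online update; call once per collected document score. */
    void add(double score) {
        count++;
        double delta = score - mean;
        mean += delta / count;
        m2 += delta * (score - mean);
    }

    long count() { return count; }

    double mean() { return mean; }

    /** Population standard deviation; 0 when nothing has been collected. */
    double stdDev() {
        return count > 0 ? Math.sqrt(m2 / count) : 0.0;
    }
}

/** Small demo that feeds a fixed set of made-up scores. */
class ScoreStatsDemo {
    public static void main(String[] args) {
        ScoreStats stats = new ScoreStats();
        for (double s : new double[] {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0}) {
            stats.add(s);
        }
        System.out.printf(Locale.ROOT, "n=%d mean=%.2f sd=%.2f%n",
                stats.count(), stats.mean(), stats.stdDev());
    }
}
```

Because the update is incremental, nothing but three numbers has to be kept per query, and the values could then be added to the result named list as suggested above.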

On Tue, Sep 25, 2012 at 10:03 PM, Amit Nithian anith...@gmail.com wrote:

 We have a federated search product that issues multiple parallel
 queries to Solr cores, fetches the results, and blends them. The
 approach we were investigating was to take the scores, normalize them
 based on some distribution (a normal distribution seems reasonable), and
 use the resulting z score to blend the results (otherwise you would be
 blending scores on different scales). To accomplish this, I was
 looking to get the distribution of the scores for the query, as an
 analog to the stats component, but the only way I can see to
 accomplish this would be to create a custom collector that would
 accumulate and store this information (mean, std-dev, etc.), since the
 stats component only operates on indexed fields.
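
 The blending idea described above can be sketched roughly as follows. This is only an illustration in plain Java: the class ZScoreBlend, its Hit type, and the sample numbers are all hypothetical, and it assumes each core reports the mean and standard deviation of its scores for the query alongside its hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: blend hits from several cores by z score. */
final class ZScoreBlend {

    /** A hit with its core-specific raw score and its normalized z score. */
    static final class Hit {
        final String id;
        final double rawScore;
        final double z;

        Hit(String id, double rawScore, double z) {
            this.id = id;
            this.rawScore = rawScore;
            this.z = z;
        }
    }

    /** z = (score - mean) / stdDev; falls back to 0 when stdDev is 0. */
    static double zScore(double score, double mean, double stdDev) {
        return stdDev > 0 ? (score - mean) / stdDev : 0.0;
    }

    /** Normalizes one core's hits with that core's own mean and stdDev. */
    static List<Hit> normalize(String[] ids, double[] scores,
                               double mean, double stdDev) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            hits.add(new Hit(ids[i], scores[i], zScore(scores[i], mean, stdDev)));
        }
        return hits;
    }

    /** Merges all cores' hits and sorts them by z score, highest first. */
    static List<Hit> blend(List<List<Hit>> perCore) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> hits : perCore) {
            merged.addAll(hits);
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.z).reversed());
        return merged;
    }
}

/** Small demo with two cores whose raw scores are on different scales. */
class ZScoreBlendDemo {
    public static void main(String[] args) {
        List<List<ZScoreBlend.Hit>> perCore = new ArrayList<>();
        perCore.add(ZScoreBlend.normalize(
                new String[] {"a1", "a2"}, new double[] {14.0, 11.0}, 10.0, 2.0));
        perCore.add(ZScoreBlend.normalize(
                new String[] {"b1", "b2"}, new double[] {0.8, 0.45}, 0.5, 0.1));
        for (ZScoreBlend.Hit h : ZScoreBlend.blend(perCore)) {
            System.out.println(h.id + " raw=" + h.rawScore + " z=" + h.z);
        }
    }
}
```

 In the demo, the raw scores from the second core are far smaller than those from the first, yet its best hit ranks first once each score is expressed relative to its own core's distribution.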

 Is there an easy way to tell Solr to use a custom collector without
 having to modify the SolrIndexSearcher class? Or is there an
 alternative way to get this information?

 Thanks
 Amit




-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com