Re: How to grab matching stats in Similarity class

Varun Thacker Mon, 11 Aug 2014 02:35:31 -0700

On Thu, Aug 7, 2014 at 6:10 AM, Hafiz Mian M Hamid <
mianhami...@yahoo.com.invalid> wrote:


> We're using solr 4.2.1 and use an extension of Lucene's DefaultSimilarity
> as our similarity class. I am trying to figure out how we could get hold of
> the matching stats (i.e. how many/which terms in the query matched on
> different fields in the retrieved document set) in our similarity class
> since we want to add some custom boost to our scoring function. The scoring
> logic needs to know the number of terms matched on each field in the query
> to determine the boost value.
>

The score is calculated on a per-field basis, so the similarity will
never know how many fields a term matched against.

In Solr, if you are using eDismax
(https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser),
the same term is searched against each of the configured fields
individually, and the score from the highest-scoring field is taken. This
could cover your custom logic in a crude way.
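
To make the "best field wins" behaviour concrete, here is a minimal
plain-Java sketch of how a disjunction-max style combiner turns per-field
scores into one score for a term. It does not use any Lucene or Solr
classes; the class name DisMaxSketch, the method combine, and the field
names and score values are all made up for illustration. The tie argument
is meant to mirror the (e)dismax "tie" parameter: with 0.0 only the best
field counts, and larger values let the other fields contribute a little.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not a Lucene/Solr API.
public class DisMaxSketch {

    /**
     * Combine per-field scores for one term the way a disjunction-max
     * query does: the best field score, plus tie times the sum of the
     * remaining field scores.
     */
    public static double combine(Map<String, Double> fieldScores, double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores.values()) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Assumed per-field scores, as each field's similarity might
        // have produced them independently of the other fields.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("title", 2.0);
        scores.put("body", 0.5);

        System.out.println("tie=0.0 -> " + combine(scores, 0.0));
        System.out.println("tie=1.0 -> " + combine(scores, 1.0));
    }
}
```

The point of the sketch is where the cross-field knowledge lives: each
per-field score is computed without seeing the others, and only the
combining step sees all of them.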


> Basically we want our similarity class to be aware of the global matching
> stats even for scoring a single term in its TFIDFDocScorer.score() method.
> I was wondering how we could get hold of that information. It looks like
> the exactSimScorer() and sloppySimScorer() methods get an instance of
> AtomicReaderContext as second parameter but it doesn't look like we could
> retrieve matching stats from this object. Is there any other way we could
> make the similarity class aware of the global matching stats?
>
> I'd highly appreciate any help.
>
> Thanks,
> Hamid




-- 


Regards,
Varun Thacker
http://www.vthacker.in/
