RE: score calculation

Burgmans, Tom Wed, 12 Dec 2012 23:53:46 -0800

I am also working on getting this clear. Here are my notes so far (partly 
copied, partly written myself):




    queryWeight = the impact of the query against the field
        implementation: boost(query)*idf*queryNorm


    boost(query) = boost of the field at query-time
        Implication: hits in fields with higher boost get a higher score
        Rationale: a term in field A could be more relevant than the same term 
in field B


    idf = inverse document frequency = measure of how rare the term is across 
the index for this field
        implementation: log(numDocs/(docFreq+1))+1
        Implication: the more documents a term occurs in, the lower its score
        Rationale: common terms are less important than uncommon ones
    numDocs = the total number of documents in the index, not including those 
that are marked as deleted but have not yet been purged. This is a constant 
(the same value for all documents in the index).
    docFreq = the number of documents in the index which contain the term in 
this field. For a given term and field this is a constant (the same value for 
every document being scored)
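
To make the idf formula concrete, here is a small Python sketch of it. The 
function name and the sample numbers are made up for illustration, and I am 
assuming "log" means the natural logarithm:

```python
import math

def idf(doc_freq: int, num_docs: int) -> float:
    """idf as written in the notes: log(numDocs/(docFreq+1)) + 1.

    Assumption: log is the natural logarithm.
    """
    return math.log(num_docs / (doc_freq + 1)) + 1.0

# Hypothetical index of 1000 documents.
rare = idf(doc_freq=9, num_docs=1000)      # term occurs in 9 documents
common = idf(doc_freq=499, num_docs=1000)  # term occurs in 499 documents

# The rare term gets the larger idf, so it contributes more to the score.
print(f"rare={rare:.3f} common={common:.3f}")
```

The "+1" inside the division keeps the formula defined when docFreq is 0.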


    queryNorm = normalization factor so that queries can be compared
        implementation: 1/sqrt(sumOfSquaredWeights)
        Implication: doesn't affect the ranking of the results within one query
        Rationale: queryNorm is not related to the relevance of the document, 
but rather tries to make scores between different queries comparable. This 
value is equal for all results of the query
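
A rough Python sketch of queryNorm. The notes above do not spell out what goes 
into sumOfSquaredWeights; I am assuming it is the sum of (boost*idf)^2 over the 
terms of the query, and the weights below are invented:

```python
import math

def query_norm(term_weights):
    """1/sqrt(sumOfSquaredWeights).

    Assumption: each entry of term_weights is boost(query)*idf for one
    query term, and sumOfSquaredWeights is the sum of their squares.
    """
    sum_of_squared_weights = sum(w * w for w in term_weights)
    return 1.0 / math.sqrt(sum_of_squared_weights)

weights = [3.0, 4.0]        # hypothetical boost*idf values for two terms
norm = query_norm(weights)  # 1/sqrt(9 + 16) = 0.2

# After scaling by the norm, the squared weights sum to 1.
normalized = [w * norm for w in weights]
print(norm, normalized)
```

Because the same factor multiplies every hit of the query, the order of the 
results stays the same.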


    fieldWeight = the score of a term matching the field
        implementation: tf*idf*fieldNorm


    tf = term frequency in a field = measure of how often a term appears in the 
field
        implementation: sqrt(freq)
        Implication: the more frequently a term occurs in a field, the greater 
its score
        Rationale: fields that contain more occurrences of a term are generally 
more relevant
    freq = termFreq = number of times the term occurs in the field for this 
document
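
A short Python sketch of tf, to show how the square root dampens the effect of 
repeated terms (the function name is mine):

```python
import math

def tf(freq: int) -> float:
    """tf as written in the notes: sqrt(freq)."""
    return math.sqrt(freq)

# 16 occurrences score only 4 times as much as a single occurrence.
tf_values = {freq: tf(freq) for freq in (1, 4, 16)}
print(tf_values)  # {1: 1.0, 4: 2.0, 16: 4.0}
```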


    fieldNorm = impact of a hit in this field
        implementation: lengthNorm*boost(index)
    lengthNorm = measure of the importance of a term according to the total 
number of terms in the field
        implementation: 1/sqrt(numTerms)
        Implication: a term matched in a field with fewer terms has a higher 
score
        Rationale: a term in a field with fewer terms is more important than 
one in a field with more
    numTerms = number of terms in a field
    boost(index) = boost of the field at index-time
        Implication: hits in fields with higher boost get a higher score
        Rationale: a term in field A could be more relevant than the same term 
in field B
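
The same in a small Python sketch, with invented field sizes. It computes the 
exact value of the formula and ignores any loss of precision from the way the 
norm is stored in the index, so numbers in the explain output may differ 
slightly:

```python
import math

def length_norm(num_terms: int) -> float:
    """lengthNorm as written in the notes: 1/sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)

def field_norm(num_terms: int, index_boost: float = 1.0) -> float:
    """fieldNorm = lengthNorm * boost(index)."""
    return length_norm(num_terms) * index_boost

short_title = field_norm(num_terms=4)                     # 0.5
long_body = field_norm(num_terms=100)                     # 0.1
boosted_title = field_norm(num_terms=4, index_boost=2.0)  # 1.0

print(short_title, long_body, boosted_title)
```

A hit in the 4-term title counts five times as much as a hit in the 100-term 
body, before any boost.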


    maxDocs = the number of documents in the index, including those that are 
marked as deleted but have not yet been purged. This is a constant (the same 
value for all documents in the index)
        Implication: (probably) doesn't play a role in the scoring calculation


    coord = fraction of the terms in the query that were found in the document 
(omitted if equal to 1)
        implementation: overlap/maxOverlap
        Implication: a document that contains more of the query terms will have 
a higher score
        Rationale: documents that match the most optional terms score highest
    overlap = the number of query terms matched in the document
    maxOverlap = the total number of terms in the query
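
As a Python sketch, for a hypothetical three-term query:

```python
def coord(overlap: int, max_overlap: int) -> float:
    """coord as written in the notes: overlap/maxOverlap."""
    return overlap / max_overlap

two_of_three = coord(overlap=2, max_overlap=3)  # document matches 2 of 3 terms
all_three = coord(overlap=3, max_overlap=3)     # document matches all 3 terms

print(two_of_three, all_three)
```

When every query term matches, coord is 1 and leaves the score unchanged, 
which is why it is omitted in that case.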


    FunctionQuery = could be any kind of custom ranking function, whose result 
is added to, or multiplied with, the default rank score.
        Implication: various


Look at the explain section of the debug output (add debugQuery=true to the 
request) to see how the final score is calculated.
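
To tie the pieces together, here is a Python sketch of how I read the explain 
output for a simple query of optional terms: per matching term 
queryWeight*fieldWeight, summed over the terms, multiplied by coord. This is my 
own reading, not taken from the Lucene source. All names and numbers are 
invented, queryNorm is simply set to 1.0, and no FunctionQuery is involved:

```python
import math

def idf(doc_freq, num_docs):
    # Assumption: natural logarithm.
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def term_score(freq, doc_freq, num_docs, num_terms, query_norm,
               query_boost=1.0, index_boost=1.0):
    """queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq, num_docs)
    query_weight = query_boost * term_idf * query_norm
    field_norm = (1.0 / math.sqrt(num_terms)) * index_boost
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def score(matched_terms, total_query_terms, num_docs, num_terms, query_norm):
    """coord * sum of the per-term scores.

    matched_terms: list of (freq, doc_freq) pairs, one per query term
    that occurs in the document.
    """
    coord = len(matched_terms) / total_query_terms
    total = sum(term_score(freq, doc_freq, num_docs, num_terms, query_norm)
                for freq, doc_freq in matched_terms)
    return coord * total

# Hypothetical case: two-term query, one term matches 4 times in a 4-term
# field, and that term occurs in 9 of 1000 documents.
result = score(matched_terms=[(4, 9)], total_query_terms=2,
               num_docs=1000, num_terms=4, query_norm=1.0)
print(result)
```

Note that idf enters twice, once through queryWeight and once through 
fieldWeight.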

Tom


-----Original Message-----
From: Sangeetha [mailto:sangeetha...@gmail.com]
Sent: Thursday 13 December 2012 08:33
To: solr-user@lucene.apache.org
Subject: score calculation


I want to know how score is calculated?

what is fieldweight, fieldNorm, queryWeight and queryNorm. And what is the
formula to get the final score using fieldweight, fieldNorm, queryWeight
,queryNorm, idf and tf.

Can anyone explain or provide some links?

Thanks,
Sangeetha



--
View this message in context: 
http://lucene.472066.n3.nabble.com/score-calculation-tp4026669.html
Sent from the Solr - User mailing list archive at Nabble.com.
