Hello all,

I am trying to understand the output of Solr explain for a one word query.
I am querying the "ocr" field, which has no stemming, synonyms, or
stopwords, and there is no query-time or index-time boosting.

The query is "ocr:the"

The document (result below), which contains only the two words "The
Aeroplane", scores higher than documents with 50 or more occurrences of the
word "the". Since the idf is the same, I am assuming this is a result of
length norms.

The explain (debugQuery) shows the following for fieldnorm:
 0.625 = fieldNorm(field=ocr, doc=16624)
What does the "doc=16624" mean?  It certainly cannot represent the length
of the field (as an integer), since there are only two terms in the field.
It also can't represent the number of docs containing the query term (the
idf output shows the word "the" occurs in 16,219 docs).

I have appended below the explain scoring for a couple of documents with tf
50 and 67.


<float name="score">0.6798219</float>
    <str name="ID">DF9199B7049F8DFE-220</str>
    <str name="doc_ID">DF9199B7049F8DFE</str>
    <str name="ocr">The Aeroplane
</str>
<str name="DF9199B7049F8DFE-220">
0.6798219 = (MATCH) fieldWeight(ocr:the in 16624), product of:
  1.0 = tf(termFreq(ocr:the)=1)
  1.087715 = idf(docFreq=16219, maxDocs=17707)
  0.625 = fieldNorm(field=ocr, doc=16624)
</str>
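
As a rough check on the length-norm assumption, here is a minimal Python
sketch. It assumes the norm is 1/sqrt(number of terms in the field) and that
it is stored in a single byte with a 3-bit mantissa and 5-bit exponent, in
the style of Lucene's SmallFloat encoding. The function names are
hypothetical and the encoding details are an assumption, not taken from the
explain output itself.

```python
import math
import struct

EXP_ZERO = (63 - 15) << 3  # assumed exponent offset of the one-byte format


def encode_norm(f: float) -> int:
    """Squeeze a float into one byte (3 mantissa bits, 5 exponent bits)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # float32 bit pattern
    small = bits >> (24 - 3)  # keep sign, exponent, top 3 mantissa bits
    if small <= EXP_ZERO:
        return 0 if bits <= 0 else 1  # underflow: zero or smallest value
    if small >= EXP_ZERO + 0x100:
        return 255  # overflow: largest representable value
    return small - EXP_ZERO


def decode_norm(b: int) -> float:
    """Expand the one-byte value back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms: int) -> float:
    """Assumed length norm 1/sqrt(num_terms) after the lossy round trip."""
    return decode_norm(encode_norm(1.0 / math.sqrt(num_terms)))


if __name__ == "__main__":
    print(1.0 / math.sqrt(2))  # exact length norm for a two-term field
    print(field_norm(2))       # value that survives the one-byte encoding
```

Under these assumptions a two-term field has an exact norm of about 0.707,
which the one-byte encoding truncates to 0.625, the value shown in the
explain output above.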

Tom Burton-West

-----

    <str name="78562575E066497D-518">
0.42061833 = (MATCH) fieldWeight(ocr:the in 8396), product of:
  7.071068 = tf(termFreq(ocr:the)=50)
  1.087715 = idf(docFreq=16219, maxDocs=17707)
  0.0546875 = fieldNorm(field=ocr, doc=8396)
</str>



 <str name="18881D8AE8B1576E-120">

0.41734362 = (MATCH) fieldWeight(ocr:the in 2782), product of:
  8.185352 = tf(termFreq(ocr:the)=67)
  1.087715 = idf(docFreq=16219, maxDocs=17707)
  0.046875 = fieldNorm(field=ocr, doc=2782)
</str>
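
The three explain outputs can be checked by hand with a short Python sketch.
It assumes tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)),
which reproduce the tf and idf values printed above; the fieldNorm values
are copied directly from the explain output, and the function names are
hypothetical.

```python
import math


def tf(term_freq: int) -> float:
    """Assumed term-frequency factor: square root of the raw count."""
    return math.sqrt(term_freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Assumed inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq: int, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    """Product of the three factors listed in each explain entry."""
    return tf(term_freq) * idf(doc_freq, max_docs) * field_norm


# (doc number in explain, termFreq, fieldNorm, reported score)
EXPLAINS = [
    (16624, 1, 0.625, 0.6798219),
    (8396, 50, 0.0546875, 0.42061833),
    (2782, 67, 0.046875, 0.41734362),
]

if __name__ == "__main__":
    for doc, term_freq, norm, reported in EXPLAINS:
        score = field_weight(term_freq, 16219, 17707, norm)
        print(f"doc={doc} tf={tf(term_freq):.6f} "
              f"computed={score:.7f} reported={reported}")
```

With these formulas the fieldNorm is the only factor that favors the
two-word document: its tf is 7 to 8 times smaller than that of the other
two, but its fieldNorm is more than 11 times larger.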
