On 25/05/2012 20:13, Tom Burton-West wrote:
Hello all,

I am trying to understand the output of Solr explain for a one word query.
I am querying on the "ocr" field with no stemming, synonyms, or stopwords,
and no query-time or index-time boosting.

The query is "ocr:the"

The document (result below), which contains two words, "The Aeroplane", ranks
higher than documents with 50 or more occurrences of the word "the".
Since the idf is the same, I am assuming this is a result of length norms.

The explain (debugQuery) shows the following for fieldnorm:
  0.625 = fieldNorm(field=ocr, doc=16624)
What does the "doc=16624" mean?  It certainly cannot represent the
length of the field (as an integer), since there are only two terms in the
field.
It can't represent the number of docs with the query term either (the idf
output shows that the word "the" occurs in 16,219 docs).

Hi Tom,

This is an internal document number within a Lucene index. It is of no use at the level of the Solr APIs, because you can't actually do anything with it there. At the Lucene level (e.g. in Luke) you could navigate to this number and, for example, retrieve the stored fields of this document.

As shown in the Explanations, it can only be used to correlate the parts of the query that matched the same document number.
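
As for the fieldNorm value itself, 0.625 is consistent with a two-term field. The sketch below is a plain Python re-implementation (not Lucene code) of what I understand the default similarity in Lucene 3.x to do: compute boost / sqrt(numTerms) and store it in a single byte, which loses precision. The function names are hypothetical, and the sketch assumes the default similarity with no index-time boost, as described in the question.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Squeeze a float into one byte (assumed to mirror Lucene's norm encoding)."""
    # Reinterpret the 32-bit float as a signed integer.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21                      # drop the low 21 mantissa bits
    zero_point = (63 - 15) << 3
    if small <= zero_point:                 # underflow: zero or smallest value
        return 0 if bits <= 0 else 1
    if small >= zero_point + 0x100:         # overflow: clamp to largest value
        return 255
    return small - zero_point


def byte315_to_float(b: int) -> float:
    """Expand the one-byte norm back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms: int, boost: float = 1.0) -> float:
    """Length norm as it would come back from the index after the byte round trip."""
    raw = boost / math.sqrt(num_terms)      # 1/sqrt(2) = 0.7071... for two terms
    return byte315_to_float(float_to_byte315(raw))


print(field_norm(2))  # 0.625
```

Under these assumptions, 1/sqrt(2) = 0.7071 is rounded down to 0.625 when stored. A much longer field gets a much smaller norm, and since the default tf factor grows only as the square root of the term frequency, a two-word document can outscore one with 50 occurrences of "the".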

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
