Re: How to pull document scoring values

2004-09-29 Thread Zia Syed
Hi Paul,
Thanks for your detailed reply! It really helped a lot.
However, I am seeing some discrepancies.

For one of the documents in the result set, when I use

IndexReader fir = FilterIndexReader.open(index);
// norms() returns one encoded norm byte per document for the given field
byte[] fNorm = fir.norms("Body");
System.out.println("FNorm: " + fNorm[306]);
Document d = fir.document(306);
Field f = d.getField("Body");

System.out.println("Body: " + f.stringValue());

This prints an fNorm of 113, whereas the total number of terms (including
stop words) is 42 in this particular field of the selected document. In the
explanation, fieldNorm(field=Body, doc=306) is 0.1562, which corresponds to
approximately 41 terms for that field in that document, since 1/sqrt(41) is
about 0.1562. So the explanation value makes sense against the real data
when all stop words such as "to", "it", "the", etc. are counted.
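
The byte returned by norms() is an encoded value rather than the norm itself, which would account for seeing 113 next to a fieldNorm of 0.1562. The sketch below is a standalone re-implementation of the 8-bit decoding (3 mantissa bits, 5 exponent bits) as I understand Lucene 1.4 to do it. The class and method names are made up for illustration; inside Lucene the equivalent call should be Similarity.decodeNorm(byte).

```java
// Hypothetical standalone sketch; not part of Lucene.
class NormDecodeSketch {

    // Decode an 8-bit norm: the low 3 bits are the mantissa, the next 5 bits
    // the exponent (assumed layout, with a zero-exponent offset of 15).
    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int mantissa = b & 7;
        int exponent = (b >> 3) & 31;
        int bits = ((exponent + (63 - 15)) << 24) | (mantissa << 21);
        return Float.intBitsToFloat(bits);
    }

    // With the default similarity the norm is 1/sqrt(numTerms), so numTerms
    // is roughly 1/(norm * norm). This ignores any boost folded into the norm.
    static long approxTermCount(float norm) {
        return Math.round(1.0 / ((double) norm * norm));
    }

    public static void main(String[] args) {
        float norm = decodeNorm((byte) 113);
        System.out.println("decoded norm: " + norm);
        System.out.println("approx. terms: " + approxTermCount(norm));
    }
}
```

For byte 113 this gives 0.15625, which agrees with the 0.1562 shown in the explanation. The term count recovered this way is only an estimate, because the encoding is coarse and because it assumes no field or document boost was applied.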

So, my questions are:
 Am I getting the norm values from the right place?
 Is there any way to find out the number of indexed terms for each
document?

Please advise!

Thanks,

Zia



On Wed, 2004-09-29 at 08:17, Paul Elschot wrote:
 Zia,
 
 On Tuesday 28 September 2004 21:22, you wrote:
  Hi,
 
  I'm trying to learn the Scoring mechanism of Lucene. I want to fetch
  each parameter value individually as they are collectively dumped out by
  Explanation. I've managed to pull out TF and IDF values using
  DefaultSimilarity and FilterIndexReader, but I'm not sure where to get
  the fieldNorm and queryNorm from.
 
 The norms are here:
 http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#norms(java.lang.String)
 The resulting array is indexed by the document number for the IndexReader.
 With the default similarity, each norm is the inverse square root of the number of
 indexed terms in the document field. However, there are only 8 bits available to
 encode this value, so it's quite rough.
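
To illustrate how rough the 8-bit value is, here is a small sketch that encodes 1/sqrt(n) for several term counts. It assumes the encoding keeps 3 mantissa bits and 5 exponent bits and truncates the rest, which is my reading of the default implementation. The class and method names are hypothetical and nothing here calls Lucene.

```java
// Hypothetical standalone sketch; not part of Lucene.
class NormEncodeSketch {

    // Pack a float into one byte (assumed layout: 5 exponent bits, 3 mantissa
    // bits, zero-exponent offset of 15). Lower-order bits are simply dropped.
    static byte encodeNorm(float f) {
        if (f <= 0.0f) {
            return 0;
        }
        int bits = Float.floatToIntBits(f);
        int mantissa = (bits & 0xffffff) >> 21;
        int exponent = (((bits >> 24) & 0x7f) - 63) + 15;
        if (exponent > 31) {          // overflow: clamp to the largest value
            exponent = 31;
            mantissa = 7;
        }
        if (exponent < 0) {           // underflow: clamp to the smallest value
            exponent = 0;
            mantissa = 1;
        }
        return (byte) ((exponent << 3) | mantissa);
    }

    // Default length norm for a field with numTerms indexed terms, encoded.
    static byte normByteForTermCount(int numTerms) {
        return encodeNorm((float) (1.0 / Math.sqrt(numTerms)));
    }

    public static void main(String[] args) {
        int[] counts = {28, 29, 35, 40, 41, 42};
        for (int n : counts) {
            System.out.println(n + " terms -> norm byte " + normByteForTermCount(n));
        }
    }
}
```

Under these assumptions every count from 29 to 40 lands on the same byte, 113, so the stored norm cannot distinguish between them.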
 
 The default queryNorm is here:
 http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DefaultSimilarity.html#queryNorm(float)
 There is an explanation of the scoring in the javadocs of Similarity.
 There has been some discussion of an idf factor that was missing from this
 documentation; I don't know whether the docs have been updated for this.
 
  Also is there any reference about how normalisation has been
  implemented?
 
 See above; DefaultSimilarity is the default implementation of the Similarity
 interface.
 queryNorm() takes a sumOfSquaredWeights, where the weights are the term weights
 from the query. It returns the inverse of the square root.
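
As a rough sketch of that computation, assuming the default behaviour is 1/sqrt(sumOfSquaredWeights): the class name and the example weights below are invented for illustration, and nothing here calls Lucene itself.

```java
// Hypothetical standalone sketch; not part of Lucene.
class QueryNormSketch {

    // Sum of the squared query term weights.
    static float sumOfSquaredWeights(float[] termWeights) {
        float sum = 0.0f;
        for (float w : termWeights) {
            sum += w * w;
        }
        return sum;
    }

    // Assumed default: the inverse square root of the sum of squared weights.
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        float[] weights = {3.0f, 4.0f};   // made-up term weights
        float norm = queryNorm(sumOfSquaredWeights(weights));
        System.out.println("queryNorm: " + norm);
    }
}
```

With weights 3 and 4 the sum of squares is 25 and the norm is 0.2. Every term weight in the query is scaled by the same factor, so the normalisation by itself leaves the ranking within one query unchanged.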
 
 It may be that the sum of squared weights should be a sum of square rooted weights
 and that queryNorm should return a square then.
 I posted this on lucene-user on 20 September:
 http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=10023
 
 It's only a normalisation, so it doesn't affect the order of the search results much.
 Taking the square roots of the query term weights would have
 the query weights directly applied to the query term density in the document
 field, whereas now the weights seem to be applied to the square root of the density.
 The density value is an approximation; see above for the rough field norms.
 
 Regards,
 Paul Elschot
 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
-- 
Zia Syed [EMAIL PROTECTED]
Smartweb Research Center, Robert Gordon University





How to pull document scoring values

2004-09-28 Thread Zia Syed
Hi,

I'm trying to learn the Scoring mechanism of Lucene. I want to fetch
each parameter value individually as they are collectively dumped out by
Explanation. I've managed to pull out TF and IDF values using
DefaultSimilarity and FilterIndexReader, but I'm not sure where to get
the fieldNorm and queryNorm from.
Also is there any reference about how normalisation has been
implemented? 

Any idea?

Thanks,
Zia
-- 
Zia Syed [EMAIL PROTECTED]
Smartweb Research Center, Robert Gordon University

