Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi,

I've just changed the stemming algorithm slightly and am running a few  
tests against the old stemmer versus the new stemmer. I did a query  
for 'hanger' and using the old stemmer I get the following scoring for  
a document with the title: Converter Hanger Assembly Replacement


6.4242806 = (MATCH) sum of:
  2.5697122 = (MATCH) max of:
    0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
      0.1963516 = queryWeight(markup_t:hanger), product of:
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.02993451 = queryNorm
      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
        2.0 = boost
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.02993451 = queryNorm
      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.5 = fieldNorm(field=title_t, doc=3454)
  3.8545685 = (MATCH) max of:
    0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
      0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
        0.5 = boost
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.02993451 = queryNorm
      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.5593724 = idf(docFreq=6375, numDocs=1655591)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
      0.8320503 = queryWeight(title_t:hanger^3.0), product of:
        3.0 = boost
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.02993451 = queryNorm
      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265229 = idf(docFreq=425, numDocs=1655591)
        0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
  2.248498 = (MATCH) max of:
    0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
      0.19635157 = queryWeight(markup_t:hanger), product of:
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.029934512 = queryNorm
      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
        2.0 = boost
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.029934512 = queryNorm
      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.4375 = fieldNorm(field=title_t, doc=3454)
  3.372747 = (MATCH) max of:
    0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
      0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
        0.5 = boost
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.029934512 = queryNorm
      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
        1.7320508 = tf(termFreq(markup_t:hanger)=3)
        6.559371 = idf(docFreq=6375, numDocs=1655589)
        0.109375 = fieldNorm(field=markup_t, doc=3454)
    3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
      0.83205026 = queryWeight(title_t:hanger^3.0), product of:
        3.0 = boost
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.029934512 = queryNorm
      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
        1.0 = tf(termFreq(title_t:hanger)=1)
        9.265228 = idf(docFreq=425, numDocs=1655589)
        0.4375 = fieldNorm(field=title_t, doc=3454)

What perplexes me is that the fieldNorm for the title_t field differs
between the two explanations. With the old stemmer it is
0.5 = fieldNorm(field=title_t, doc=3454); with the new stemmer it is
0.4375 = fieldNorm(field=title_t, doc=3454). I ran the title through
both stemmers and got the same number of tokens from each. I do no
index-time boosting on the title_t field, and I am using
DefaultSimilarity in both instances. So I figured the calculated
fieldNorm would be:

field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5

I wouldn't have thought that changing the stemmer would have any
impact on the fieldNorm in this case. Any insight? Please kick me over
to the lucene list if you feel this isn't appropriate here.
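
As a quick check of the arithmetic in the message above, the following
Python sketch recomputes the expected length norm for a four-token title
and the two title_t fieldWeight lines. It assumes tf = sqrt(termFreq) and
lengthNorm = 1/sqrt(numTerms), which is what the explain output and the
formula above suggest for DefaultSimilarity; the function names are
illustrative and are not Lucene API.

```python
import math


def length_norm(num_terms, field_boost=1.0):
    # fieldNorm before any storage rounding: boost * 1/sqrt(number of terms)
    return field_boost / math.sqrt(num_terms)


def field_weight(term_freq, idf, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(termFreq)
    return math.sqrt(term_freq) * idf * field_norm


expected_norm = length_norm(4)                    # four tokens in the title
old_weight = field_weight(1, 9.265229, 0.5)       # old stemmer, title_t
new_weight = field_weight(1, 9.265228, 0.4375)    # new stemmer, title_t

print(expected_norm)  # 0.5
print(old_weight, new_weight)
```

The two weights agree with the 4.6326146 and 4.0535374 lines in the
explanations to within float rounding, so the whole score difference
traces back to the fieldNorm value.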

Re: Question about fieldNorm

2008-06-11 Thread Yonik Seeley
That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik



Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi Yonik,

Yes, I did rebuild the index, and they are the same document (just
verified). The only thing that changed was the stemmer, but that makes
no sense to me. Also, if the equation for the fieldNorm is:

fieldBoost * lengthNorm = fieldBoost * 1/sqrt(numTermsForField)

then numTermsForField would have to be 5.22 when the norm is 0.4375.
Am I correct about how this is calculated?


Thanks again
Brendan
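
The inversion in the message above can be checked with a few lines of
Python. This is only a sketch of the stated equation, with fieldBoost
assumed to be 1; the helper name is made up for illustration.

```python
def implied_num_terms(field_norm, field_boost=1.0):
    # Invert fieldNorm = fieldBoost / sqrt(numTerms) to solve for numTerms
    return (field_boost / field_norm) ** 2


print(implied_num_terms(0.5))     # 4.0
print(implied_num_terms(0.4375))  # about 5.22, not a whole number of terms
```

A non-integer term count suggests that the stored norm is not exactly
1/sqrt(numTerms) for any document length.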


Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi Yonik,

I just realized that the stemmer does make a difference because of
synonyms. When indexing with the new stemmer, "converter hanger
assembly replacement" gets expanded to "converter hanger assembly
assemble replacement", so there are 5 terms, which gives a length norm
of 0.4472136 instead of 0.5. I'm still unsure how the field norm ends
up as 0.4375, though, unless I have a boost of 0.9783 somewhere.


Brendan
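
The numbers in the message above can be reproduced as follows. This is a
small sketch under the same assumption as before (lengthNorm =
1/sqrt(numTerms)); the variable names are illustrative.

```python
import math

length_norm_5 = 1.0 / math.sqrt(5)       # five terms after synonym expansion
implied_boost = 0.4375 / length_norm_5   # factor needed to land on 0.4375

print(round(length_norm_5, 7))   # 0.4472136
print(round(implied_boost, 4))   # 0.9783
```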



Re: Question about fieldNorm

2008-06-11 Thread Yonik Seeley
Field norms have limited precision (each is encoded as an 8-bit float),
so you are probably seeing rounding.

-Yonik
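
To illustrate the rounding, here is a Python sketch of an 8-bit float
encoding modelled on Lucene's SmallFloat.floatToByte315 and
byte315ToFloat. That this is the exact encoding used for norms in the
index discussed here is an assumption, and the function names below are
made up for illustration. The encoder keeps only the top bits of the
32-bit float, so it truncates rather than rounding to nearest.

```python
import math
import struct

# Offset between the float exponent bias and the byte format's zero exponent
ZERO_EXP_OFFSET = (63 - 15) << 3  # 384


def encode_norm(f):
    """Pack a float into one byte (0..255) by truncating its low bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21
    if small <= ZERO_EXP_OFFSET:
        # zero and negatives map to 0, underflow to the smallest non-zero byte
        return 0 if bits <= 0 else 1
    if small >= ZERO_EXP_OFFSET + 0x100:
        return 255  # overflow maps to the largest byte
    return small - ZERO_EXP_OFFSET


def decode_norm(b):
    """Expand a byte produced by encode_norm back into a float."""
    if b == 0:
        return 0.0
    bits = (b << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


four_terms = decode_norm(encode_norm(1.0 / math.sqrt(4)))
five_terms = decode_norm(encode_norm(1.0 / math.sqrt(5)))

print(four_terms)  # 0.5
print(five_terms)  # 0.4375
```

Under this scheme 0.5 is stored exactly, while 1/sqrt(5) = 0.4472136
falls between two representable values and drops to 0.4375, which
matches the fieldNorm reported with the new stemmer.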


Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Thanks so much, that explains it.

Brendan
