Field norms have limited precision (they are encoded as an 8-bit float), so you are probably seeing rounding.
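[Editor's note: the rounding Yonik describes comes from Lucene's SmallFloat.floatToByte315 encoding, which keeps 3 mantissa bits and 5 exponent bits (zero point 15). The following is a minimal Python sketch of that encode/decode pair, not the actual Java code; function names mirror the Lucene originals. It shows why 1/sqrt(5) = 0.4472136 comes back as 0.4375.]

```python
import math
import struct

def float_to_byte315(f: float) -> int:
    """Sketch of Lucene's SmallFloat.floatToByte315: compress a float
    into one byte with 3 mantissa bits and 5 exponent bits."""
    # raw IEEE-754 bits of the 32-bit float
    bits = struct.unpack('>i', struct.pack('>f', f))[0]
    small = bits >> 21  # keep sign, exponent, and top 3 mantissa bits
    if small <= (63 - 15) << 3:            # too small: 0 or smallest positive
        return 0 if bits <= 0 else 1
    if small >= ((63 - 15) << 3) + 0x100:  # too large: clamp to max byte
        return 255
    return (small - ((63 - 15) << 3)) & 0xFF

def byte315_to_float(b: int) -> float:
    """Inverse mapping: expand the byte back to a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack('>f', struct.pack('>i', bits))[0]

# 4 title terms: lengthNorm = 1/sqrt(4) = 0.5, representable exactly
print(byte315_to_float(float_to_byte315(0.5)))               # 0.5
# 5 terms (after synonym expansion): 1/sqrt(5) loses precision
print(byte315_to_float(float_to_byte315(1 / math.sqrt(5))))  # 0.4375
```

So 0.4375 is simply the nearest representable value below 0.4472136 in the 8-bit norm encoding; no hidden 0.9783 boost is involved.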
-Yonik

On Wed, Jun 11, 2008 at 2:13 PM, Brendan Grainger <[EMAIL PROTECTED]> wrote:
> Hi Yonik,
>
> I just realized that the stemmer does make a difference because of synonyms.
> So on indexing with the new stemmer, "converter hanger assembly replacement"
> gets expanded to "converter hanger assembly assemble replacement", so there
> are 5 terms, which gives a length norm of 0.4472136 instead of 0.5. Still
> unsure how it gets 0.4375 as the result for the field norm, though, unless I
> have a boost of 0.9783 somewhere.
>
> Brendan
>
> On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:
>
>> That is strange... did you re-index or change the index? If so, you
>> might want to verify that docid=3454 still corresponds to the same
>> document you queried earlier.
>>
>> -Yonik
>>
>> On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
>> <[EMAIL PROTECTED]> wrote:
>>> I've just changed the stemming algorithm slightly and am running a few
>>> tests against the old stemmer versus the new stemmer. I did a query for
>>> 'hanger', and using the old stemmer I get the following scoring for a
>>> document with the title "Converter Hanger Assembly Replacement":
>>>
>>> 6.4242806 = (MATCH) sum of:
>>>   2.5697122 = (MATCH) max of:
>>>     0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
>>>       0.1963516 = queryWeight(markup_t:hanger), product of:
>>>         6.5593724 = idf(docFreq=6375, numDocs=1655591)
>>>         0.02993451 = queryNorm
>>>       1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>>>         1.7320508 = tf(termFreq(markup_t:hanger)=3)
>>>         6.5593724 = idf(docFreq=6375, numDocs=1655591)
>>>         0.109375 = fieldNorm(field=markup_t, doc=3454)
>>>     2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
>>>       0.5547002 = queryWeight(title_t:hanger^2.0), product of:
>>>         2.0 = boost
>>>         9.265229 = idf(docFreq=425, numDocs=1655591)
>>>         0.02993451 = queryNorm
>>>       4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>>>         1.0 = tf(termFreq(title_t:hanger)=1)
>>>         9.265229 = idf(docFreq=425, numDocs=1655591)
>>>         0.5 = fieldNorm(field=title_t, doc=3454)
>>>   3.8545685 = (MATCH) max of:
>>>     0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
>>>       0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
>>>         0.5 = boost
>>>         6.5593724 = idf(docFreq=6375, numDocs=1655591)
>>>         0.02993451 = queryNorm
>>>       1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>>>         1.7320508 = tf(termFreq(markup_t:hanger)=3)
>>>         6.5593724 = idf(docFreq=6375, numDocs=1655591)
>>>         0.109375 = fieldNorm(field=markup_t, doc=3454)
>>>     3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
>>>       0.8320503 = queryWeight(title_t:hanger^3.0), product of:
>>>         3.0 = boost
>>>         9.265229 = idf(docFreq=425, numDocs=1655591)
>>>         0.02993451 = queryNorm
>>>       4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>>>         1.0 = tf(termFreq(title_t:hanger)=1)
>>>         9.265229 = idf(docFreq=425, numDocs=1655591)
>>>         0.5 = fieldNorm(field=title_t, doc=3454)
>>>
>>> Using the new stemmer I get:
>>>
>>> 5.621245 = (MATCH) sum of:
>>>   2.248498 = (MATCH) max of:
>>>     0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
>>>       0.19635157 = queryWeight(markup_t:hanger), product of:
>>>         6.559371 = idf(docFreq=6375, numDocs=1655589)
>>>         0.029934512 = queryNorm
>>>       1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>>>         1.7320508 = tf(termFreq(markup_t:hanger)=3)
>>>         6.559371 = idf(docFreq=6375, numDocs=1655589)
>>>         0.109375 = fieldNorm(field=markup_t, doc=3454)
>>>     2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
>>>       0.5547002 = queryWeight(title_t:hanger^2.0), product of:
>>>         2.0 = boost
>>>         9.265228 = idf(docFreq=425, numDocs=1655589)
>>>         0.029934512 = queryNorm
>>>       4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>>>         1.0 = tf(termFreq(title_t:hanger)=1)
>>>         9.265228 = idf(docFreq=425, numDocs=1655589)
>>>         0.4375 = fieldNorm(field=title_t, doc=3454)
>>>   3.372747 = (MATCH) max of:
>>>     0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
>>>       0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
>>>         0.5 = boost
>>>         6.559371 = idf(docFreq=6375, numDocs=1655589)
>>>         0.029934512 = queryNorm
>>>       1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>>>         1.7320508 = tf(termFreq(markup_t:hanger)=3)
>>>         6.559371 = idf(docFreq=6375, numDocs=1655589)
>>>         0.109375 = fieldNorm(field=markup_t, doc=3454)
>>>     3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
>>>       0.83205026 = queryWeight(title_t:hanger^3.0), product of:
>>>         3.0 = boost
>>>         9.265228 = idf(docFreq=425, numDocs=1655589)
>>>         0.029934512 = queryNorm
>>>       4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>>>         1.0 = tf(termFreq(title_t:hanger)=1)
>>>         9.265228 = idf(docFreq=425, numDocs=1655589)
>>>         0.4375 = fieldNorm(field=title_t, doc=3454)
>>>
>>> The thing that is perplexing is that the fieldNorm for the title_t field
>>> is different in each of the explanations, i.e. the fieldNorm using the
>>> old stemmer is 0.5 = fieldNorm(field=title_t, doc=3454), while for the
>>> new stemmer it is 0.4375 = fieldNorm(field=title_t, doc=3454). I ran the
>>> title through both stemmers and got the same number of tokens. I do no
>>> index-time boosting on the title_t field, and I am using
>>> DefaultSimilarity in both instances. So I figured the calculated
>>> fieldNorm would be:
>>>
>>> field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5
>>>
>>> I wouldn't have thought that changing the stemmer would have any impact
>>> on the fieldNorm in this case. Any insight? Please kick me over to the
>>> lucene list if you feel this isn't appropriate here.
>>>
>>> Regards,
>>> Brendan
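[Editor's note: the fieldNorm arithmetic quoted above can be checked with a short sketch. DefaultSimilarity's lengthNorm is 1/sqrt(numTerms) and the field boost here is 1.0; the discrepancy between 0.5 and 0.4375 arises because the new stemmer's synonym expansion adds a fifth title token.]

```python
import math

def length_norm(num_terms: int) -> float:
    # DefaultSimilarity.lengthNorm: 1 / sqrt(number of terms in the field)
    return 1.0 / math.sqrt(num_terms)

# Old stemmer: 4 title tokens
print(length_norm(4))  # 0.5
# New stemmer: "assembly" also emits the synonym "assemble" -> 5 tokens;
# the 8-bit norm encoding stores this ~0.4472136 value as 0.4375
print(length_norm(5))
```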