Re: Question about fieldNorm

Yonik Seeley Wed, 11 Jun 2008 13:01:01 -0700

Field norms have limited precision (each one is encoded as an 8-bit float), so
you are probably seeing rounding: 1/sqrt(5) = 0.4472136 is not representable
and is stored as 0.4375.
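
A minimal sketch of that rounding, assuming the norm byte follows the scheme used by Lucene's SmallFloat.floatToByte315 and that the length norm is 1/sqrt(numTerms) as in DefaultSimilarity. The Python function names are illustrative ports for this example, not Lucene API.

```python
import math
import struct

def float_to_byte315(f):
    """Illustrative port of the norm encoding: keep only the top bits of a float."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw IEEE-754 bits
    small = bits >> 21               # drop the 21 low fraction bits
    zero_point = (63 - 15) << 3      # offset that recentres the exponent
    if small <= zero_point:          # too small (or not positive) to represent
        return 0 if bits <= 0 else 1
    if small >= zero_point + 0x100:  # too large: clamp to the biggest byte
        return 255
    return small - zero_point

def byte315_to_float(b):
    """Inverse mapping: expand the byte back into a 32-bit float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def stored_norm(num_terms, boost=1.0):
    # Assumed formula: field boost * 1/sqrt(numTerms), then an encode/decode round trip.
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))

print(stored_norm(4))  # 0.5     (4 terms: exactly representable)
print(stored_norm(5))  # 0.4375  (5 terms: 0.4472136 truncated)
```

In this sketch the low bits are simply dropped, so a value falls to the next representable step below it. No extra boost of 0.9783 is needed to explain 0.4375.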


-Yonik

On Wed, Jun 11, 2008 at 2:13 PM, Brendan Grainger
<[EMAIL PROTECTED]> wrote:
> Hi Yonik,
>
> I just realized that the stemmer does make a difference, because of synonyms.
> When indexing with the new stemmer, "converter hanger assembly replacement"
> gets expanded to "converter hanger assembly assemble replacement", so there
> are 5 terms, which gives a length norm of 0.4472136 instead of 0.5. I'm still
> unsure how the field norm ends up as 0.4375, though, unless I have a boost
> of 0.9783 somewhere.
>
> Brendan
>
>
> On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:
>
>> That is strange... did you re-index or change the index?  If so, you
>> might want to verify that docid=3454 still corresponds to the same
>> document you queried earlier.
>>
>> -Yonik
>>
>>
>> On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> I've just changed the stemming algorithm slightly and am running a few tests
>>> against the old stemmer versus the new stemmer. I did a query for 'hanger',
>>> and using the old stemmer I get the following scoring for a document with
>>> the title: Converter Hanger Assembly Replacement
>>>
>>> 6.4242806 = (MATCH) sum of:
>>> 2.5697122 = (MATCH) max of:
>>>  0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
>>>    0.1963516 = queryWeight(markup_t:hanger), product of:
>>>      6.5593724 = idf(docFreq=6375, numDocs=1655591)
>>>      0.02993451 = queryNorm
>>>    1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>>>      1.7320508 = tf(termFreq(markup_t:hanger)=3)
>>>      6.5593724 = idf(docFreq=6375, numDocs=1655591)
>>>      0.109375 = fieldNorm(field=markup_t, doc=3454)
>>>  2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
>>>    0.5547002 = queryWeight(title_t:hanger^2.0), product of:
>>>      2.0 = boost
>>>      9.265229 = idf(docFreq=425, numDocs=1655591)
>>>      0.02993451 = queryNorm
>>>    4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>>>      1.0 = tf(termFreq(title_t:hanger)=1)
>>>      9.265229 = idf(docFreq=425, numDocs=1655591)
>>>      0.5 = fieldNorm(field=title_t, doc=3454)
>>> 3.8545685 = (MATCH) max of:
>>>  0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
>>>    0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
>>>      0.5 = boost
>>>      6.5593724 = idf(docFreq=6375, numDocs=1655591)
>>>      0.02993451 = queryNorm
>>>    1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>>>      1.7320508 = tf(termFreq(markup_t:hanger)=3)
>>>      6.5593724 = idf(docFreq=6375, numDocs=1655591)
>>>      0.109375 = fieldNorm(field=markup_t, doc=3454)
>>>  3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
>>>    0.8320503 = queryWeight(title_t:hanger^3.0), product of:
>>>      3.0 = boost
>>>      9.265229 = idf(docFreq=425, numDocs=1655591)
>>>      0.02993451 = queryNorm
>>>    4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>>>      1.0 = tf(termFreq(title_t:hanger)=1)
>>>      9.265229 = idf(docFreq=425, numDocs=1655591)
>>>      0.5 = fieldNorm(field=title_t, doc=3454)
>>>
>>> Using the new stemmer I get:
>>>
>>> 5.621245 = (MATCH) sum of:
>>> 2.248498 = (MATCH) max of:
>>>  0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
>>>    0.19635157 = queryWeight(markup_t:hanger), product of:
>>>      6.559371 = idf(docFreq=6375, numDocs=1655589)
>>>      0.029934512 = queryNorm
>>>    1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>>>      1.7320508 = tf(termFreq(markup_t:hanger)=3)
>>>      6.559371 = idf(docFreq=6375, numDocs=1655589)
>>>      0.109375 = fieldNorm(field=markup_t, doc=3454)
>>>  2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
>>>    0.5547002 = queryWeight(title_t:hanger^2.0), product of:
>>>      2.0 = boost
>>>      9.265228 = idf(docFreq=425, numDocs=1655589)
>>>      0.029934512 = queryNorm
>>>    4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>>>      1.0 = tf(termFreq(title_t:hanger)=1)
>>>      9.265228 = idf(docFreq=425, numDocs=1655589)
>>>      0.4375 = fieldNorm(field=title_t, doc=3454)
>>> 3.372747 = (MATCH) max of:
>>>  0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
>>>    0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
>>>      0.5 = boost
>>>      6.559371 = idf(docFreq=6375, numDocs=1655589)
>>>      0.029934512 = queryNorm
>>>    1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>>>      1.7320508 = tf(termFreq(markup_t:hanger)=3)
>>>      6.559371 = idf(docFreq=6375, numDocs=1655589)
>>>      0.109375 = fieldNorm(field=markup_t, doc=3454)
>>>  3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
>>>    0.83205026 = queryWeight(title_t:hanger^3.0), product of:
>>>      3.0 = boost
>>>      9.265228 = idf(docFreq=425, numDocs=1655589)
>>>      0.029934512 = queryNorm
>>>    4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>>>      1.0 = tf(termFreq(title_t:hanger)=1)
>>>      9.265228 = idf(docFreq=425, numDocs=1655589)
>>>      0.4375 = fieldNorm(field=title_t, doc=3454)
>>>
>>> The thing that is perplexing is that the fieldNorm for the title_t field is
>>> different in each of the explanations, i.e. with the old stemmer it is
>>> 0.5 = fieldNorm(field=title_t, doc=3454), and with the new stemmer it is
>>> 0.4375 = fieldNorm(field=title_t, doc=3454). I ran the title through both
>>> stemmers and got the same number of tokens. I do no index-time boosting on
>>> the title_t field, and I am using DefaultSimilarity in both instances. So I
>>> figured the calculated fieldNorm would be:
>>>
>>> field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5
>>>
>>> I wouldn't have thought that changing the stemmer would have any impact on
>>> the fieldNorm in this case. Any insight? Please kick me over to the Lucene
>>> list if you feel this isn't appropriate here.
>>>
>>> Regards
>>> Brendan
>
>
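
The title_t lines of the two explain outputs quoted above can be checked by hand. The sketch below assumes the structure shown in that output (queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, weight = their product). The helper name term_weight is made up for this example.

```python
def term_weight(boost, idf, query_norm, tf, field_norm):
    """Recompute one weight(...) entry from the factors listed in the explain output."""
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# title_t:hanger^2.0, old stemmer (fieldNorm 0.5) and new stemmer (fieldNorm 0.4375)
old = term_weight(2.0, 9.265229, 0.02993451, 1.0, 0.5)
new = term_weight(2.0, 9.265228, 0.029934512, 1.0, 0.4375)

print(round(old, 4), round(new, 4))
print(round(new / old, 3))  # the drop in score mirrors 0.4375 / 0.5
```

Only fieldNorm differs meaningfully between the two runs, so the score drop for this clause tracks the ratio of the two stored norms.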
