That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
<[EMAIL PROTECTED]> wrote:
> I've just changed the stemming algorithm slightly and am running a few tests
> comparing the old stemmer with the new one. I ran a query for 'hanger', and
> with the old stemmer I get the following scoring for a document with the
> title "Converter Hanger Assembly Replacement":
>
> 6.4242806 = (MATCH) sum of:
>  2.5697122 = (MATCH) max of:
>    0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
>      0.1963516 = queryWeight(markup_t:hanger), product of:
>        6.5593724 = idf(docFreq=6375, numDocs=1655591)
>        0.02993451 = queryNorm
>      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>        1.7320508 = tf(termFreq(markup_t:hanger)=3)
>        6.5593724 = idf(docFreq=6375, numDocs=1655591)
>        0.109375 = fieldNorm(field=markup_t, doc=3454)
>    2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
>      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
>        2.0 = boost
>        9.265229 = idf(docFreq=425, numDocs=1655591)
>        0.02993451 = queryNorm
>      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>        1.0 = tf(termFreq(title_t:hanger)=1)
>        9.265229 = idf(docFreq=425, numDocs=1655591)
>        0.5 = fieldNorm(field=title_t, doc=3454)
>  3.8545685 = (MATCH) max of:
>    0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
>      0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
>        0.5 = boost
>        6.5593724 = idf(docFreq=6375, numDocs=1655591)
>        0.02993451 = queryNorm
>      1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>        1.7320508 = tf(termFreq(markup_t:hanger)=3)
>        6.5593724 = idf(docFreq=6375, numDocs=1655591)
>        0.109375 = fieldNorm(field=markup_t, doc=3454)
>    3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
>      0.8320503 = queryWeight(title_t:hanger^3.0), product of:
>        3.0 = boost
>        9.265229 = idf(docFreq=425, numDocs=1655591)
>        0.02993451 = queryNorm
>      4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>        1.0 = tf(termFreq(title_t:hanger)=1)
>        9.265229 = idf(docFreq=425, numDocs=1655591)
>        0.5 = fieldNorm(field=title_t, doc=3454)
>
> Using the new stemmer I get:
>
> 5.621245 = (MATCH) sum of:
>  2.248498 = (MATCH) max of:
>    0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
>      0.19635157 = queryWeight(markup_t:hanger), product of:
>        6.559371 = idf(docFreq=6375, numDocs=1655589)
>        0.029934512 = queryNorm
>      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>        1.7320508 = tf(termFreq(markup_t:hanger)=3)
>        6.559371 = idf(docFreq=6375, numDocs=1655589)
>        0.109375 = fieldNorm(field=markup_t, doc=3454)
>    2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
>      0.5547002 = queryWeight(title_t:hanger^2.0), product of:
>        2.0 = boost
>        9.265228 = idf(docFreq=425, numDocs=1655589)
>        0.029934512 = queryNorm
>      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>        1.0 = tf(termFreq(title_t:hanger)=1)
>        9.265228 = idf(docFreq=425, numDocs=1655589)
>        0.4375 = fieldNorm(field=title_t, doc=3454)
>  3.372747 = (MATCH) max of:
>    0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
>      0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
>        0.5 = boost
>        6.559371 = idf(docFreq=6375, numDocs=1655589)
>        0.029934512 = queryNorm
>      1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
>        1.7320508 = tf(termFreq(markup_t:hanger)=3)
>        6.559371 = idf(docFreq=6375, numDocs=1655589)
>        0.109375 = fieldNorm(field=markup_t, doc=3454)
>    3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
>      0.83205026 = queryWeight(title_t:hanger^3.0), product of:
>        3.0 = boost
>        9.265228 = idf(docFreq=425, numDocs=1655589)
>        0.029934512 = queryNorm
>      4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
>        1.0 = tf(termFreq(title_t:hanger)=1)
>        9.265228 = idf(docFreq=425, numDocs=1655589)
>        0.4375 = fieldNorm(field=title_t, doc=3454)
>
> The perplexing thing is that the fieldNorm for the title_t field differs
> between the two explanations. With the old stemmer it is
> 0.5 = fieldNorm(field=title_t, doc=3454); with the new stemmer it is
> 0.4375 = fieldNorm(field=title_t, doc=3454). I ran the title through both
> stemmers and got the same number of tokens from each. I do no index-time
> boosting on the title_t field, and I am using DefaultSimilarity in both
> instances. So I figured the calculated fieldNorm would be:
>
> field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5
>
> I wouldn't have thought that changing the stemmer would have any impact on
> the fieldNorm in this case. Any insight? Please kick me over to the Lucene
> list if you feel this isn't appropriate here.
>
> Regards
> Brendan
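
For anyone comparing the two fieldNorm values above, here is a minimal sketch of how a stored norm can be reproduced. It assumes the index uses Lucene's single-byte norm encoding (3 mantissa bits, 5 exponent bits, as in SmallFloat.floatToByte315) together with the DefaultSimilarity formula quoted in the message, lengthNorm = 1/sqrt(numTerms). The Python function names are hypothetical ports written for illustration and are not part of Lucene or Solr.

```python
import math
import struct

# Offset of the "zero" exponent in the 8-bit format (assumed 3/15 layout).
EXP_ZERO_OFFSET = (63 - 15) << 3  # 384


def float_to_byte315(value):
    """Encode a float into one byte: 3 mantissa bits, 5 exponent bits."""
    # Reinterpret the value as a signed 32-bit IEEE 754 bit pattern.
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> 21  # keep sign, exponent and the top 3 mantissa bits
    if small <= EXP_ZERO_OFFSET:
        # Underflow: map to zero, or to the smallest positive value.
        return 0 if bits <= 0 else 1
    if small >= EXP_ZERO_OFFSET + 0x100:
        return 255  # overflow: clamp to the largest representable value
    return small - EXP_ZERO_OFFSET


def byte315_to_float(b):
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_field_norm(num_terms, boost=1.0):
    """Norm as it would be read back from the index under these assumptions."""
    raw = boost / math.sqrt(num_terms)  # field boost * lengthNorm
    return byte315_to_float(float_to_byte315(raw))


if __name__ == "__main__":
    print(stored_field_norm(4))  # 0.5
    print(stored_field_norm(5))  # 0.4375
```

Under these assumptions a four-term title is stored as 0.5, and a five-term title is stored as 0.4375, because 1/sqrt(5) (about 0.447) is truncated to the next lower representable value. So one thing worth checking is whether the document at docid=3454 was indexed with five title_t tokens in the new index, for example because a different document now sits at that id or because the analyzer emitted an extra token at index time. Since title_t wins both "max of" clauses, the norm ratio 0.4375/0.5 = 0.875 also accounts for the whole drop in the total score from 6.4242806 to 5.621245.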
