Is there a standard way to check whether switching off fieldNorm
helps or not?

Regards,
Karan Jindal


On Mon, Oct 14, 2013 at 2:30 PM, Karan jindal <karanjindal1...@gmail.com> wrote:

> Thanks, Upayavira, for the quick reply.
>
> What you have said is true, but the example you gave is for a field of a
> descriptive nature.
> Does the same argument apply if the field length can't exceed some
> threshold (say 10), as in the case of "title"?
> Titles are generally short.
>
> Consider the following examples (showing the title field only):
>
> 1) solr
> 2) Introduction to solr 3.4
> 3) solr big data
>
> search query q = solr
>
> Wouldn't it be good to give all 3 documents the same textual score?
> Later, they can be re-ranked based on some other feature, such as year
> of publication.
>
> Would it be good to switch off fieldNorm for "title"?
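
To make the title question concrete, here is a rough Python sketch of what the three titles would score for q = solr with and without length normalisation. It assumes the classic Lucene TF-IDF factors tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms), plain whitespace tokenisation, and no idf, boosts, or one-byte norm encoding, so the numbers are illustrative only. The helper name title_score is made up for this example.

```python
import math

# The three example titles from the message above.
titles = ["solr", "Introduction to solr 3.4", "solr big data"]

def title_score(title, term, use_norms):
    """Simplified per-field score: sqrt(tf), optionally times 1/sqrt(length).

    Assumption: whitespace tokenisation and lowercasing; a real analyzer
    may split "3.4" or drop stop words, which changes the field length.
    """
    tokens = title.lower().split()
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    score = math.sqrt(freq)
    if use_norms:
        score /= math.sqrt(len(tokens))
    return score

with_norms = [title_score(t, "solr", True) for t in titles]
without_norms = [title_score(t, "solr", False) for t in titles]

for t, a, b in zip(titles, with_norms, without_norms):
    print(f"{t!r}: with norms {a:.3f}, without norms {b:.3f}")
```

Under these assumptions the three titles tie once norms are off, which is the behaviour asked about; with norms on, the one-word title wins. In Solr the per-field switch for this is the omitNorms="true" attribute on the field or field type, and a reindex is needed after changing it.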
>
> Regards,
> Karan Jindal
>
>
>
>
>
>
> On Mon, Oct 14, 2013 at 1:14 PM, Upayavira <u...@odoko.co.uk> wrote:
>
>> You search for the word "jack". Which of these three field values best
>> matches?
>>
>> 1) Jack is great.
>> 2) Billy was a young man. Billy studied well and lived well. Jack
>> didn't. Billy went travelling and had a great time.
>> 3) Billy didn't actually like Jack. Jack could at times be difficult.
>> Jack would get angry at the smallest things.
>>
>> Forgetting algorithms, I'd say #1 is a good match, as is #3, but #2
>> isn't so good.
>>
>> So, now looking at how field length normalisation can play into this, we
>> can note that:
>>
>> 1) term frequency for "Jack" is 1. Field length is 3.
>> 2) term frequency for "Jack" is 1. Field length is 21.
>> 3) term frequency for "Jack" is 3. Field length is 19.
>>
>> If you only take term frequency into account, #1 and #2 would be equal
>> matches, and #3 would be a better match than #1.
>>
>> However, if you include the field length in your calculations, you can
>> reach a better approximation to our original proposition, that #1 and #3
>> are better matches.
>>
>> This simply means that, all else being equal, the longer the field, the
>> lower the score. For longer fields with high term frequencies, the high
>> term frequency can counteract the effect of the longer field, giving a
>> (hopefully) similar score to a shorter field with fewer term occurrences.
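
The "Jack" example above can be worked through numerically. This is only a sketch: it assumes the classic Lucene TF-IDF factors tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms), and leaves out idf, boosts, and the lossy one-byte norm encoding. The function names are made up for the illustration.

```python
import math

# (term frequency of "jack", field length in terms) for the three values above.
docs = {1: (1, 3), 2: (1, 21), 3: (3, 19)}

def jack_score(freq, length, use_norms=True):
    """sqrt(tf), optionally multiplied by the length norm 1/sqrt(length)."""
    score = math.sqrt(freq)
    if use_norms:
        score *= 1.0 / math.sqrt(length)
    return score

def rank(docs, use_norms):
    """Document ids ordered from best to worst match."""
    return sorted(docs,
                  key=lambda d: jack_score(*docs[d], use_norms=use_norms),
                  reverse=True)

for use_norms in (False, True):
    scores = {d: round(jack_score(*docs[d], use_norms=use_norms), 3) for d in docs}
    print("norms on " if use_norms else "norms off", scores, rank(docs, use_norms))
```

With norms off, #1 and #2 tie and #3 comes first; with norms on, the order becomes #1, #3, #2, which matches the intuition that #1 and #3 are the better matches.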
>>
>> Upayavira
>>
>> On Mon, Oct 14, 2013, at 08:33 AM, Karan jindal wrote:
>> > Hi all,
>> >
>> > I have a general query about fieldNorm.
>> > Is it advisable to use fieldNorm (which kind of gives importance to
>> > shorter fields)?
>> > Is there a standard set of factors on which the decision to turn
>> > fieldNorm on or off can be based?
>> >
>> > *In my use case:-*
>> > I have user-generated data, and there are primarily two searchable
>> > fields,
>> > "title" and "description", apart from certain other filter flags.
>> >
>> > It is up to the user to make "title" short or long.
>> > What would be best in this case?
>> >
>> > Regards,
>> > Karan Jindal
>>
>
>
