You search for the word "jack". Which of these three field values best
matches?

1) Jack is great.
2) Billy was a young man. Billy studied well and lived well. Jack
didn't. Billy went travelling and had a great time.
3) Billy didn't actually like Jack. Jack could at times be difficult.
Jack would get angry at the smallest things.

Setting algorithms aside for a moment, I'd say #1 is a good match, as is
#3, but #2 isn't so good.

So, now looking at how field length normalisation can play into this, we
can note that:

1) term frequency for "Jack" is 1. Field length is 3.
2) term frequency for "Jack" is 1. Field length is 21.
3) term frequency for "Jack" is 3. Field length is 19.
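
For what it's worth, those numbers can be reproduced with a few lines of
Python. This is only a sketch: the tokeniser (lowercase, then keep runs
of letters and apostrophes) is my own assumption and far simpler than a
real Solr analysis chain, and the names DOCS, tokenize and stats are
made up for illustration.

```python
import re

# The three example field values from above.
DOCS = {
    1: "Jack is great.",
    2: ("Billy was a young man. Billy studied well and lived well. Jack "
        "didn't. Billy went travelling and had a great time."),
    3: ("Billy didn't actually like Jack. Jack could at times be difficult. "
        "Jack would get angry at the smallest things."),
}


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def stats(text, term):
    """Return (term frequency, field length in tokens) for one field value."""
    tokens = tokenize(text)
    return tokens.count(term.lower()), len(tokens)


for doc_id, text in DOCS.items():
    tf, length = stats(text, "jack")
    print(f"#{doc_id}: term frequency = {tf}, field length = {length}")
```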

If you only take term frequency into account, #1 and #2 would be equal
matches, and #3 would be a better match than #1. Neither result fits
the intuition above, where #2 is the weakest match.

However, if you include the field length in your calculations, you can
reach a better approximation to our original proposition, that #1 and #3
are better matches.

In other words, for the same term frequency, the longer the field, the
lower the score. In a longer field with a high term frequency, the
repeated occurrences can counteract the length penalty, giving a
(hopefully) similar score to a shorter field with fewer occurrences.
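
To put rough numbers on that, here is a small Python sketch. It assumes
the square-root forms that Lucene's classic TF-IDF similarity uses, as
far as I know: tf = sqrt(frequency) and a length norm of
1 / sqrt(field length). It ignores idf, boosts, query normalisation and
the lossy way norms are stored in the index, so treat the values as
illustrative only. The names score and FIELDS are hypothetical.

```python
import math


def score(term_freq, field_length):
    """Toy score: sqrt(term frequency) damped by 1 / sqrt(field length)."""
    tf = math.sqrt(term_freq)
    length_norm = 1.0 / math.sqrt(field_length)
    return tf * length_norm


# (term frequency of "jack", field length) for the three examples above.
FIELDS = {1: (1, 3), 2: (1, 21), 3: (3, 19)}

scores = {doc_id: score(tf, length) for doc_id, (tf, length) in FIELDS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for doc_id in ranking:
    print(f"#{doc_id}: {scores[doc_id]:.3f}")
# prints:
# #1: 0.577
# #3: 0.397
# #2: 0.218
```

Under those assumptions #1 and #3 come out ahead of #2, which is the
ordering we argued for by eye.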

Upayavira

On Mon, Oct 14, 2013, at 08:33 AM, Karan jindal wrote:
> Hi all,
> 
> I have a general query about fieldNorm.
> Is it advisable to use fieldNorm (which kind of gives importance to
> shorter length fields)?
> Is there any set of standard factors on which the decision of turning
> fieldNorm on/off can be taken?
> 
> *In my use case:-*
> I have user-generated data and primarily there are two searchable
> fields,
> "title" and "description", apart from certain other filter flags.
> 
> It is up to the user to make "title" short or long.
> What will be best in this case?
> 
> Regards,
> Karan Jindal
