You search for the word "jack". Which of these three field values best matches?
1) Jack is great.
2) Billy was a young man. Billy studied well and lived well. Jack didn't. Billy went travelling and had a great time.
3) Billy didn't actually like Jack. Jack could at times be difficult. Jack would get angry at the smallest things.

Setting algorithms aside, I'd say #1 is a good match, as is #3, but #2 isn't so good.

Now let's look at how field length normalisation plays into this. For each value:

1) Term frequency for "Jack" is 1. Field length is 3.
2) Term frequency for "Jack" is 1. Field length is 21.
3) Term frequency for "Jack" is 3. Field length is 19.

If you only take term frequency into account, #1 and #2 would be equally good matches, and #3 would be a better match than either. However, if you include field length in your calculations, you get a better approximation of our original judgement, that #1 and #3 are the better matches.

Put simply: all else being equal, the longer the field, the lower the score. In a longer field with a high term frequency, the high term frequency can counteract the effect of the extra length, giving a (hopefully) similar score to a shorter field with fewer occurrences of the term.

Upayavira

On Mon, Oct 14, 2013, at 08:33 AM, Karan jindal wrote:
> Hi all,
>
> I have a general query about fieldNorm
> Is it advisable to use fieldNorm (which kinds of gives importance to
> shorter length fields).
> Is there any set of standard factors on which the decision of turning
> fieldNorm on/off can be taken?
>
> *In my use case:-*
> I have a user generated data and primarily there are two searchable
> fields
> "title" and "description" apart from certain other filter flags.
>
> It is up to user to make "title" short or long?
> What will be best is this case?
>
> Regards,
> Karan Jindal
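To make the arithmetic in the reply above concrete, here's a rough Python sketch. It assumes Lucene's classic TF-IDF formulas, tf = sqrt(freq) and length norm = 1/sqrt(number of terms), and it uses a simplified tokenizer of my own rather than Solr's analysis chain. It ignores idf, queryNorm, coord, boosts and the lossy single-byte encoding of norms, so the numbers are illustrative only. The function names (`tokenize`, `classic_score`, `rank`) are hypothetical, and `use_norms=False` only roughly approximates a field with norms omitted.

```python
import math
import re

# The three field values from the example.
DOCS = {
    1: "Jack is great.",
    2: ("Billy was a young man. Billy studied well and lived well. "
        "Jack didn't. Billy went travelling and had a great time."),
    3: ("Billy didn't actually like Jack. Jack could at times be difficult. "
        "Jack would get angry at the smallest things."),
}


def tokenize(text):
    """Simplified analyzer (an assumption, not Solr's): lowercase, keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def classic_score(freq, num_terms, use_norms=True):
    """tf * lengthNorm in the style of Lucene's classic TF-IDF similarity.

    tf = sqrt(freq); lengthNorm = 1 / sqrt(num_terms), or 1.0 when norms are omitted.
    idf, queryNorm, coord, boosts and norm byte-encoding are deliberately ignored.
    """
    tf = math.sqrt(freq)
    norm = 1.0 / math.sqrt(num_terms) if use_norms else 1.0
    return tf * norm


def rank(term, use_norms=True):
    """Return (doc_id, term_freq, field_length, score) tuples, best score first."""
    results = []
    for doc_id, text in DOCS.items():
        tokens = tokenize(text)
        freq = tokens.count(term)
        results.append((doc_id, freq, len(tokens),
                        classic_score(freq, len(tokens), use_norms)))
    # sorted() is stable even with reverse=True, so ties keep document order.
    return sorted(results, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for use_norms in (False, True):
        print("norms on" if use_norms else "norms off")
        for doc_id, freq, length, score in rank("jack", use_norms):
            print(f"  #{doc_id}: tf={freq} length={length} score={score:.3f}")
```

With norms off it ranks #3 first and leaves #1 and #2 tied at 1.000. With norms on the order becomes #1 (0.577), #3 (0.397), #2 (0.218), which matches the intuitive ranking argued for above.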