Is there a standard way to check whether switching off fieldNorm helps or not?

Regards,
Karan Jindal

On Mon, Oct 14, 2013 at 2:30 PM, Karan jindal <karanjindal1...@gmail.com> wrote:

> Thanks Upayavira for a quick reply,
>
> True what you have said. But the example you have given is more of
> descriptive nature.
> Will the same argument apply if field length can't be more than some
> threshold (say 10) like in case of "Title"?
> "title" are generally short in length.
>
> Consider following examples (mentioning only title field only):-
>
> 1) solr
> 2) Introduction to solr 3.4
> 3) solr big data
>
> search query q = solr
>
> wouldn't it be good to give all the 3 documents same textual score?
> Later, They can be re-ranked based on some other feature like year
> of publishing?
>
> Will it be good to switch off fieldNorm for "title"?
>
> Regards,
> Karan Jindal
>
> On Mon, Oct 14, 2013 at 1:14 PM, Upayavira <u...@odoko.co.uk> wrote:
>
>> You search for the word "jack". Which of these three field values best
>> matches?
>>
>> 1) Jack is great.
>> 2) Billy was a young man. Billy studied well and lived well. Jack
>> didn't. Billy went travelling and had a great time.
>> 3) Billy didn't actually like Jack. Jack could at times be difficult.
>> Jack would get angry at the smallest things.
>>
>> Forgetting algorithms, I'd say #1 is a good match, as is #3, but #2
>> isn't so good.
>>
>> So, now looking at how field length normalisation can play into this, we
>> can note that:
>>
>> 1) term frequency for "Jack" is 1. Field length is 3.
>> 2) term frequency for "Jack" is 1. Field length is 21.
>> 3) term frequency for "Jack" is 3. Field length is 19.
>>
>> If you only take term frequency into account, #1 and #2 would be equal
>> matches, and #3 would be a better match than #1.
>>
>> However, if you include the field length in your calculations, you can
>> reach a better approximation to our original proposition, that #1 and #3
>> are better matches.
>>
>> This simply means the longer the field, the lower the score. For longer
>> fields with high term frequencies, the high term frequency can
>> counteract the effect of the longer field, giving a (hopefully) similar
>> score to a shorter field with fewer term occurrences.
>>
>> Upayavira
>>
>> On Mon, Oct 14, 2013, at 08:33 AM, Karan jindal wrote:
>> > Hi all,
>> >
>> > I have a general query about fieldNorm
>> > Is it advisable to use fieldNorm (which kinds of gives importance to
>> > shorter length fields).
>> > Is there any set of standard factors on which the decision of turning
>> > fieldNorm on/off can be taken?
>> >
>> > *In my use case:-*
>> > I have a user generated data and primarily there are two searchable
>> > fields
>> > "title" and "description" apart from certain other filter flags.
>> >
>> > It is up to user to make "title" short or long?
>> > What will be best is this case?
>> >
>> > Regards,
>> > Karan Jindal
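
To make the arithmetic in the "jack" example and the "title" example concrete, here is a minimal Python sketch. It assumes the classic Lucene TF-IDF shape, where term frequency contributes sqrt(tf) and the field norm is 1/sqrt(number of terms in the field). It ignores idf, boosts and the lossy one-byte encoding of norms, so the numbers are only illustrative and will not match what Solr reports. The function names and the whitespace tokenization of the titles are assumptions made for this illustration, not anything taken from Solr.

```python
import math


def score(term_freq, field_length, use_norms=True):
    """Simplified per-field score: sqrt(tf), scaled by 1/sqrt(length) if norms are on."""
    tf = math.sqrt(term_freq)
    norm = 1.0 / math.sqrt(field_length) if use_norms else 1.0
    return tf * norm


# (term frequency of "jack", field length in terms), as stated in the thread
jack_docs = {1: (1, 3), 2: (1, 21), 3: (3, 19)}


def rank(docs, use_norms):
    """Return document ids ordered from highest to lowest score."""
    return sorted(
        docs,
        key=lambda d: score(*docs[d], use_norms=use_norms),
        reverse=True,
    )


# The three titles from the thread; q = solr
titles = ["solr", "Introduction to solr 3.4", "solr big data"]


def title_scores(use_norms, term="solr"):
    """Score each title for `term`, assuming plain whitespace tokenization."""
    scores = []
    for title in titles:
        tokens = title.lower().split()
        scores.append(score(tokens.count(term), len(tokens), use_norms))
    return scores


print("jack, norms on :", rank(jack_docs, use_norms=True))
print("jack, norms off:", rank(jack_docs, use_norms=False))
print("titles, norms on :", [round(s, 3) for s in title_scores(True)])
print("titles, norms off:", [round(s, 3) for s in title_scores(False)])
```

Under these assumptions, with norms on the "jack" documents rank 1, 3, 2, which matches the intuition in the thread; with norms off, document 3 comes first and documents 1 and 2 tie. For the three titles, switching norms off gives all of them the same score for q = solr, while leaving norms on favours the one-word title.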