That is caused by the length of the documents. The principle is pretty intuitive: if one of your documents is the entire three volumes of The Lord of the Rings and you search for "tree", I know The Lord of the Rings will be in the results, and I haven't memorized the whole text of that book :p It is a matter of probability. In a big (big!) text any given word has a much greater chance of appearing than in a short letter, so a match in the letter tells you more, and one can infer that the letter is more relevant than the big text. That is the principle applied here, and Lucene applies it when building the ranking through the length norm (the fieldNorm in your explain output), which is roughly 1/sqrt(number of terms in the field). The first document is longer than the second one. Remember that all the values of a multivalued field are merged into one field in the index, so you cannot tell one value apart from another: the first document holds all the terms of both values (Fred twice, plus coolest, guy, town and whatever stop words the analyzer keeps), while the second holds only [Fred, Anderson]. So the second document is considered more relevant than the first one.
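As a rough illustration, here is a small Python sketch that reproduces the two fieldWeight values from the explain output in the quoted message. It assumes Lucene's DefaultSimilarity formulas as I understand them: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and a length norm of 1/sqrt(number of terms). The 3-significant-bit truncation in `encode_norm` is only an approximation of how the norm gets squeezed into a single byte, and the 7-token count for RECORD 1 assumes the analyzer keeps "the" and "in" (6 tokens would encode to the same 0.375). All function names are made up for the example.

```python
import math

def tf(freq):
    # Assumed DefaultSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    # Assumed DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def encode_norm(x):
    # Approximation (not a port) of the one-byte norm: keep only the top
    # 3 significant bits of the value, truncating, so 1/sqrt(2) -> 0.625
    if x <= 0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    return math.floor(mantissa * 8) / 8 * 2 ** exponent

def length_norm(num_terms):
    # 1/sqrt(term count), counted over ALL values of the multivalued field
    return encode_norm(1.0 / math.sqrt(num_terms))

def field_weight(freq, doc_freq, max_docs, num_terms, omit_norms=False):
    # fieldWeight = tf * idf * fieldNorm; with omitNorms the norm is 1.0
    norm = 1.0 if omit_norms else length_norm(num_terms)
    return tf(freq) * idf(doc_freq, max_docs) * norm

DOC_FREQ, MAX_DOCS = 7306, 12586425  # taken from the explain output

# RECORD 1: "Fred" twice, 7 tokens in total (hypothetical analyzer keeping stop words)
rec1 = field_weight(2, DOC_FREQ, MAX_DOCS, num_terms=7)
# RECORD 2: "Fred Anderson", 2 tokens
rec2 = field_weight(1, DOC_FREQ, MAX_DOCS, num_terms=2)
print(f"with norms: RECORD 1 = {rec1:.4f}, RECORD 2 = {rec2:.4f}")

rec1_no = field_weight(2, DOC_FREQ, MAX_DOCS, num_terms=7, omit_norms=True)
rec2_no = field_weight(1, DOC_FREQ, MAX_DOCS, num_terms=2, omit_norms=True)
print(f"omitNorms:  RECORD 1 = {rec1_no:.4f}, RECORD 2 = {rec2_no:.4f}")
```

With the norms in place the sketch lands on 4.482106 and 5.282213, matching the explain output, so RECORD 2 wins. Treating the norm as 1.0 (which is what omitting norms amounts to) leaves only tf and idf, and RECORD 1 then comes out ahead because of its second Fred.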
To avoid all this you can set omitNorms="true" on the field (and reindex, since norms are computed at index time). That should make the first document more relevant because Fred appears twice (not because Fred appears alone in a value).

Regards
Emmanuel

2011/7/26 Brian Lamb <brian.l...@journalexperts.com>
> Hi all,
>
> I am a little confused as to why the scoring is working the way it is:
>
> I have a field defined as:
>
> <field name="myname" type="text" indexed="true" stored="true"
> required="false" multivalued="true" />
>
> And I have several documents where that value is:
>
> RECORD 1
> <arr name="myname">
>   <str>Fred</str>
>   <str>Fred (the coolest guy in town)</str>
> </arr>
>
> OR
>
> RECORD 2
> <arr name="myname">
>   <str>Fred Anderson</str>
> </arr>
>
> What happens when I do a search for
> http://localhost:8983/solr/search/?q=myname:Fred I get RECORD 2
> returned before RECORD 1.
>
> RECORD 2
> 5.282213 = (MATCH) fieldWeight(myname:Fred in 256575), product of:
>   1.0 = tf(termFreq(myname:Fred)=1)
>   8.451541 = idf(docFreq=7306, maxDocs=12586425)
>   0.625 = fieldNorm(field=myname, doc=256575)
>
> RECORD 1
> 4.482106 = (MATCH) fieldWeight(myname:Fred in 215), product of:
>   1.4142135 = tf(termFreq(myname:Fred)=2)
>   8.451541 = idf(docFreq=7306, maxDocs=12586425)
>   0.375 = fieldNorm(field=myname, doc=215)
>
> So the difference is fieldNorm obviously but I think that's only part
> of the story. Why is RECORD 2 returned with a higher score than RECORD
> 1 even though RECORD 1 matches "Fred" exactly? And how should I do
> this differently so that I am getting the results I am expecting?
>
> Thanks,
>
> Brian Lamb