I am seeing some search quality issues on an intranet crawl I am working
on, and the problem seems to be related to the fieldNorm value.
I have two documents of comparable size, but the fieldNorm for the 'content'
field of one document is significantly lower (2.4414062E-4).
Here is the output from explain. The second document has a much higher
termFreq but ranks lower:
0.12035767 = (MATCH) sum of:
  0.12035767 = (MATCH) weight(content:nfl in 1234), product of:
    0.1656037 = queryWeight(content:nfl), product of:
      6.644858 = idf(docFreq=5)
      0.024922082 = queryNorm
    0.7267813 = (MATCH) fieldWeight(content:nfl in 1234), product of:
      1.0 = tf(termFreq(content:nfl)=1)
      6.644858 = idf(docFreq=5)
      0.109375 = fieldNorm(field=content, doc=1234)
----------------------------------------
0.010692856 = (MATCH) sum of:
  0.0032762752 = (MATCH) weight(url:nfl^4.0 in 796), product of:
    0.73151344 = queryWeight(url:nfl^4.0), product of:
      4.0 = boost
      7.338005 = idf(docFreq=2)
      0.024922082 = queryNorm
    0.004478763 = (MATCH) fieldWeight(url:nfl in 796), product of:
      1.0 = tf(termFreq(url:nfl)=1)
      7.338005 = idf(docFreq=2)
      6.1035156E-4 = fieldNorm(field=url, doc=796)
  8.495634E-4 = (MATCH) weight(content:nfl in 796), product of:
    0.1656037 = queryWeight(content:nfl), product of:
      6.644858 = idf(docFreq=5)
      0.024922082 = queryNorm
    0.005130099 = (MATCH) fieldWeight(content:nfl in 796), product of:
      3.1622777 = tf(termFreq(content:nfl)=10)
      6.644858 = idf(docFreq=5)
      2.4414062E-4 = fieldNorm(field=content, doc=796)
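
As a sanity check, the 'content' clause weights above can be recomputed from the listed factors. The sketch below is only illustrative: the class and method names (ExplainCheck, clauseWeight) are made up for this example and are not part of Lucene, and it assumes tf is the square root of termFreq, which is what the explain output shows (3.1622777 for termFreq=10).

```java
public class ExplainCheck {
    // weight = queryWeight * fieldWeight, where
    //   queryWeight = boost * idf * queryNorm
    //   fieldWeight = sqrt(termFreq) * idf * fieldNorm
    // This mirrors the "product of" lines in the explain output.
    static double clauseWeight(double boost, int termFreq, double idf,
                               double queryNorm, double fieldNorm) {
        double queryWeight = boost * idf * queryNorm;
        double fieldWeight = Math.sqrt(termFreq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double queryNorm = 0.024922082;
        double idfContent = 6.644858;

        // content:nfl in doc 1234 (termFreq=1, fieldNorm=0.109375)
        double doc1234 = clauseWeight(1.0, 1, idfContent, queryNorm, 0.109375);
        // content:nfl in doc 796 (termFreq=10, fieldNorm=2.4414062E-4)
        double doc796 = clauseWeight(1.0, 10, idfContent, queryNorm, 2.4414062E-4);

        System.out.println("content weight, doc 1234: " + doc1234);
        System.out.println("content weight, doc 796:  " + doc796);
        // How much larger the first clause is, despite the lower termFreq
        System.out.println("ratio: " + (doc1234 / doc796));
    }
}
```

The recomputed values agree with the explain output, so the tenfold termFreq advantage of doc 796 (a factor of about 3.16 after the square root) is swamped entirely by the difference in fieldNorm.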
I added some debug output to the indexer and found that the second document has a
lengthNorm of 0.02793 vs. 0.03162 for the first document. Why is the fieldNorm
orders of magnitude lower? Are there any other factors that affect the
fieldNorm?
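
To quantify the gap, here is a rough sketch that divides each fieldNorm by the lengthNorm I logged. It assumes the stored norm is the lengthNorm multiplied by any document and field boosts and then rounded into a single byte, as in Lucene's default Similarity, so the result is only approximate. The class and method names (ImpliedFactor, impliedFactor) are made up for illustration.

```java
public class ImpliedFactor {
    // Extra multiplier needed to get from the logged lengthNorm to the
    // fieldNorm reported by explain. Approximate, because the stored
    // norm is assumed to be rounded to one byte at index time.
    static double impliedFactor(double fieldNorm, double lengthNorm) {
        return fieldNorm / lengthNorm;
    }

    public static void main(String[] args) {
        // doc 1234: fieldNorm 0.109375, lengthNorm 0.03162
        double first = impliedFactor(0.109375, 0.03162);
        // doc 796: fieldNorm 2.4414062E-4, lengthNorm 0.02793
        double second = impliedFactor(2.4414062E-4, 0.02793);

        System.out.println("implied factor, doc 1234: " + first);
        System.out.println("implied factor, doc 796:  " + second);
        System.out.println("first / second: " + (first / second));
    }
}
```

Under that assumption, the first document's norm is scaled up by a factor above 3 while the second is scaled down by more than 100, so something other than lengthNorm (a boost, perhaps) must be contributing.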