I am seeing some search-quality issues with an intranet crawl I am working
on, and the problem seems to be related to the fieldNorm value.
   
  I have two documents of comparable size, but the fieldNorm for the 'content'
field of one document is significantly lower (2.4414062E-4).
   
  Here is the output from explain. The second document has a much higher
termFreq but ranks lower:
      
   0.12035767 = (MATCH) sum of:
      0.12035767 = (MATCH) weight(content:nfl in 1234), product of:
         0.1656037 = queryWeight(content:nfl), product of:
            6.644858 = idf(docFreq=5)
            0.024922082 = queryNorm
         0.7267813 = (MATCH) fieldWeight(content:nfl in 1234), product of:
            1.0 = tf(termFreq(content:nfl)=1)
            6.644858 = idf(docFreq=5)
            0.109375 = fieldNorm(field=content, doc=1234)



----------------------------------------     
   0.010692856 = (MATCH) sum of:
      0.0032762752 = (MATCH) weight(url:nfl^4.0 in 796), product of:
         0.73151344 = queryWeight(url:nfl^4.0), product of:
            4.0 = boost
            7.338005 = idf(docFreq=2)
            0.024922082 = queryNorm
         0.004478763 = (MATCH) fieldWeight(url:nfl in 796), product of:
            1.0 = tf(termFreq(url:nfl)=1)
            7.338005 = idf(docFreq=2)
            6.1035156E-4 = fieldNorm(field=url, doc=796)
      8.495634E-4 = (MATCH) weight(content:nfl in 796), product of:
         0.1656037 = queryWeight(content:nfl), product of:
            6.644858 = idf(docFreq=5)
            0.024922082 = queryNorm
         0.005130099 = (MATCH) fieldWeight(content:nfl in 796), product of:
            3.1622777 = tf(termFreq(content:nfl)=10)
            6.644858 = idf(docFreq=5)
            2.4414062E-4 = fieldNorm(field=content, doc=796)
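
  To make the arithmetic in the explain output easier to follow, here is a
small sketch that recomputes the per-term scores from the factors shown
above. It assumes tf is the square root of termFreq, which matches the
values printed (1.0 for termFreq=1, 3.1622777 for termFreq=10). The function
and variable names are mine, for illustration only; they are not Lucene or
Nutch APIs.

```python
import math


def field_weight(term_freq, idf, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    # (assumption inferred from the tf values in the explain output).
    return math.sqrt(term_freq) * idf * field_norm


def term_score(term_freq, idf, query_norm, field_norm, boost=1.0):
    # queryWeight = boost * idf * queryNorm; the term score is
    # queryWeight * fieldWeight, as in the "product of" lines above.
    query_weight = boost * idf * query_norm
    return query_weight * field_weight(term_freq, idf, field_norm)


QUERY_NORM = 0.024922082  # copied from the explain output

# content:nfl in doc 1234 (termFreq=1, fieldNorm=0.109375)
doc_1234_content = term_score(1, 6.644858, QUERY_NORM, 0.109375)

# content:nfl in doc 796 (termFreq=10, fieldNorm=2.4414062E-4)
doc_796_content = term_score(10, 6.644858, QUERY_NORM, 2.4414062e-4)

# url:nfl^4.0 in doc 796 (termFreq=1, fieldNorm=6.1035156E-4)
doc_796_url = term_score(1, 7.338005, QUERY_NORM, 6.1035156e-4, boost=4.0)

print(doc_1234_content, doc_796_content, doc_796_url)
```

  With the other factors nearly identical for the 'content' term, the
difference between the two documents comes almost entirely from fieldNorm,
which outweighs the higher tf of the second document.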



  I added some debug output to the indexer and found that the second document
has a lengthNorm of 0.02793 vs. 0.03162 for the first document. Why is the
fieldNorm orders of magnitude lower? Are there any other factors that affect
the fieldNorm?
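
  As a rough check on how far lengthNorm alone is from explaining the gap,
the sketch below divides each fieldNorm by the lengthNorm I logged. It
assumes the stored norm is lengthNorm multiplied by some combined
document/field boost, as in Lucene's documented scoring formula, and it
ignores the precision lost when the norm is encoded for storage. The name
implied_boost is hypothetical, not a Lucene or Nutch call.

```python
def implied_boost(field_norm, length_norm):
    # Assumption: fieldNorm ~= boost * lengthNorm, so the remaining
    # multiplier is fieldNorm / lengthNorm. Norm encoding loss is ignored.
    return field_norm / length_norm


# 'content' field values taken from the explain output and my indexer logs
boost_1234 = implied_boost(0.109375, 0.03162)
boost_796 = implied_boost(2.4414062e-4, 0.02793)

print("doc 1234 implied boost:", boost_1234)
print("doc 796 implied boost:", boost_796)
print("ratio:", boost_1234 / boost_796)
```

  Under that assumption the two lengthNorm values differ by only about 13%,
so nearly all of the difference would have to come from the other
multiplier, not from document length.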
   
   
   

       