You're correct: the fieldNorm is calculated at index time. The URL and anchor boosts are query-time boosts that can be changed within the query-basic plugin, so changing them does not require re-indexing. There's been talk about ripping these out and moving them into the conf file, but I'm not sure if that's been done yet.
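To make the split between index-time and query-time factors concrete, here is a minimal Java sketch that recomputes the score from the explain output quoted below. It assumes Lucene's default formulas (tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, queryNorm = 1 / sqrt(sum of (boost × idf)²)). numDocs = 5 is inferred from idf = 1.9162908 and is not stated anywhere in the thread, and the class and method names are hypothetical. Only fieldNorm comes from the index; everything else is computed from the query.

```java
/**
 * Hypothetical sketch: recomputes the explain score quoted in this thread.
 * Assumes Lucene's default tf/idf/queryNorm formulas; numDocs = 5 is
 * inferred from idf = 1.9162908, not taken from the thread.
 */
public class ExplainSketch {

    static final class Clause {
        final String field;
        final double boost;      // query-time boost (e.g. url^4.0, anchor^2.0)
        final int termFreq;      // occurrences of the term in this field
        final double fieldNorm;  // index-time value, read back from the index

        Clause(String field, double boost, int termFreq, double fieldNorm) {
            this.field = field;
            this.boost = boost;
            this.termFreq = termFreq;
            this.fieldNorm = fieldNorm;
        }
    }

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Depends only on the query (boosts and idf), never on the document. */
    static double queryNorm(Clause[] clauses, double idf) {
        double sumSq = 0;
        for (Clause c : clauses) {
            double w = c.boost * idf;
            sumSq += w * w;
        }
        return 1.0 / Math.sqrt(sumSq);
    }

    /** Sum over clauses of queryWeight * fieldWeight for a one-term query. */
    static double score(Clause[] clauses, int docFreq, int numDocs) {
        double idf = idf(docFreq, numDocs);
        double qn = queryNorm(clauses, idf);
        double total = 0;
        for (Clause c : clauses) {
            double queryWeight = c.boost * idf * qn;                 // query time
            double fieldWeight = tf(c.termFreq) * idf * c.fieldNorm; // uses index-time norm
            total += queryWeight * fieldWeight;
        }
        return total;
    }

    public static void main(String[] args) {
        Clause[] clauses = {
            new Clause("url", 4.0, 1, 0.25),
            new Clause("anchor", 2.0, 2, 0.875),
            new Clause("content", 1.0, 5, 0.0546875),
        };
        double idf = idf(1, 5);
        System.out.printf("idf = %.5f%n", idf);
        System.out.printf("queryNorm = %.5f%n", queryNorm(clauses, idf));
        System.out.printf("score = %.5f%n", score(clauses, 1, 5));
    }
}
```

Running it should print values matching the explain output up to float rounding (idf about 1.91629, score about 1.50422). Note that changing the url boost alters not only that clause's queryWeight but, through queryNorm, every clause's weight, all without touching the index.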
On Apr 12, 2005 1:50 AM, Kannan Sundaramoorthy <[EMAIL PROTECTED]> wrote:
>
> Hi,
> Thanks for the explanation. I need some more info. I understand that
> fieldNorm is the byte-encoded normalization factor for the named field of
> every document. This value is returned by norms(String field) of the
> SegmentReader class. Is this normalization factor calculated at index
> time itself and just read during searching?
>
> I used explain to see the boosts for different fields. Please see the
> details below. As I see from the explanation, the "url" field is assigned
> a boost of 4.0 and the "anchor" field is assigned a boost of 2.0. Please
> suggest how I can alter the boost values for different fields. Does it
> need any configuration change during indexing itself?
>
> page
>
>   * docNo = 1
>   * segment = 20050411183746
>   * digest = 3835653251e4598bee61618b1c64804c
>   * boost = 1.8572323
>   * lastModified = 1113224620000
>   * contentLength = 347
>   * primaryType = text
>   * subType = html
>   * url = http://localhost:8080/none.html
>   * title = None Document
>
> score for query: none
>
>   * 1.5042199 = sum of:
>       o 0.4181689 = weight(url:none^4.0 in 1), product of:
>           + 0.8728715 = queryWeight(url:none^4.0), product of:
>               # 4.0 = boost
>               # 1.9162908 = idf(docFreq=1)
>               # 0.11387514 = queryNorm
>           + 0.4790727 = fieldWeight(url:none in 1), product of:
>               # 1.0 = tf(termFreq(url:none)=1)
>               # 1.9162908 = idf(docFreq=1)
>               # 0.25 = fieldNorm(field=url, doc=1)
>       o 1.0349152 = weight(anchor:none^2.0 in 1), product of:
>           + 0.43643576 = queryWeight(anchor:none^2.0), product of:
>               # 2.0 = boost
>               # 1.9162908 = idf(docFreq=1)
>               # 0.11387514 = queryNorm
>           + 2.3712888 = fieldWeight(anchor:none in 1), product of:
>               # 1.4142135 = tf(termFreq(anchor:none)=2)
>               # 1.9162908 = idf(docFreq=1)
>               # 0.875 = fieldNorm(field=anchor, doc=1)
>       o 0.05113577 = weight(content:none in 1), product of:
>           + 0.21821788 = queryWeight(content:none), product of:
>               # 1.9162908 = idf(docFreq=1)
>               # 0.11387514 = queryNorm
>           + 0.23433356 = fieldWeight(content:none in 1), product of:
>               # 2.236068 = tf(termFreq(content:none)=5)
>               # 1.9162908 = idf(docFreq=1)
>               # 0.0546875 = fieldNorm(field=content, doc=1)
>
> Thanks,
> Kannan
>
> On Mon, 2005-04-11 at 17:47, Andy Liu wrote:
> > fieldNorm is lengthNorm * document boost. The lengthNorm formula is
> > defined within Lucene's Similarity class (it is a function of the
> > number of terms in the field) and the document boost is
> > calculated in IndexSegment.java.
> >
> > Nutch assigns different boosts to each field so that you can tune your
> > search results. For example, you can use explain to see if anchor
> > matches are too strong, and adjust accordingly.
> >
> > Andy
> >
> > On Apr 11, 2005 12:17 AM, Kannan Sundaramoorthy <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi,
> > > I am trying to understand how Nutch computes the score for each
> > > document. I could figure out how tf, idf and queryNorm are computed,
> > > but I do not understand how the fieldNorm (normalisation for each
> > > field) value is computed. It seems to be a magic number to me, and
> > > this is where Nutch seems to differ from Lucene in computing the score.
> > > Also, Nutch assigns different boosts to different fields (e.g., 4.0
> > > for the url field) and uses this value while computing queryWeight.
> > > Can anyone explain these, please?
> > >
> > > Thanks,
> > > Kannan
> > >
> > > This e-mail and any files transmitted with it are for the sole use of the
> > > intended recipient(s) and may contain confidential and privileged
> > > information.
> > > If you are not the intended recipient, please contact the sender by reply
> > > e-mail and destroy all copies of the original message.
> > > Any unauthorised review, use, disclosure, dissemination, forwarding,
> > > printing or copying of this email or any action taken in reliance on this
> > > e-mail is strictly prohibited and may be unlawful.
> > > Visit us at http://www.cognizant.com
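Andy's description (fieldNorm = lengthNorm × document boost, stored as one byte per field per document) also suggests why the fieldNorm values in the explain output are round binary fractions such as 0.875 and 0.0546875 (7/128): a single byte keeps only about three significant bits of the float. The hypothetical sketch below assumes Lucene's default lengthNorm of 1/sqrt(numTerms) (Nutch may use its own per-field formula) and imitates the one-byte encoding by truncating the float to three significant bits, without the range clamping a real encoder needs. It is an approximation, not Lucene's or Nutch's actual code.

```java
/**
 * Hypothetical sketch of an index-time fieldNorm:
 *   fieldNorm = lengthNorm(numTerms) * documentBoost, then squeezed into one byte.
 * Assumptions: lengthNorm = 1/sqrt(numTerms) (Lucene's default; Nutch may
 * differ), and the byte encoding is approximated by keeping the implicit
 * leading bit plus 2 mantissa bits (truncation, no range clamping).
 */
public class FieldNormSketch {

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Zero the low 21 of the 23 mantissa bits: 3 significant bits remain. */
    static float truncateTo3Bits(float f) {
        int bits = Float.floatToIntBits(f);
        return Float.intBitsToFloat(bits & 0xFFE00000);
    }

    static float fieldNorm(float docBoost, int numTerms) {
        return truncateTo3Bits(docBoost * lengthNorm(numTerms));
    }

    public static void main(String[] args) {
        float docBoost = 1.8572323f; // document "boost" from the explain output
        for (int n : new int[] {1, 4, 100, 1000}) {
            System.out.println(n + " terms -> fieldNorm " + fieldNorm(docBoost, n));
        }
    }
}
```

With the document boost of 1.8572323, this sketch maps 4 terms to 0.875 and 100 terms to 0.15625. Many nearby raw values collapse to the same stored norm, which is why the numbers in explain look like magic constants rather than the exact product.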
