I am using nutch-0.5. For some reason, the effect of document boost is
not showing up in the search results.

    Below is the 'explanation' of a sample query "solar". I don't see the
boost value (1.5514448) being used at all in the calculation of the
document score - from the 'explanation' below and also from the quality of
the search.

    I used CrawlTool to crawl and index. I added 2 lines to CrawlTool as
shown below.

    How can I see the effect of document boost?

    Thanks.

-Vikas

--------------------- CrawlTool.java ---------------------


    // re-fetch everything
    Fetcher.main(new String[] { "-threads", ""+threads,
                                segment } );

--->>>// calculate PageRank <-- added these 2 lines
    String argv2[] = {db, pageRankIterationsString};
    net.nutch.tools.LinkAnalysisTool.main(argv2);

    // index, dedup & merge
    IndexSegment.main(new String[] { segment } );
    DeleteDuplicates.main(new String[] { segments, dir + "/dedup" } );
    IndexMerger.main(new String[] { dir + "/index", segment } );


--------------- a sample explanation ------------
page
docNo = 0
segment = 20041224185557
digest = 18cef64c6c53ecf399abfd9239caf240

---->>>boost = 1.5514448 <--(not used in the calculation below)

lang = en
url = http://www.cs.utexas.edu/users/vgupta/web1/a31.html
anchor = Solar System jigsaw
anchor = a31.html
anchor = Solar System jigsaw
title = Solar System jigsaw
H1 = Solar System jigsaw

score for query: solar
0.7634825 = sum of:
0.5878681 = weight(anchor:solar^2.0 in 0), product of:
0.3017718 = queryWeight(anchor:solar^2.0), product of:
2.0 = boost
2.2039728 = idf(docFreq=2)
0.06846087 = queryNorm
1.9480551 = fieldWeight(anchor:solar in 0), product of:
1.4142135 = tf(termFreq(anchor:solar)=2)
2.2039728 = idf(docFreq=2)
0.625 = fieldNorm(field=anchor, doc=0)
0.010230478 = weight(content:solar in 0), product of:
0.09287914 = queryWeight(content:solar), product of:
1.3566749 = idf(docFreq=6)
0.06846087 = queryNorm
0.11014828 = fieldWeight(content:solar in 0), product of:
1.7320508 = tf(termFreq(content:solar)=3)
1.3566749 = idf(docFreq=6)
0.046875 = fieldNorm(field=content, doc=0)
0.16538392 = weight(H1:solar^1.5 in 0), product of:
0.1393187 = queryWeight(H1:solar^1.5), product of:
1.5 = boost
1.3566749 = idf(docFreq=2)
0.06846087 = queryNorm
1.1870905 = fieldWeight(H1:solar in 0), product of:
1.0 = tf(termFreq(H1:solar)=1)
1.3566749 = idf(docFreq=2)
0.875 = fieldNorm(field=H1, doc=0)




-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now. 
http://productguide.itmanagersjournal.com/
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general

Reply via email to