I am using nutch-0.5, and the document boost does not seem to affect
the search results.
Below is the 'explanation' for a sample query, "solar". The boost value
(1.5514448) does not appear anywhere in the calculation of the document
score, and the ranking quality does not reflect it either.
I used CrawlTool to crawl and index. I added 2 lines to CrawlTool to
run the link analysis, as shown below.
How can I see the effect of document boost?
Thanks.
-Vikas
--------------------- CrawlTool.java ---------------------
    // re-fetch everything
    Fetcher.main(new String[] { "-threads", "" + threads, segment });

    // calculate PageRank   <-- added these 2 lines
    String[] argv2 = { db, pageRankIterationsString };
    net.nutch.tools.LinkAnalysisTool.main(argv2);

    // index, dedup & merge
    IndexSegment.main(new String[] { segment });
    DeleteDuplicates.main(new String[] { segments, dir + "/dedup" });
    IndexMerger.main(new String[] { dir + "/index", segment });
--------------- a sample explanation ------------
page
  docNo = 0
  segment = 20041224185557
  digest = 18cef64c6c53ecf399abfd9239caf240
  boost = 1.5514448   <-- (not used in the calculation below)
  lang = en
  url = http://www.cs.utexas.edu/users/vgupta/web1/a31.html
  anchor = Solar System jigsaw
  anchor = a31.html
  anchor = Solar System jigsaw
  title = Solar System jigsaw
  H1 = Solar System jigsaw

score for query: solar
  0.7634825 = sum of:
    0.5878681 = weight(anchor:solar^2.0 in 0), product of:
      0.3017718 = queryWeight(anchor:solar^2.0), product of:
        2.0 = boost
        2.2039728 = idf(docFreq=2)
        0.06846087 = queryNorm
      1.9480551 = fieldWeight(anchor:solar in 0), product of:
        1.4142135 = tf(termFreq(anchor:solar)=2)
        2.2039728 = idf(docFreq=2)
        0.625 = fieldNorm(field=anchor, doc=0)
    0.010230478 = weight(content:solar in 0), product of:
      0.09287914 = queryWeight(content:solar), product of:
        1.3566749 = idf(docFreq=6)
        0.06846087 = queryNorm
      0.11014828 = fieldWeight(content:solar in 0), product of:
        1.7320508 = tf(termFreq(content:solar)=3)
        1.3566749 = idf(docFreq=6)
        0.046875 = fieldNorm(field=content, doc=0)
    0.16538392 = weight(H1:solar^1.5 in 0), product of:
      0.1393187 = queryWeight(H1:solar^1.5), product of:
        1.5 = boost
        1.3566749 = idf(docFreq=2)
        0.06846087 = queryNorm
      1.1870905 = fieldWeight(H1:solar in 0), product of:
        1.0 = tf(termFreq(H1:solar)=1)
        1.3566749 = idf(docFreq=2)
        0.875 = fieldNorm(field=H1, doc=0)
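
As a sanity check on the explanation above, here is a minimal Java
sketch that recomputes the score from the listed factors alone. The
class and method names (ExplainCheck, clauseWeight, total) are made up
for illustration and are not part of Nutch or Lucene. The last line of
main() multiplies the total by the document boost purely as a
hypothetical comparison; I am not claiming that is how the boost is
supposed to be applied.

```java
// Hypothetical helper, not Nutch/Lucene code: recomputes the score
// using only the numbers printed in the explanation.
class ExplainCheck {

    // weight(field:term) = queryWeight * fieldWeight, where
    //   queryWeight = queryBoost * idf * queryNorm
    //   fieldWeight = tf * idf * fieldNorm
    static double clauseWeight(double queryBoost, double idf,
                               double queryNorm, double tf,
                               double fieldNorm) {
        double queryWeight = queryBoost * idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    // Sum of the three clauses listed under "sum of:".
    static double total() {
        double queryNorm = 0.06846087;
        double anchor = clauseWeight(2.0, 2.2039728, queryNorm,
                                     1.4142135, 0.625);
        // No boost line is listed for content, so 1.0 is assumed.
        double content = clauseWeight(1.0, 1.3566749, queryNorm,
                                      1.7320508, 0.046875);
        double h1 = clauseWeight(1.5, 1.3566749, queryNorm,
                                 1.0, 0.875);
        return anchor + content + h1;
    }

    public static void main(String[] args) {
        double docBoost = 1.5514448; // "boost" field of the page
        double score = total();
        System.out.println("recomputed score    = " + score);
        // Hypothetical only: what the score would be if the document
        // boost were a plain top-level multiplier.
        System.out.println("score * doc boost   = " + score * docBoost);
    }
}
```

The recomputed total agrees with the reported 0.7634825 to within
rounding, so every factor in the score is already accounted for by the
query boosts, tf, idf, queryNorm and fieldNorm. The 1.5514448 value
does not appear as a separate factor.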