Vikas, I don't know enough about Nutch internals to say how it uses Lucene or how much of Lucene it exposes. But if you can get a reference to Lucene's Explanation instance (see http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Query,%20int)), you will be able to check the boost of each hit and much more.
Otis

--- Vikas Gupta <[EMAIL PROTECTED]> wrote:
> I have been browsing around the nutch and lucene code a lot. I just
> needed a clarification.
>
> From the lucene FAQ
> (http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq),
> I saw that the score of a document given a query q is computed as:
>
>   score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t)
>             * coord_q_d
>
> where:
>   score_d   : score for document d
>   sum_t     : sum for all terms t
>   tf_q      : the square root of the frequency of t in the query
>   tf_d      : the square root of the frequency of t in d
>   idf_t     : log(numDocs/docFreq_t+1) + 1.0
>   numDocs   : number of documents in index
>   docFreq_t : number of documents containing t
>   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
>   norm_d_t  : square root of number of tokens in d in the same field as t
>   boost_t   : the user-specified boost for term t
>   coord_q_d : number of terms in both query and document / number of
>               terms in query
>
> Now, I also saw that the document boost is set as PageRank (in
> IndexSegment::makeDocument), which is further copied to each field's
> boost in that document.
>
> So, can you verify that the PageRank's effect in document scoring
> will show up in boost_t above?
>
> Thanks.
>
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://productguide.itmanagersjournal.com/
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
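To make the role of boost_t in the FAQ formula quoted above concrete, here is a minimal, self-contained Java sketch of that formula. It does not use Lucene or Nutch at all: the class FaqScoreSketch, the TermStats holder and all field names are hypothetical, and idf_t is read as log(numDocs / (docFreq_t + 1)) + 1.0, which is an assumption about the intended parenthesization. It only shows that, as the formula is written, boost_t is a plain per-term multiplier. Whether Nutch's PageRank-derived boost actually ends up in that position is exactly what the Explanation output would have to confirm.

```java
public class FaqScoreSketch {

    /** Per-term statistics for one query/document pair (hypothetical holder). */
    static final class TermStats {
        final int freqInQuery; // raw frequency of t in the query
        final int freqInDoc;   // raw frequency of t in document d (0 if absent)
        final int docFreq;     // number of documents containing t
        final double boost;    // boost_t from the FAQ formula

        TermStats(int freqInQuery, int freqInDoc, int docFreq, double boost) {
            this.freqInQuery = freqInQuery;
            this.freqInDoc = freqInDoc;
            this.docFreq = docFreq;
            this.boost = boost;
        }
    }

    /** idf_t, assuming the FAQ means log(numDocs / (docFreq_t + 1)) + 1.0. */
    static double idf(int numDocs, int docFreq) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** score_d as written in the FAQ; fieldLength is the token count of the field. */
    static double score(TermStats[] terms, int numDocs, int fieldLength) {
        if (terms.length == 0) {
            return 0.0;
        }
        // norm_q = sqrt(sum_t((tf_q * idf_t)^2))
        double sumSquares = 0.0;
        for (TermStats t : terms) {
            double w = Math.sqrt(t.freqInQuery) * idf(numDocs, t.docFreq);
            sumSquares += w * w;
        }
        double normQ = Math.sqrt(sumSquares);
        // norm_d_t = sqrt(number of tokens in the field); one field assumed here
        double normD = Math.sqrt(fieldLength);

        double sum = 0.0;
        int matched = 0;
        for (TermStats t : terms) {
            if (t.freqInDoc == 0) {
                continue; // term not in the document: contributes nothing
            }
            matched++;
            double idfT = idf(numDocs, t.docFreq);
            double tfQ = Math.sqrt(t.freqInQuery);
            double tfD = Math.sqrt(t.freqInDoc);
            sum += tfQ * idfT / normQ * tfD * idfT / normD * t.boost;
        }
        // coord_q_d = terms in both query and document / terms in query
        double coord = matched / (double) terms.length;
        return sum * coord;
    }

    public static void main(String[] args) {
        // Made-up statistics: two query terms, only the first occurs in the document.
        TermStats[] plain = { new TermStats(1, 4, 10, 1.0), new TermStats(1, 0, 50, 1.0) };
        TermStats[] boosted = { new TermStats(1, 4, 10, 2.0), new TermStats(1, 0, 50, 2.0) };
        System.out.println("boost 1.0: " + score(plain, 1000, 100));
        System.out.println("boost 2.0: " + score(boosted, 1000, 100));
    }
}
```

With the same statistics, doubling every boost_t doubles the score, so under this formula a boost would scale a hit's score linearly.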
