Vikas,

I don't know enough about Nutch internals to say how it uses Lucene or
how much of Lucene it exposes. But if you can get hold of the Explanation
instance that Lucene's IndexSearcher.explain(Query, int) returns (see
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Query,%20int)),
you will be able to check the boost of each hit and much more.
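
To make the formula you quoted easier to poke at, here is a rough sketch
of it in plain Java. It is only the arithmetic from the FAQ text, not
Lucene or Nutch code: the class ScoreSketch, the Term holder, and all
parameter names are made up for illustration, it reads the idf term as
log(numDocs / (docFreq_t + 1)) + 1.0, and it assumes every query term
lives in the same field, so a single token count stands in for norm_d_t.
It says nothing about where Nutch or Lucene actually store a document or
field boost.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the scoring formula quoted from the FAQ.
// Nothing here is a Lucene or Nutch class; every name is hypothetical.
final class ScoreSketch {

    // Per-term inputs to the formula (hypothetical holder).
    static final class Term {
        final int freqInQuery;  // raw frequency of t in the query
        final int freqInDoc;    // raw frequency of t in document d
        final int docFreq;      // docFreq_t: number of documents containing t
        final double boost;     // boost_t

        Term(int freqInQuery, int freqInDoc, int docFreq, double boost) {
            this.freqInQuery = freqInQuery;
            this.freqInDoc = freqInDoc;
            this.docFreq = docFreq;
            this.boost = boost;
        }
    }

    // idf_t, read as log(numDocs / (docFreq_t + 1)) + 1.0 (an assumption).
    static double idf(int numDocs, int docFreq) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // score_d for one document; tokensInField is the token count behind norm_d_t.
    static double score(List<Term> terms, int numDocs, int tokensInField) {
        if (terms.isEmpty()) {
            return 0.0;
        }
        // norm_q = sqrt(sum_t((tf_q * idf_t)^2))
        double sumSquares = 0.0;
        for (Term t : terms) {
            double w = Math.sqrt(t.freqInQuery) * idf(numDocs, t.docFreq);
            sumSquares += w * w;
        }
        double normQ = Math.sqrt(sumSquares);
        double normD = Math.sqrt(tokensInField);

        double sum = 0.0;
        int matched = 0;
        for (Term t : terms) {
            if (t.freqInDoc == 0) {
                continue;  // a term absent from d contributes nothing
            }
            matched++;
            double idfT = idf(numDocs, t.docFreq);
            double tfQ = Math.sqrt(t.freqInQuery);
            double tfD = Math.sqrt(t.freqInDoc);
            sum += tfQ * idfT / normQ * tfD * idfT / normD * t.boost;
        }
        // coord_q_d = matched query terms / all query terms
        double coord = matched / (double) terms.size();
        return sum * coord;
    }

    public static void main(String[] args) {
        List<Term> plain = Arrays.asList(new Term(1, 4, 9, 1.0));
        List<Term> boosted = Arrays.asList(new Term(1, 4, 9, 2.0));
        System.out.println(score(plain, 10, 16));
        System.out.println(score(boosted, 10, 16));
    }
}
```

In this sketch boost_t is a plain per-term multiplier, so for a
one-term query doubling it doubles the score. Comparing numbers like
these against what Explanation prints for a real hit is one way to see
which factor a given boost actually lands in.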

Otis

--- Vikas Gupta <[EMAIL PROTECTED]> wrote:

> I have been browsing around the nutch and lucene code a lot. I just
> needed a clarification.
> 
> From the lucene FAQ
> (http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq),
> I saw that the score of a document given a query q is computed as:
> 
>   score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t)
>             * coord_q_d
> 
> where:
>   score_d   : score for document d
>   sum_t     : sum for all terms t
>   tf_q      : the square root of the frequency of t in the query
>   tf_d      : the square root of the frequency of t in d
>   idf_t     : log(numDocs/(docFreq_t+1)) + 1.0
>   numDocs   : number of documents in index
>   docFreq_t : number of documents containing t
>   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
>   norm_d_t  : square root of number of tokens in d in the same field as t
>   boost_t   : the user-specified boost for term t
>   coord_q_d : number of terms in both query and document / number of terms in query
> 
> Now, I also saw that the document boost is set to the PageRank (in
> IndexSegment::makeDocument), and that it is then copied to each field's
> boost in that document.
> 
> So, can you verify that PageRank's effect on document scoring will
> show up in boost_t above?
> 
> Thanks.
> 
> 
> 
> 
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real
> users.
> Discover which products truly live up to the hype. Start reading now.
> 
> http://productguide.itmanagersjournal.com/
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
> 


