Hi Ken:

As you described exactly, inside IndexSegment.java the
calculateBoost() method does the real work of calculating
a doc's boost value, i.e. its page rank.

Here is its code:

// 1. Start with page's score from DB -- 1.0 if no link analysis.
float res = pageScore;
// 2. Apply scorePower to this.
res = (float)Math.pow(pageScore, scorePower);
// 3. Optionally boost by log of incoming anchor count.
if (boostByLinkCount)
   res *= (float)Math.log(Math.E + linkCount);
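To check my reading of it, here is a minimal standalone sketch of the same logic. The class name, the method signature and the scorePower value are my own assumptions (not Nutch's actual API); steps 1 and 2 are folded together, which gives the same result because step 2 overwrites res anyway. With all page scores at 1.0 it reproduces Ken's figures quoted below.

```java
import java.util.Locale;

// Hypothetical standalone class (not Nutch code); parameter names mirror
// the fields used in the quoted calculateBoost() snippet.
public class BoostSketch {

    // Page score raised to scorePower, optionally multiplied by ln(e + linkCount).
    static float calculateBoost(float pageScore, float scorePower,
                                boolean boostByLinkCount, int linkCount) {
        // Steps 1 and 2: the page's score from the DB, raised to scorePower.
        float res = (float) Math.pow(pageScore, scorePower);
        // Step 3: optionally boost by the log of the incoming anchor count.
        if (boostByLinkCount) {
            res *= (float) Math.log(Math.E + linkCount);
        }
        return res;
    }

    public static void main(String[] args) {
        // Without link analysis every page score is 1.0, and 1.0 raised to any
        // power (0.5 here is an arbitrary choice) is still 1.0, so only
        // linkCount changes the boost.
        for (int links : new int[] {0, 10, 100}) {
            System.out.printf(Locale.ROOT, "%d links -> boost %.2f%n",
                    links, calculateBoost(1.0f, 0.5f, true, links));
        }
        // prints:
        // 0 links -> boost 1.00
        // 10 links -> boost 2.54
        // 100 links -> boost 4.63
    }
}
```

Note that a page's boost depends only on how many links point to it, not on who links to it.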


It seems to me that this calculation doesn't take into
account the weight (page rank) of the inbound links; it
only considers the number of inbound links.

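By contrast, the classic PageRank formula is:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where T1..Tn are the pages linking to A, C(T) is the
number of outbound links on page T, and d is a damping
factor (usually 0.85).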
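To make the difference concrete, here is a minimal sketch of that formula iterated over a tiny made-up link graph. It is not Nutch code: the class, the method, the graph, d = 0.85 and the fixed iteration count are all my own assumptions, and I'm not claiming this is what DistributedAnalysisTool actually computes.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper (not part of Nutch): iterates the classic,
// non-normalized PageRank formula over a small adjacency list.
public class PageRankSketch {

    // outLinks[t] lists the pages that page t links to.
    static double[] pageRank(int[][] outLinks, double d, int iterations) {
        int n = outLinks.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0);                    // initial guess for every page
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, 1.0 - d);          // the (1-d) term
            for (int t = 0; t < n; t++) {
                int c = outLinks[t].length;      // C(T): outbound links of page T
                if (c == 0) continue;            // a page with no outlinks passes nothing on
                for (int a : outLinks[t]) {
                    next[a] += d * pr[t] / c;    // add d * PR(T)/C(T) to each target
                }
            }
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        // Pages 0 and 1 each have exactly ONE inbound link:
        // page 0 is linked from page 2 (which itself has three inbound links),
        // page 1 is linked from page 3 (which has none).
        int[][] outLinks = {
            {2},   // page 0 -> 2
            {2},   // page 1 -> 2
            {0},   // page 2 -> 0
            {1},   // page 3 -> 1
            {2},   // page 4 -> 2
        };
        double[] pr = pageRank(outLinks, 0.85, 200);
        for (int i = 0; i < pr.length; i++) {
            System.out.printf(Locale.ROOT, "page %d: PR = %.4f%n", i, pr[i]);
        }
    }
}
```

Here page 0 converges to about 2.11 while page 1 stays at about 0.28, even though a pure link-count boost would give both of them the same ln(e + 1), roughly 1.31.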

So does that mean Nutch's link analysis uses a
different page-ranking concept from Google's?

Or am I missing some important point?

Thanks,

Michael Ji
 

> 
> In any case, if you just use default Nutch settings and
> don't run the DistributedAnalysisTool, then all of the page
> scores are 1.0. So the Lucene document boost winds up being
> ln(e + inbound link count). 0 inbound links = 1.0,
> 10 links = 2.54, 100 links = 4.63, etc.
> 
> -- Ken
> -- 
> Ken Krugler
> TransPac Software, Inc.
> <http://www.transpac.com>
> +1 530-470-9200
> 

