Hi Ken,

Exactly as you described, the calculateBoost() method in IndexSegment.java does the real work of calculating a document's boost value, which is its page rank.
Here is its code:

    // 1. Start with page's score from DB -- 1.0 if no link analysis.
    float res = pageScore;

    // 2. Apply scorePower to this.
    res = (float)Math.pow(pageScore, scorePower);

    // 3. Optionally boost by log of incoming anchor count.
    if (boostByLinkCount)
      res *= (float)Math.log(Math.E + linkCount);

It seems to me that this calculation doesn't take into account the weight (page rank) of the inbound links; it only considers how many there are. The typical PageRank formula, by contrast, is

    PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where T1..Tn are the pages linking to A, C(T) is the number of outbound links on T, and d is the damping factor.

So does that mean Nutch's link analysis uses a different page-ranking concept from Google's? Or am I missing some important point?

Thanks,
Michael Ji

> In any case, if you just use default Nutch settings and don't run the
> DistributedAnalysisTool, then all of the page scores are 1.0. So the
> Lucene document boost winds up being ln(e + inbound link count). 0
> inbound links == 1.0, 10 links = 2.54, 100 links = 4.63, etc.
>
> -- Ken
> --
> Ken Krugler
> TransPac Software, Inc.
> <http://www.transpac.com>
> +1 530-470-9200
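
P.S. To make the comparison concrete, here is a minimal sketch that puts the two calculations side by side. It is not Nutch code: the class BoostSketch, the method names, and the scorePower value of 0.5 are my own assumptions for illustration. boost() repeats the three steps quoted above, and pageRankStep() does one pass of the PageRank formula, where each inbound link contributes the linking page's own rank divided by its outbound link count.

```java
// Illustrative sketch only. BoostSketch and its methods are hypothetical
// names, not part of Nutch or Lucene.
class BoostSketch {

    // Repeats the three steps quoted from calculateBoost():
    // pageScore^scorePower, optionally times ln(e + linkCount).
    static float boost(float pageScore, float scorePower,
                       boolean boostByLinkCount, int linkCount) {
        float res = (float) Math.pow(pageScore, scorePower);
        if (boostByLinkCount) {
            res *= (float) Math.log(Math.E + linkCount);
        }
        return res;
    }

    // One iteration of PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages T
    // that link to A. outLinks[t] holds the indices of pages that t links to.
    static double[] pageRankStep(int[][] outLinks, double[] pr, double d) {
        double[] next = new double[pr.length];
        for (int a = 0; a < next.length; a++) {
            next[a] = 1.0 - d;
        }
        for (int t = 0; t < outLinks.length; t++) {
            int c = outLinks[t].length;      // C(T): outbound links on T
            for (int a : outLinks[t]) {
                next[a] += d * pr[t] / c;    // T passes on a share of its rank
            }
        }
        return next;
    }

    public static void main(String[] args) {
        // Default-settings case: every page score is 1.0, so only the
        // inbound link count changes the boost.
        for (int links : new int[] {0, 10, 100}) {
            System.out.println(links + " links -> boost "
                    + boost(1.0f, 0.5f, true, links));
        }

        // Tiny three-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
        int[][] outLinks = { {1, 2}, {2}, {0} };
        double[] pr = {1.0, 1.0, 1.0};
        pr = pageRankStep(outLinks, pr, 0.85);
        for (int i = 0; i < pr.length; i++) {
            System.out.println("page " + i + " rank " + pr[i]);
        }
    }
}
```

With a page score of 1.0, boost() comes out at roughly 1.0, 2.54 and 4.63 for 0, 10 and 100 inbound links, which agrees with the figures in Ken's reply. In pageRankStep(), by contrast, a link from a highly ranked page with few outbound links counts for more than a link from a low-ranked page with many.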
