Hi,
I'm testing the webgraph functionality of the current trunk, but I think I'm
doing something wrong, because the LinkRank job always aborts with the
following error message:
2009-02-24 11:32:36,952 INFO webgraph.LinkRank - Finished link counter job
2009-02-24 11:32:36,952 INFO webgraph.LinkRank - Reading numlinks temp file
2009-02-24 11:32:36,952 INFO webgraph.LinkRank - Deleting numlinks temp file
2009-02-24 11:32:36,952 FATAL webgraph.LinkRank - LinkAnalysis:
java.lang.NullPointerException
        at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:113)
        at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:582)
        at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:657)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:627)
I'm doing the following steps:
Injector - Generator - Fetcher2 - ParseSegment - WebGraph - Loops - LinkRank -
ScoreUpdater - CrawlDb - LinkDb - Indexer - DeleteDuplicates - IndexMerger
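For the webgraph-related steps, this is roughly how I invoke the tools (the crawl/ directory names are just from my local setup, and I run the tools via their class names since I am on trunk):

```shell
# Webgraph scoring steps as I run them; $s is the freshly fetched+parsed segment.
bin/nutch org.apache.nutch.scoring.webgraph.WebGraph -webgraphdb crawl/webgraphdb -segment $s
bin/nutch org.apache.nutch.scoring.webgraph.Loops -webgraphdb crawl/webgraphdb
bin/nutch org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb crawl/webgraphdb
bin/nutch org.apache.nutch.scoring.webgraph.ScoreUpdater -crawldb crawl/crawldb -webgraphdb crawl/webgraphdb
```

The LinkRank invocation above is the one that dies with the NullPointerException.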
If I ignore the fatal error of the LinkRank tool and continue, I get a valid
index, but every URL ends up with the clear score value defined in
nutch-site.xml via the property link.score.updater.clear.score.
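For completeness, this is how the property is set in my nutch-site.xml (the value 0.0 is just what I happen to use; every indexed URL gets exactly this value):

```xml
<!-- Excerpt from my nutch-site.xml -->
<property>
  <name>link.score.updater.clear.score</name>
  <value>0.0</value>
</property>
```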
I have tested other orderings of the steps above, e.g. updating the CrawlDb
before doing the scoring, or running several generate - fetch - parse cycles
before starting the scoring for the first time, but nothing helped.
I also tried using the scoring-link plugin instead of running the scoring
separately, but then many of the documents in the index are assigned a boost
of 0.0, which is the default initialScore.
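In case it matters, this is how I enabled the plugin in nutch-site.xml (only the relevant fragment of my plugin.includes is shown; the rest is the default, with scoring-link substituted for scoring-opic):

```xml
<!-- Excerpt from my nutch-site.xml: scoring-link instead of scoring-opic -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|scoring-link</value>
</property>
```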
Do you have any suggestions on how to perform the webgraph scoring correctly?
Kind regards,
Martina