Hi,

I'm testing the webgraph functionality of the current trunk, but I think I'm 
doing something wrong, because the LinkRank job always aborts with the 
following error message:
2009-02-24 11:32:36,952 INFO  webgraph.LinkRank - Finished link counter job
2009-02-24 11:32:36,952 INFO  webgraph.LinkRank - Reading numlinks temp file
2009-02-24 11:32:36,952 INFO  webgraph.LinkRank - Deleting numlinks temp file
2009-02-24 11:32:36,952 FATAL webgraph.LinkRank - LinkAnalysis: java.lang.NullPointerException
        at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:113)
        at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:582)
        at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:657)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:627)

I'm doing the following steps:
Injector - Generator - Fetcher2 - ParseSegment - WebGraph - Loops - LinkRank -
ScoreUpdater - CrawlDb - LinkDb - Indexer - DeleteDuplicates - IndexMerger
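
For reference, my invocations correspond roughly to the following (the crawl
paths are placeholders, and I'm calling the webgraph tools by their class
names; the option names are taken from each tool's usage output, so please
double-check them against your build):

    # placeholder paths; adjust the segment name to your own crawl
    bin/nutch inject crawl/crawldb urls
    bin/nutch generate crawl/crawldb crawl/segments
    SEGMENT=crawl/segments/20090224113236
    bin/nutch org.apache.nutch.fetcher.Fetcher2 $SEGMENT
    bin/nutch parse $SEGMENT
    bin/nutch org.apache.nutch.scoring.webgraph.WebGraph -webgraphdb crawl/webgraphdb -segment $SEGMENT
    bin/nutch org.apache.nutch.scoring.webgraph.Loops -webgraphdb crawl/webgraphdb
    bin/nutch org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb crawl/webgraphdb
    bin/nutch org.apache.nutch.scoring.webgraph.ScoreUpdater -crawldb crawl/crawldb -webgraphdb crawl/webgraphdb

It is the LinkRank step in this sequence that dies with the exception above.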

If I ignore the fatal error from the LinkRank tool and continue, I get a valid
index, but every URL's score is set to the clear score value defined in
nutch-site.xml via the link.score.updater.clear.score property.
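
For completeness, this is the relevant entry from my nutch-site.xml (the value
shown here is just an example, not necessarily what you should use):

    <property>
      <name>link.score.updater.clear.score</name>
      <value>0.0</value>
    </property>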

I tested other orderings of the steps mentioned above, e.g. updating the
CrawlDb first before doing the scoring, or running several generate - fetch -
parse cycles before starting the scoring for the first time, but nothing
helped.

I also tried using the scoring-link plugin instead of running the scoring
separately, but then many of the documents in the index get a boost of 0.0
assigned, which is the default initialScore.

Do you have any suggestions on how to perform the webgraph scoring correctly?

Kind regards,

Martina