Re: LinkRank job in webgraph scoring fails

Dennis Kubes Fri, 06 Mar 2009 09:11:33 -0800

Ok, I was able to run through a couple of fetch and index cycles without issue. I put up an example of the commands I ran:


http://wiki.apache.org/nutch/NewScoringIndexingExample

Please check this and see if there are differences in what you are currently running. That will help to narrow down potential problems.


Dennis


Dennis Kubes wrote:

I am looking into this now. Sorry about the delay. Any more information you can provide would be helpful.
Dennis

Koch Martina wrote:
Hi,
I'm testing the webgraph functionality of the current trunk, but I think I'm doing something wrong, because the LinkRank job always aborts with the following error message:

2009-02-24 11:32:36,952 INFO webgraph.LinkRank - Finished link counter job
2009-02-24 11:32:36,952 INFO webgraph.LinkRank - Reading numlinks temp file
2009-02-24 11:32:36,952 INFO webgraph.LinkRank - Deleting numlinks temp file
2009-02-24 11:32:36,952 FATAL webgraph.LinkRank - LinkAnalysis: java.lang.NullPointerException
    at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:113)
    at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:582)
    at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:657)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:627)

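
For readers following the trace: the exception is thrown in runCounter right after the "Reading numlinks temp file" message. One possible cause, which is only a guess and is not confirmed anywhere in this thread, is that the temp file was empty, so that reading its first line returned null. The sketch below is not the Nutch code. NumLinksGuard, parseNumLinks and the "key, whitespace, count" line format are all hypothetical names and assumptions, used only to show how a null first line can be turned into a clearer error than a bare NullPointerException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper for illustration only; not taken from the Nutch source.
class NumLinksGuard {

    // Parses a link count from the first line of a counter output file.
    // ASSUMPTION: the line looks like "<key><whitespace><count>".
    static long parseNumLinks(String firstLine) {
        // BufferedReader.readLine() returns null at end of stream, so an
        // empty file yields null here instead of a line of text.
        if (firstLine == null || firstLine.trim().isEmpty()) {
            throw new IllegalStateException(
                "counter output is empty; the counting job produced no records");
        }
        String[] parts = firstLine.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalStateException(
                "unexpected counter line: " + firstLine);
        }
        return Long.parseLong(parts[1]);
    }

    public static void main(String[] args) throws IOException {
        // Simulate an empty temp file: the first readLine() call returns null.
        BufferedReader empty = new BufferedReader(new StringReader(""));
        String line = empty.readLine();
        try {
            parseNumLinks(line);
        } catch (IllegalStateException e) {
            System.out.println("Guarded: " + e.getMessage());
        }

        // A well-formed line parses normally.
        System.out.println(parseNumLinks("numlinks\t42"));
    }
}
```

If something like this is what happens, the thing to check would be whether the earlier steps actually wrote any link data for the counter job to read.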
I'm doing the following steps:
Injector - Generator - Fetcher2 - ParseSegment - WebGraph - Loops - LinkRank - ScoreUpdater - CrawlDb - LinkDb - Indexer - DeleteDuplicates - IndexMerger
If I ignore the fatal error of the LinkRank tool and continue, I get a valid index, but every URL is set to the clear score value defined in the nutch-site with property link.score.updater.clear.score.
I tested other sequences of the steps mentioned above, e.g. updating CrawlDb first, before doing the scoring, or doing several generate - fetch - parse cycles before starting the scoring for the first time, but nothing helped.
I also tried to use the scoring-link plugin instead of doing the scoring separately, but then many of the documents in the index get a boost of 0.0 assigned, which is the default initialScore.
Do you have any suggestions on how to perform the webgraph scoring correctly?
Kind regards,

Martina
