I am following the example tutorial at
http://wiki.apache.org/nutch/NewScoringIndexingExample. Nearly every
command executes well except the LinkRank command.
When I run `nutch org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb
crawl/webgraphdb/`, it throws the following exception:
11/09/22 14:44:56 FATAL webgraph.LinkRank: LinkAnalysis: java.io.IOException: No links to process, is the webgraph empty?
    at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:131)
    at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:610)
    at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:686)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:656)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
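For reference, the webgraph itself was populated beforehand with the tutorial's WebGraph step, roughly as below (the segment directory name is just illustrative from my run, and the flags are as I understand them from the tutorial):

```shell
# Build the webgraph from a crawled segment; LinkRank is run afterwards.
# The segment name below is illustrative, not my exact path.
nutch org.apache.nutch.scoring.webgraph.WebGraph \
  -webgraphdb crawl/webgraphdb \
  -segment crawl/segments/20110922120000
```

This step completed without any error message, so I assumed the webgraph was populated.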
At first I had not added the following properties to
hadoop/conf/nutch-site.xml:
<!-- linkrank scoring properties -->
<property>
  <name>link.ignore.internal.host</name>
  <value>true</value>
  <description>Ignore outlinks to the same hostname.</description>
</property>
<property>
  <name>link.ignore.internal.domain</name>
  <value>true</value>
  <description>Ignore outlinks to the same domain.</description>
</property>
<property>
  <name>link.ignore.limit.page</name>
  <value>true</value>
  <description>Limit to only a single outlink to the same page.</description>
</property>
<property>
  <name>link.ignore.limit.domain</name>
  <value>true</value>
  <description>Limit to only a single outlink to the same domain.</description>
</property>
But even after adding those properties, the exception remains.
What could be causing this error?
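In case it helps with diagnosis: I believe the webgraph contents can be inspected with the NodeDumper tool from the same package, something like the following (the flags are from my reading of the class and may be inexact):

```shell
# Dump the top outlink counts from the webgraph to see whether it is
# actually empty (flags may be inexact for Nutch 1.3).
nutch org.apache.nutch.scoring.webgraph.NodeDumper \
  -webgraphdb crawl/webgraphdb \
  -outlinks -topn 10 \
  -output crawl/webgraphdb-dump
```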
Environment: Java "1.6.0_26", Debian with 2.6.39-2-686-pae kernel,
Nutch 1.3, Hadoop 0.20.2.
Thanks