Hi Thomas,

After adding the properties as you mentioned, did you re-start at the
injecting stage, or did you just run the webgraph class? If the latter,
I would try re-starting the whole process, maybe even verifying that your
crawldb is readable on the way to executing the webgraph class.
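To sanity-check the crawldb before rebuilding the webgraph, something like this should work (the path is an example; adjust it to wherever your crawl lives):

```shell
# Print summary statistics for the crawldb. A zero or near-zero URL count
# here would explain an empty webgraph (and LinkRank's complaint) downstream.
nutch readdb crawl/crawldb -stats
```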

Just a quick note on this one: Markus (I think) added the webgraph commands
to the nutch script, which makes for a simpler working environment from 1.4
onwards.
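For reference, a full re-run of the link analysis on 1.3 looks roughly like this, using the class invocations (since the nutch script shortcuts only arrive in 1.4). The segment name below is a hypothetical example, and the exact flags may differ slightly between versions:

```shell
# 1. Build (or rebuild) the webgraph from a fetched-and-parsed segment.
nutch org.apache.nutch.scoring.webgraph.WebGraph \
  -segment crawl/segments/20110922143000 -webgraphdb crawl/webgraphdb

# 2. Only once the webgraph is populated will LinkRank have links to process.
nutch org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb crawl/webgraphdb

# 3. Push the computed scores back into the crawldb.
nutch org.apache.nutch.scoring.webgraph.ScoreUpdater \
  -crawldb crawl/crawldb -webgraphdb crawl/webgraphdb
```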

On Thu, Sep 22, 2011 at 7:53 AM, Thomas Anderson
<[email protected]>wrote:

> I followed the example tutorial at
> http://wiki.apache.org/nutch/NewScoringIndexingExample. Nearly all
> commands execute well except the LinkRank command.
>
> When executing the LinkRank command `nutch
> org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb
> crawl/webgraphdb/`, it throws the following exception.
>
> 11/09/22 14:44:56 FATAL webgraph.LinkRank: LinkAnalysis:
> java.io.IOException: No links to process, is the webgraph empty?
>        at
> org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:131)
>        at
> org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:610)
>        at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:686)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at
> org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:656)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> At the beginning I did not add the following properties to
> hadoop/conf/nutch-site.xml:
>
> <!-- linkrank scoring properties -->
> <property>
>  <name>link.ignore.internal.host</name>
>  <value>true</value>
>  <description>Ignore outlinks to the same hostname.</description>
> </property>
>
> <property>
>  <name>link.ignore.internal.domain</name>
>  <value>true</value>
>  <description>Ignore outlinks to the same domain.</description>
> </property>
>
> <property>
>  <name>link.ignore.limit.page</name>
>  <value>true</value>
>  <description>Limit to only a single outlink to the same
> page.</description>
> </property>
>
> <property>
>  <name>link.ignore.limit.domain</name>
>  <value>true</value>
>  <description>Limit to only a single outlink to the same
> domain.</description>
> </property>
>
> But after adding those properties, the exception remains.
> What might cause such an error?
>
> Environment: java "1.6.0_26", debian with 2.6.39-2-686-pae kernel,
> nutch 1.3, hadoop 0.20.2
>
> Thanks
>



-- 
*Lewis*
