I re-crawled from the injection stage, but it still throws `webgraph.LinkRank: LinkAnalysis: java.io.IOException: No links to process, is the webgraph empty?`
Checking the source, it shows HDFS will read from numLinksPath/part-00000, where numLinksPath is constructed as webGraphDb/NUM_NODES and NUM_NODES is "_num_nodes_". However, listing the HDFS content under webgraphdb, no such path exists:

drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:17 /crawl/webgraphdb/inlinks
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:27 /crawl/webgraphdb/linkrank
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:25 /crawl/webgraphdb/loops
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:18 /crawl/webgraphdb/nodes
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:16 /crawl/webgraphdb/outlinks
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:24 /crawl/webgraphdb/routes

How can I examine which property values are used when Nutch runs?

Thanks

On Thu, Sep 22, 2011 at 5:26 PM, lewis john mcgibbney <[email protected]> wrote:
> Hi Thomas,
>
> After adding the properties as you mentioned, did you re-start at the
> injecting stage or did you just use the webgraph class? If the latter, then
> I would try re-starting the whole process, maybe even checking that you can
> read your crawldb on the way to executing the webgraph class.
>
> Just a quick note on this one: Markus (I think) added the webgraph commands
> to the nutch script, so this creates a simpler working environment from 1.4
> onwards.
>
> On Thu, Sep 22, 2011 at 7:53 AM, Thomas Anderson <[email protected]> wrote:
>
>> I followed the example tutorial at
>> http://wiki.apache.org/nutch/NewScoringIndexingExample. Nearly all
>> commands execute well except the LinkRank command.
>>
>> When executing the LinkRank command `nutch
>> org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb
>> crawl/webgraphdb/`, it throws the following exception:
>>
>> 11/09/22 14:44:56 FATAL webgraph.LinkRank: LinkAnalysis:
>> java.io.IOException: No links to process, is the webgraph empty?
>>         at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:131)
>>         at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:610)
>>         at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:686)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:656)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> At the beginning I had not added the following properties to
>> hadoop/conf/nutch-site.xml:
>>
>> <!-- linkrank scoring properties -->
>> <property>
>>   <name>link.ignore.internal.host</name>
>>   <value>true</value>
>>   <description>Ignore outlinks to the same hostname.</description>
>> </property>
>>
>> <property>
>>   <name>link.ignore.internal.domain</name>
>>   <value>true</value>
>>   <description>Ignore outlinks to the same domain.</description>
>> </property>
>>
>> <property>
>>   <name>link.ignore.limit.page</name>
>>   <value>true</value>
>>   <description>Limit to only a single outlink to the same page.</description>
>> </property>
>>
>> <property>
>>   <name>link.ignore.limit.domain</name>
>>   <value>true</value>
>>   <description>Limit to only a single outlink to the same domain.</description>
>> </property>
>>
>> But after adding those properties, the exception remains.
>> What may cause such an error?
>>
>> Environment: java "1.6.0_26", Debian with 2.6.39-2-686-pae kernel,
>> nutch 1.3, hadoop 0.20.2
>>
>> Thanks
>
>
>
> --
> *Lewis*
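P.S. (for the archive) On the question of examining which property values are in effect: Nutch resolves properties from nutch-default.xml, overridden by nutch-site.xml, and on Hadoop 0.20 the effective per-job values can also be seen in the job.xml shown by the JobTracker web UI. A rough sketch of a local check (the `conf_value` helper is my own hypothetical snippet, not part of Nutch; adjust the conf paths to your install):

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of a named property from a
# Hadoop/Nutch-style configuration XML file. Assumes each <name> and
# <value> sits on its own line, as in the stock conf files.
conf_value() {
    file="$1"
    prop="$2"
    # Find the <name>prop</name> line, then extract the following <value>.
    grep -A1 "<name>$prop</name>" "$file" \
        | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

# Example usage (paths are assumptions, not from this thread):
#   conf_value conf/nutch-site.xml link.ignore.internal.host
# If nutch-site.xml has no entry, Nutch falls back to nutch-default.xml:
#   conf_value conf/nutch-default.xml link.ignore.internal.host
```

This only shows what the files declare; the job.xml in the JobTracker UI remains the authoritative view of what a running job actually received.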
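P.P.S. Following the path analysis above (LinkRank reading webgraphdb/_num_nodes_/part-00000), a pre-flight existence check before invoking LinkRank can make the failure mode visible earlier. This is a hedged sketch of my own, not a Nutch tool; it uses the local filesystem as a stand-in, and on a real cluster the `[ -e ... ]` test would be replaced with `hadoop fs -test -e "$db/_num_nodes_/part-00000"`:

```shell
#!/bin/sh
# Hypothetical pre-flight check: verify the node-count output that the
# LinkRank counter stage is expected to read, per the analysis in this
# thread. A missing file is consistent with the
# "No links to process, is the webgraph empty?" IOException.
check_num_nodes() {
    db="$1"
    if [ -e "$db/_num_nodes_/part-00000" ]; then
        echo "ok: $db/_num_nodes_/part-00000 exists"
        return 0
    else
        echo "missing: $db/_num_nodes_/part-00000 -- webgraph may be empty"
        return 1
    fi
}

# Example usage (local stand-in path; on HDFS, swap in hadoop fs -test -e):
#   check_num_nodes /crawl/webgraphdb
```

Note that with all four link.ignore.* properties set to true, a crawl confined to a single host or domain could legitimately end up with every link filtered out, which would also leave the webgraph empty.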

