I re-crawled from the injection stage, but it still throws `webgraph.LinkRank: LinkAnalysis: java.io.IOException: No links to process, is the webgraph empty?`
Checking the source, it shows HDFS will read from numLinksPath/part-00000, where numLinksPath is constructed as webGraphDb/NUM_NODES and NUM_NODES is "_num_nodes_". However, listing the HDFS content under webgraphdb, no such path exists:

drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:17 /crawl/webgraphdb/inlinks
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:27 /crawl/webgraphdb/linkrank
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:25 /crawl/webgraphdb/loops
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:18 /crawl/webgraphdb/nodes
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:16 /crawl/webgraphdb/outlinks
drwxr-xr-x   - crawler supergroup          0 2011-09-23 13:24 /crawl/webgraphdb/routes

How can I examine which property values are used when Nutch runs?

Thanks

On Thu, Sep 22, 2011 at 5:26 PM, lewis john mcgibbney <[email protected]> wrote:
> Hi Thomas,
>
> After adding the properties as you mentioned, did you re-start at the
> injecting stage or did you just use the webgraph class? If the latter, then
> I would try re-starting the whole process, maybe even checking that you can
> read your crawldb on the way to executing the webgraph class.
>
> Just a quick note on this one: Markus (I think) added the webgraph commands
> to the nutch script, so this creates a simpler working environment from 1.4
> onwards.
>
> On Thu, Sep 22, 2011 at 7:53 AM, Thomas Anderson <[email protected]> wrote:
>
>> I followed the example tutorial at
>> http://wiki.apache.org/nutch/NewScoringIndexingExample. Nearly all
>> commands execute well except the LinkRank command.
>>
>> When executing the LinkRank command `nutch
>> org.apache.nutch.scoring.webgraph.LinkRank -webgraphdb
>> crawl/webgraphdb/`, it throws the following exception:
>>
>> 11/09/22 14:44:56 FATAL webgraph.LinkRank: LinkAnalysis:
>> java.io.IOException: No links to process, is the webgraph empty?
>>         at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:131)
>>         at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:610)
>>         at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:686)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:656)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> At the beginning I had not added the following properties to
>> hadoop/conf/nutch-site.xml:
>>
>> <!-- linkrank scoring properties -->
>> <property>
>>   <name>link.ignore.internal.host</name>
>>   <value>true</value>
>>   <description>Ignore outlinks to the same hostname.</description>
>> </property>
>>
>> <property>
>>   <name>link.ignore.internal.domain</name>
>>   <value>true</value>
>>   <description>Ignore outlinks to the same domain.</description>
>> </property>
>>
>> <property>
>>   <name>link.ignore.limit.page</name>
>>   <value>true</value>
>>   <description>Limit to only a single outlink to the same page.</description>
>> </property>
>>
>> <property>
>>   <name>link.ignore.limit.domain</name>
>>   <value>true</value>
>>   <description>Limit to only a single outlink to the same domain.</description>
>> </property>
>>
>> But after adding those properties, the exception remains.
>> What may cause such an error?
>>
>> Environment: java "1.6.0_26", Debian with 2.6.39-2-686-pae kernel,
>> nutch 1.3, hadoop 0.20.2
>>
>> Thanks
>
>
>
> --
> *Lewis*
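P.S. (for the archive) On the question of examining which property values are in effect: Nutch resolves properties from nutch-default.xml, overridden by nutch-site.xml, and on Hadoop 0.20 the effective per-job values can also be seen in the job.xml shown by the JobTracker web UI. A rough sketch of a local check (the `conf_value` helper is my own hypothetical snippet, not part of Nutch; adjust the conf paths to your install):

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of a named property from a
# Hadoop/Nutch-style configuration XML file. Assumes each <name> and
# <value> sits on its own line, as in the stock conf files.
conf_value() {
    file="$1"
    prop="$2"
    # Find the <name>prop</name> line, then extract the following <value>.
    grep -A1 "<name>$prop</name>" "$file" \
        | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
}

# Example usage (paths are assumptions, not from this thread):
#   conf_value conf/nutch-site.xml link.ignore.internal.host
# If nutch-site.xml has no entry, Nutch falls back to nutch-default.xml:
#   conf_value conf/nutch-default.xml link.ignore.internal.host
```

This only shows what the files declare; the job.xml in the JobTracker UI remains the authoritative view of what a running job actually received.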
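P.P.S. Following the path analysis above (LinkRank reading webgraphdb/_num_nodes_/part-00000), a pre-flight existence check before invoking LinkRank can make the failure mode visible earlier. This is a hedged sketch of my own, not a Nutch tool; it uses the local filesystem as a stand-in, and on a real cluster the `[ -e ... ]` test would be replaced with `hadoop fs -test -e "$db/_num_nodes_/part-00000"`:

```shell
#!/bin/sh
# Hypothetical pre-flight check: verify the node-count output that the
# LinkRank counter stage is expected to read, per the analysis in this
# thread. A missing file is consistent with the
# "No links to process, is the webgraph empty?" IOException.
check_num_nodes() {
    db="$1"
    if [ -e "$db/_num_nodes_/part-00000" ]; then
        echo "ok: $db/_num_nodes_/part-00000 exists"
        return 0
    else
        echo "missing: $db/_num_nodes_/part-00000 -- webgraph may be empty"
        return 1
    fi
}

# Example usage (local stand-in path; on HDFS, swap in hadoop fs -test -e):
#   check_num_nodes /crawl/webgraphdb
```

Note that with all four link.ignore.* properties set to true, a crawl confined to a single host or domain could legitimately end up with every link filtered out, which would also leave the webgraph empty.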

