Re: Two Errors in Nutch 0.8 Tutorial?

Matthew Holt Tue, 25 Jul 2006 08:20:24 -0700

Never mind, it's there now.
Matt

Matthew Holt wrote:

If you download the latest trunk copy of 0.8, bin/nutch is not even available. Is this supposed to be this way?

Matt


Bryan Woliner wrote:

I am certainly far from a Nutch expert, but it appears to me that there are two errors in the current Nutch 0.8 tutorial.

First off, here is the version of Nutch 0.8 that I am using, in case there have been changes in a newer version that invalidate my comments:

-bash-2.05b$ svn info
Path: .
URL: http://svn.apache.org/repos/asf/lucene/nutch/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 414318
Node Kind: directory
Schedule: normal
Last Changed Author: siren
Last Changed Rev: 414306
Last Changed Date: 2006-06-14 11:08:28 -0500 (Wed, 14 Jun 2006)
Properties Last Updated: 2006-06-14 12:00:57 -0500 (Wed, 14 Jun 2006)

Error #1:

Towards the end of the tutorial, the following command is found:

bin/nutch invertlinks crawl/linkdb crawl/segments


When I call this command verbatim, I get the following error:

2006-07-25 08:44:40,503 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_8ly5hf
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /home/bryan/nutch-8d/hadoop/mapred/local/localRunner/job_8ly5hf.xml final: hadoop-site.xml
       at org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:96)
       at org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:37)
       at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:106)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
Exception in thread "main" java.io.IOException: Job failed!
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
       at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:203)
       at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:305)

I think the correct syntax for the command should be:

bin/nutch invertlinks crawl/linkdb crawl/segments/* (with the /* added
to the end).
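This small shell sketch (not part of Nutch) shows why the trailing /* changes things. Without the glob, the command receives a single argument, the parent crawl/segments directory. With the glob, the shell expands the pattern so that each segment directory arrives as a separate argument. That fits my reading that invertlinks expects individual segment directories. The segment names below are hypothetical placeholders created in a scratch directory.

```shell
#!/bin/sh
# Sketch only: shows which arguments a program receives for
# "crawl/segments" versus "crawl/segments/*". The segment names are
# hypothetical placeholders in a scratch directory; Nutch is not needed.
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir -p crawl/segments/20060725084440 crawl/segments/20060725090112

# Print each argument on its own line, as the called program would see it.
show_args() {
  for arg in "$@"; do
    printf '%s\n' "$arg"
  done
}

echo "Without the glob:"
show_args crawl/linkdb crawl/segments

echo "With the glob:"
show_args crawl/linkdb crawl/segments/*
```

The first call prints two arguments. The second prints three: crawl/linkdb followed by both segment directories.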

Error #2:

The tutorial says that to index, the following command should be called:

bin/nutch index indexes crawl/linkdb crawl/segments/*

However, when I call that command I get the following error:

Usage: <index> <crawldb> <linkdb> <segment> ...

I believe the correct syntax should be:

bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
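To double-check the argument order against the usage message (<index> <crawldb> <linkdb> <segment> ...), here is a rough dry-run sketch. It builds the command line and prints it instead of running it. The function name build_index_cmd and the scratch directory layout are hypothetical, and only the order comes from the usage message above.

```shell
#!/bin/sh
# Dry-run sketch: assemble "bin/nutch index" arguments in the order its
# usage message gives, <index> <crawldb> <linkdb> <segment> ...
# build_index_cmd and the scratch layout are hypothetical; the command is
# only printed, so no Nutch install is needed.

build_index_cmd() {
  crawl_dir=$1
  # Let the shell expand the glob into one argument per segment directory.
  set -- "$crawl_dir"/segments/*
  # An unmatched glob is left as the literal pattern, so check it is a directory.
  if [ ! -d "$1" ]; then
    echo "no segments found under $crawl_dir/segments" >&2
    return 1
  fi
  echo bin/nutch index "$crawl_dir/indexes" "$crawl_dir/crawldb" \
    "$crawl_dir/linkdb" "$@"
}

# Example run against a scratch crawl directory with two placeholder segments.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch/crawl/segments/20060725084440" \
         "$scratch/crawl/segments/20060725090112"
cd "$scratch"
build_index_cmd crawl
```

In this example it prints the index output directory first, followed by the crawldb, the linkdb and each segment. Dropping the echo in front of bin/nutch would turn the dry run into a real invocation.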


If these are indeed errors in the tutorial, perhaps someone with the
authority to do so would be kind enough to make the necessary
changes.

My two cents,
Bryan
