Never mind, it's there now.
Matt

Matthew Holt wrote:
If you download the latest trunk copy of 0.8, bin/nutch is not even available. Is it supposed to be this way?
Matt

Bryan Woliner wrote:
I am certainly far from a Nutch expert, but it appears to me that there are
two errors in the current Nutch 0.8 tutorial.

First off, here is the version of Nutch 0.8 that I am using, in case
changes have been made in newer versions that invalidate my comments:

-bash-2.05b$ svn info
Path: .
URL: http://svn.apache.org/repos/asf/lucene/nutch/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 414318
Node Kind: directory
Schedule: normal
Last Changed Author: siren
Last Changed Rev: 414306
Last Changed Date: 2006-06-14 11:08:28 -0500 (Wed, 14 Jun 2006)
Properties Last Updated: 2006-06-14 12:00:57 -0500 (Wed, 14 Jun 2006)

Error #1:

Towards the end of the tutorial, the following command is found:

bin/nutch invertlinks crawl/linkdb crawl/segments


When I call this command verbatim, I get the following error:

2006-07-25 08:44:40,503 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_8ly5hf
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /home/bryan/nutch-8d/hadoop/mapred/local/localRunner/job_8ly5hf.xmlfinal: hadoop-site.xml
       at org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:96)
       at org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:37)
       at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:106)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
Exception in thread "main" java.io.IOException: Job failed!
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
       at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:203)
       at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:305)

I think the correct syntax for the command should be:

bin/nutch invertlinks crawl/linkdb crawl/segments/*

(note the /* added to the end).
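
To illustrate what the trailing /* changes, here is a minimal shell sketch. It does not call Nutch at all; it only counts the arguments the shell would hand to a command. The temporary directory and the three timestamp-style segment names are made up for illustration.

```shell
# Hypothetical layout: three segment directories under crawl/segments.
# The names below are invented for this sketch.
workdir=$(mktemp -d)
mkdir -p "$workdir/crawl/segments/20060725084400" \
         "$workdir/crawl/segments/20060725084500" \
         "$workdir/crawl/segments/20060725084600"

# Without the glob, a command would receive only the parent directory.
set -- "$workdir/crawl/linkdb" "$workdir/crawl/segments"
without_glob=$#

# With the glob, the shell expands to one argument per segment directory.
set -- "$workdir/crawl/linkdb" "$workdir"/crawl/segments/*
with_glob=$#

echo "arguments without glob: $without_glob"
echo "arguments with glob:    $with_glob"
printf '  %s\n' "$@"

rm -rf "$workdir"
```

The expansion is done by the shell before the command starts, so the command sees each segment directory as a separate argument.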

Error #2:

The tutorial says that to index, the following command should be called:

bin/nutch index indexes crawl/linkdb crawl/segments/*

However, when I call that command I get the following error:

Usage: <index> <crawldb> <linkdb> <segment> ...

I believe the correct syntax should be:

bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
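
The difference between the two forms can be sketched in plain shell. The helper check_index_args below is hypothetical: it mirrors only the argument order and minimum count implied by the usage message, and it does not invoke Nutch. The single segment name is also made up.

```shell
# Hypothetical helper: checks only the argument count implied by the usage
# line "<index> <crawldb> <linkdb> <segment> ...". It does not call Nutch.
check_index_args() {
  if [ "$#" -lt 4 ]; then
    echo "too few arguments: $#" >&2
    return 1
  fi
  echo "index=$1 crawldb=$2 linkdb=$3 segments=$(($# - 3))"
}

# Tutorial form, assuming a single (invented) segment: only three arguments.
if check_index_args indexes crawl/linkdb crawl/segments/20060725084400; then
  tutorial_ok=yes
else
  tutorial_ok=no
fi

# Corrected form: the crawldb path is supplied as the second argument.
corrected=$(check_index_args crawl/indexes crawl/crawldb crawl/linkdb \
  crawl/segments/20060725084400)

echo "tutorial form accepted: $tutorial_ok"
echo "$corrected"
```

With more than one segment the tutorial form would pass a simple count check like this one, but every argument after the first would be shifted one position relative to the usage line.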

If these are indeed errors in the tutorial, perhaps someone with the
authority to do so would be kind enough to make the necessary
changes.

My two cents,
Bryan

