If you download the latest trunk copy of 0.8, bin/nutch will not even be
available. Is it supposed to be this way?
Matt

Bryan Woliner wrote:
> I am certainly far from a Nutch expert, but it appears to me that there
> are two errors in the current Nutch 0.8 tutorial.
>
> First off, here is the version of Nutch 0.8 that I am using, in case
> there have been changes in newer versions that invalidate my comments:
>
> -bash-2.05b$ svn info
> Path: .
> URL: http://svn.apache.org/repos/asf/lucene/nutch/trunk
> Repository Root: http://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 414318
> Node Kind: directory
> Schedule: normal
> Last Changed Author: siren
> Last Changed Rev: 414306
> Last Changed Date: 2006-06-14 11:08:28 -0500 (Wed, 14 Jun 2006)
> Properties Last Updated: 2006-06-14 12:00:57 -0500 (Wed, 14 Jun 2006)
>
> Error #1:
>
> Towards the end of the tutorial, the following command is found:
>
> bin/nutch invertlinks crawl/linkdb crawl/segments
>
>
> When I call this command verbatim, I get the following error:
>
> 2006-07-25 08:44:40,503 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_8ly5hf
> java.io.IOException: No input directories specified in: Configuration:
> defaults: hadoop-default.xml , mapred-default.xml ,
> /home/bryan/nutch-8d/hadoop/mapred/local/localRunner/job_8ly5hf.xmlfinal:
> hadoop-site.xml
>        at org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:96)
>        at org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:37)
>        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:106)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
> Exception in thread "main" java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:203)
>        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:305)
>
> I think the correct syntax for the command should be:
>
> bin/nutch invertlinks crawl/linkdb crawl/segments/*
>
> (with the /* added to the end).
>
> Error #2:
>
> The tutorial says that to index, the following command should be called:
>
> bin/nutch index indexes crawl/linkdb crawl/segments/*
>
> However, when I call that command I get the following error:
>
> Usage: <index> <crawldb> <linkdb> <segment> ...
>
> I believe the correct syntax should be:
>
> bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
>
> If these are indeed errors in the tutorial, perhaps someone with the
> authority to do so would be kind enough to make the necessary
> changes.
>
> My two cents,
> Bryan
>
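
For anyone wondering why the trailing /* matters: a minimal sketch, assuming a POSIX shell and that invertlinks expects individual segment directories, as Bryan's fix suggests. The segment names below are made up for illustration, and the script only prints the command line the shell would build; it does not run Nutch.

```shell
# Hypothetical layout: these segment directory names are invented for the demo.
tmp=$(mktemp -d)
mkdir -p "$tmp/crawl/segments/20060725084400" \
         "$tmp/crawl/segments/20060725091500"
cd "$tmp"

# The shell expands crawl/segments/* before the command starts, so the
# program receives one argument per segment directory rather than the
# parent directory. echo shows the resulting argument list.
expanded=$(echo bin/nutch invertlinks crawl/linkdb crawl/segments/*)
echo "$expanded"
```

The same expansion applies to the trailing crawl/segments/* in the index command quoted above.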

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
