If you download the latest trunk copy of 0.8, bin/nutch will not even be available. Is it supposed to be this way?

Matt
Bryan Woliner wrote:
> I am certainly far from a nutch expert, but it appears to me that there are
> two errors in the current Nutch 0.8 tutorial.
>
> First off, here is the version of Nutch 0.8 that I am using, in case there
> have been changes made in a newer version that invalidate my comments:
>
> -bash-2.05b$ svn info
> Path: .
> URL: http://svn.apache.org/repos/asf/lucene/nutch/trunk
> Repository Root: http://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 414318
> Node Kind: directory
> Schedule: normal
> Last Changed Author: siren
> Last Changed Rev: 414306
> Last Changed Date: 2006-06-14 11:08:28 -0500 (Wed, 14 Jun 2006)
> Properties Last Updated: 2006-06-14 12:00:57 -0500 (Wed, 14 Jun 2006)
>
> Error #1:
>
> Towards the end of the tutorial, the following command is found:
>
> bin/nutch invertlinks crawl/linkdb crawl/segments
>
> When I call this command verbatim, I get the following error:
>
> 2006-07-25 08:44:40,503 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_8ly5hf
> java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /home/bryan/nutch-8d/hadoop/mapred/local/localRunner/job_8ly5hf.xmlfinal: hadoop-site.xml
>     at org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:96)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:37)
>     at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:106)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
>     at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:203)
>     at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:305)
>
> I think the correct syntax for the command should be:
>
> bin/nutch invertlinks crawl/linkdb crawl/segments/*
>
> (with the /* added to the end).
>
> Error #2:
>
> The tutorial says that to index, the following command should be called:
>
> bin/nutch index indexes crawl/linkdb crawl/segments/*
>
> However, when I call that command I get the following error:
>
> Usage: <index> <crawldb> <linkdb> <segment> ...
>
> I believe the correct syntax should be:
>
> bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
>
> If these are indeed errors in the tutorial, perhaps someone with the
> authority to do so would be kind enough to make the necessary changes.
>
> My two cents,
> Bryan
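
For reference, the two corrections proposed in the quoted message can be put together in a small shell sketch. It only restates Bryan's suggested syntax and has not been checked against any other Nutch revision. The names NUTCH, CRAWL_DIR, DRY_RUN and the run helper are hypothetical and introduced here for illustration; by default the script just prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the corrected tutorial commands proposed in the message above.
# Assumptions (not from the original message): the variable names NUTCH,
# CRAWL_DIR and DRY_RUN, and the run() helper, are illustrative only.
NUTCH="${NUTCH:-bin/nutch}"
CRAWL_DIR="${CRAWL_DIR:-crawl}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, anything else = execute it

# Print the command line in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Error #1: pass the individual segment directories (segments/*),
# not the parent segments directory.
run "$NUTCH" invertlinks "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/*

# Error #2: the usage message asks for <index> <crawldb> <linkdb> <segment> ...
run "$NUTCH" index "$CRAWL_DIR/indexes" "$CRAWL_DIR/crawldb" \
    "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/*
```

Running it with DRY_RUN=0 from the Nutch directory would execute the commands, assuming bin/nutch exists there and the crawl directory already contains crawldb and at least one segment.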
