n/m it's there now..

Matt

Matthew Holt wrote:
> If you download the latest trunk copy of 0.8, bin/nutch will not even
> be available.. is this supposed to be this way?
> Matt
>
> Bryan Woliner wrote:
>> I am certainly far from a nutch expert, but it appears to me that
>> there are two errors in the current Nutch 0.8 tutorial.
>>
>> First off, here is the version of Nutch 0.8 that I am using, in case
>> there have been changes made in a newer version that invalidate my
>> comments:
>>
>> -bash-2.05b$ svn info
>> Path: .
>> URL: http://svn.apache.org/repos/asf/lucene/nutch/trunk
>> Repository Root: http://svn.apache.org/repos/asf
>> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>> Revision: 414318
>> Node Kind: directory
>> Schedule: normal
>> Last Changed Author: siren
>> Last Changed Rev: 414306
>> Last Changed Date: 2006-06-14 11:08:28 -0500 (Wed, 14 Jun 2006)
>> Properties Last Updated: 2006-06-14 12:00:57 -0500 (Wed, 14 Jun 2006)
>>
>> Error #1:
>>
>> Towards the end of the tutorial, the following command is found:
>>
>> bin/nutch invertlinks crawl/linkdb crawl/segments
>>
>> When I call this command verbatim, I get the following error:
>>
>> 2006-07-25 08:44:40,503 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_8ly5hf
>> java.io.IOException: No input directories specified in: Configuration:
>> defaults: hadoop-default.xml , mapred-default.xml ,
>> /home/bryan/nutch-8d/hadoop/mapred/local/localRunner/job_8ly5hf.xmlfinal:
>> hadoop-site.xml
>> at org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:96)
>> at org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:37)
>> at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:106)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
>> Exception in thread "main" java.io.IOException: Job failed!
>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
>> at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:203)
>> at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:305)
>>
>> I think the correct syntax for the command should be:
>>
>> bin/nutch invertlinks crawl/linkdb crawl/segments/*
>>
>> (with the /* added to the end).
>>
>> Error #2:
>>
>> The tutorial says that to index, the following command should be called:
>>
>> bin/nutch index indexes crawl/linkdb crawl/segments/*
>>
>> However, when I call that command I get the following error:
>>
>> Usage: <index> <crawldb> <linkdb> <segment> ...
>>
>> I believe the correct syntax should be:
>>
>> bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
>>
>> If these are indeed errors in the tutorial, perhaps someone with the
>> authority to do so would be kind enough to make the necessary
>> changes.
>>
>> My two cents,
>> Bryan
>>
>
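
For anyone following the quoted report: the /* fix in Error #1 appears to work because the shell expands the glob into one argument per segment directory, so the command no longer receives only the parent crawl/segments directory. The sketch below illustrates just that expansion, without calling Nutch. The temporary directory and the two segment names are made up for illustration and are not from the original message.

```shell
# Hypothetical layout: two made-up segment directories under crawl/segments.
workdir=$(mktemp -d)
mkdir -p "$workdir/crawl/segments/20060725084000" \
         "$workdir/crawl/segments/20060725091500"

# Without the glob, a command would receive a single argument:
# the parent directory itself.
set -- "$workdir"/crawl/segments
no_glob_count=$#

# With the glob, the shell expands to one argument per segment directory.
set -- "$workdir"/crawl/segments/*
glob_count=$#

echo "without glob: $no_glob_count argument(s)"
echo "with glob:    $glob_count argument(s)"
for segment in "$@"; do
    echo "  segment: ${segment##*/}"
done
```

Note that if crawl/segments is empty, the unexpanded pattern is passed through literally, so the corrected command still needs at least one fetched segment to exist.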
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
