Re: [Nutch-general] Two Errors in Nutch 0.8 Tutorial?

Matthew Holt Tue, 25 Jul 2006 08:20:30 -0700

Never mind, it's there now.
Matt

Matthew Holt wrote:
> If you download the latest trunk copy of 0.8, bin/nutch is not even
> available. Is it supposed to be this way?
> Matt
>
> Bryan Woliner wrote:
>> I am certainly far from a Nutch expert, but it appears to me that there
>> are two errors in the current Nutch 0.8 tutorial.
>>
>> First off, here is the version of Nutch 0.8 that I am using, in case
>> changes made in a newer version invalidate my comments:
>>
>> -bash-2.05b$ svn info
>> Path: .
>> URL: http://svn.apache.org/repos/asf/lucene/nutch/trunk
>> Repository Root: http://svn.apache.org/repos/asf
>> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
>> Revision: 414318
>> Node Kind: directory
>> Schedule: normal
>> Last Changed Author: siren
>> Last Changed Rev: 414306
>> Last Changed Date: 2006-06-14 11:08:28 -0500 (Wed, 14 Jun 2006)
>> Properties Last Updated: 2006-06-14 12:00:57 -0500 (Wed, 14 Jun 2006)
>>
>> Error #1:
>>
>> Towards the end of the tutorial, the following command is found:
>>
>> bin/nutch invertlinks crawl/linkdb crawl/segments
>>
>>
>> When I call this command verbatim, I get the following error:
>>
>> 2006-07-25 08:44:40,503 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_8ly5hf
>> java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /home/bryan/nutch-8d/hadoop/mapred/local/localRunner/job_8ly5hf.xmlfinal: hadoop-site.xml
>>        at org.apache.hadoop.mapred.InputFormatBase.listPaths(InputFormatBase.java:96)
>>        at org.apache.hadoop.mapred.SequenceFileInputFormat.listPaths(SequenceFileInputFormat.java:37)
>>        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:106)
>>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
>> Exception in thread "main" java.io.IOException: Job failed!
>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
>>        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:203)
>>        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:305)
>>
>> I think the correct syntax for the command should be:
>>
>> bin/nutch invertlinks crawl/linkdb crawl/segments/*
>>
>> (with the /* added to the end).
>>
>> Error #2:
>>
>> The tutorial says that to index, the following command should be called:
>>
>> bin/nutch index indexes crawl/linkdb crawl/segments/*
>>
>> However, when I call that command I get the following error:
>>
>> Usage: <index> <crawldb> <linkdb> <segment> ...
>>
>> I believe the correct syntax should be:
>>
>> bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
>>
>> If these are indeed errors in the tutorial, perhaps someone with the
>> authority to do so would be kind enough to make the necessary
>> changes.
>>
>> My two cents,
>> Bryan
>>
>
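
For anyone comparing the two forms quoted above, the difference comes down to what the shell hands to bin/nutch. The following is only a rough sketch: the segment directory names are invented, the scratch directory exists purely for illustration, and the commands are printed rather than run, so it says nothing definitive about how Nutch itself parses its arguments.

```shell
# Sketch only: the segment names are made up, and Nutch is never invoked.
demo=$(mktemp -d)
mkdir -p "$demo/crawl/segments/20060725084000" "$demo/crawl/segments/20060725091500"
cd "$demo" || exit 1

# Tutorial form: the parent directory is passed as a single argument.
set -- crawl/segments
parent_count=$#

# Suggested form: the shell expands the glob to one path per segment.
set -- crawl/segments/*
glob_count=$#

# Build (but do not run) the two commands as suggested in the post.
invert_cmd="bin/nutch invertlinks crawl/linkdb $*"
index_cmd="bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb $*"

echo "tutorial form passes $parent_count segment path(s)"
echo "suggested form passes $glob_count segment path(s)"
echo "$invert_cmd"
echo "$index_cmd"
```

With the glob, each segment directory arrives as its own argument, which matches the trailing "<segment> ..." in the usage message quoted for the index command.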

