Hello
      The other question is: how can I use it? When I try to run this:

nutch crawl crawl/url -dir crawl -depth 3


I got this error:
----------------------------------------------------------------------
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: file:/usr/local/hadoop/nutch-1.1/conf/crawl/url
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)

----------------------------------------------------------------------


Obviously, it is using the local file system by default. I think I must change
something in nutch-site.xml or in this command, but how? I tried googling, but
found no solutions; very little material about Nutch is available. There is a
tutorial on the wiki, but it is long and confusing.
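The exception above says Nutch resolved the seed path against the local filesystem and did not find it. One possible fix, assuming the job should read its seed list from HDFS, is sketched below; the names "urls" and "seed.txt" are placeholders, not anything Nutch requires:

```shell
# Create a seed list locally, then put it into HDFS so the
# MapReduce job can find it. The hadoop commands are guarded so
# this sketch is harmless on a machine without hadoop on the PATH.
echo "http://www.example.com/" > seed.txt

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir urls          # create the input directory on HDFS
    hadoop fs -put seed.txt urls   # upload the seed list
    hadoop fs -ls urls             # verify the file is there
    # Then point the crawl at that HDFS directory:
    bin/nutch crawl urls -dir crawl -depth 3
fi
```

The key change from the failing command is that the input path names a directory that actually exists on the filesystem the job runs against.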


On Thursday, July 22, 2010 03:04:04 pm CatOs Mandros wrote:
> You should run $HADOOP_HOME/bin/start-all.sh
> 
> Specifically, the files I have soft-linked are:
> $NUTCH_HOME/conf/core-site.xml  -> $HADOOP_HOME/conf/core-site.xml
> $NUTCH_HOME/conf/hdfs-site.xml  -> $HADOOP_HOME/conf/hdfs-site.xml
> $NUTCH_HOME/conf/mapred-site.xml  -> $HADOOP_HOME/conf/mapred-site.xml
> $NUTCH_HOME/conf/masters  -> $HADOOP_HOME/conf/masters
> $NUTCH_HOME/conf/slaves  -> $HADOOP_HOME/conf/slaves
> 
> I suppose not all the files are necessary, but its working for me :)
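The soft-linking described above can be written as a small loop; the NUTCH_HOME and HADOOP_HOME values are example paths, and the guard makes the sketch a no-op where the files do not exist:

```shell
# Link Hadoop's config files into Nutch's conf directory so the
# nutch script picks up the running cluster's settings.
NUTCH_HOME=${NUTCH_HOME:-/usr/local/nutch-1.1}    # example path
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}     # example path

for f in core-site.xml hdfs-site.xml mapred-site.xml masters slaves; do
    if [ -e "$HADOOP_HOME/conf/$f" ]; then
        ln -sf "$HADOOP_HOME/conf/$f" "$NUTCH_HOME/conf/$f"
    fi
done
```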
> 
> On Wed, Jul 21, 2010 at 3:37 PM, Alex Luya <[email protected]> wrote:
> > CatOs Mandros:
> >       if I do that, must I restart the Hadoop cluster? And which command
> > should be run: $HADOOP_HOME/bin/start-all.sh or $NUTCH_HOME/bin/start-all.sh?
> > 
> > On Wednesday, July 21, 2010 01:54:06 pm CatOs Mandros wrote:
> >> I just soft-linked all the relevant configuration files from the Nutch
> >> installation to the Hadoop ones, and now I can use the nutch script
> >> transparently.
> >> 
> >> On Wed, Jul 21, 2010 at 3:55 AM, Brian Tingle <[email protected]> wrote:
> >> > I wasted a lot of time trying to figure this out before I realized
> >> > there is an ant target: you can run 'ant job' and you get a
> >> > file 'nutch.job' that you can move to the Hadoop cluster, and then
> >> > you can do something like 'hadoop jar nutch.job path.to.nutch.Class
> >> > blah blah blah'.
> >> > 
> >> > 
> >> > -----Original Message-----
> >> > From: Alex Luya [mailto:[email protected]]
> >> > Sent: Tue 7/20/2010 6:09 PM
> >> > To: [email protected]
> >> > Subject: Hello,How can I just get nutch worked on this running hadoop
> >> > cluster without bunch of works of compile and  configuration.
> >> > 
> >> > Hello:
> >> >    According to this
> >> > tutorial, http://wiki.apache.org/nutch/NutchHadoopTutorial, Hadoop
> >> > is already shipped with Nutch, and users can just use it. But I
> >> > already have a Hadoop cluster running now. How can I get Nutch
> >> > working on this running cluster without a bunch of compilation
> >> > and configuration work?
> >> > (I am using Hadoop 0.20.2 and want to use Nutch 1.1; very few
> >> > tutorials and instructions are available on the web.)
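The 'ant job' route Brian describes can be sketched as below. Note that the Hadoop subcommand for running a job jar is `jar` (not `-job`), and the crawl entry point class in Nutch 1.1 is org.apache.nutch.crawl.Crawl; the job-file name and crawl arguments are illustrative. Guards keep the sketch inert on a machine without ant or a cluster:

```shell
# Step 1: build the self-contained job file from a Nutch source
# checkout (run inside the Nutch source directory); 'ant job'
# produces a .job file such as build/nutch-1.1.job.
if command -v ant >/dev/null 2>&1 && [ -f build.xml ]; then
    ant job
fi

# Step 2: copy the .job file to a node of the existing Hadoop
# cluster and run a Nutch class with it. The command string is
# assembled first so the sketch can be inspected without a cluster.
CRAWL_CMD="hadoop jar nutch-1.1.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3"

if command -v hadoop >/dev/null 2>&1 && [ -f nutch-1.1.job ]; then
    $CRAWL_CMD
fi
```

This avoids recompiling or reconfiguring anything on the cluster itself: the .job file bundles Nutch's classes and configuration, and the existing Hadoop installation runs it like any other MapReduce job.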
