Hello
The other question is: how do I use it? When I try to run this:
nutch crawl crawl/url -dir crawl -depth 3
I get these errors:
---------------------------------------------------------------------------
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: file:/usr/local/hadoop/nutch-1.1/conf/crawl/url
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
---------------------------------------------------------------------------
Obviously, it is using the local file system by default. I think I must do
something in nutch-site.xml or change this command, but how? I tried googling,
but found no solutions; very little material about Nutch is actually available.
There is a tutorial on the wiki, but it is long and buggy.
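
(For what it's worth, a minimal sketch of the kind of change that seems to be
needed, assuming the Hadoop conf files are visible to Nutch as described below,
and assuming a namenode at hdfs://localhost:9000; the host, port, and seed.txt
name are illustrative, not taken from this thread. fs.default.name has to point
at HDFS instead of the local file system, e.g. in the linked core-site.xml or
in nutch-site.xml:)

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>

(and the seed directory has to exist on HDFS before injecting:)

  hadoop fs -mkdir urls            # create the seed directory on HDFS
  hadoop fs -put seed.txt urls/    # upload a plain-text list of seed URLs
  bin/nutch crawl urls -dir crawl -depth 3
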
On Thursday, July 22, 2010 03:04:04 pm CatOs Mandros wrote:
> You should run $HADOOP_HOME/bin/start-all.sh
>
> Specifically, the files I have soft-linked are:
> $NUTCH_HOME/conf/core-site.xml -> $HADOOP_HOME/conf/core-site.xml
> $NUTCH_HOME/conf/hdfs-site.xml -> $HADOOP_HOME/conf/hdfs-site.xml
> $NUTCH_HOME/conf/mapred-site.xml -> $HADOOP_HOME/conf/mapred-site.xml
> $NUTCH_HOME/conf/masters -> $HADOOP_HOME/conf/masters
> $NUTCH_HOME/conf/slaves -> $HADOOP_HOME/conf/slaves
>
> I suppose not all the files are necessary, but it's working for me :)
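>
> (As a sketch of those links as commands, assuming $NUTCH_HOME and
> $HADOOP_HOME are already set in the shell; -f overwrites any copies that
> already exist in the Nutch conf directory:)
>
>   ln -sf $HADOOP_HOME/conf/core-site.xml   $NUTCH_HOME/conf/core-site.xml
>   ln -sf $HADOOP_HOME/conf/hdfs-site.xml   $NUTCH_HOME/conf/hdfs-site.xml
>   ln -sf $HADOOP_HOME/conf/mapred-site.xml $NUTCH_HOME/conf/mapred-site.xml
>   ln -sf $HADOOP_HOME/conf/masters         $NUTCH_HOME/conf/masters
>   ln -sf $HADOOP_HOME/conf/slaves          $NUTCH_HOME/conf/slaves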
>
> On Wed, Jul 21, 2010 at 3:37 PM, Alex Luya <[email protected]> wrote:
> > CatOs Mandros:
> > If I do that, I must restart the Hadoop cluster; and which command should
> > be run, $HADOOP_HOME/bin/start-all.sh or $NUTCH_HOME/bin/start-all.sh?
> >
> > On Wednesday, July 21, 2010 01:54:06 pm CatOs Mandros wrote:
> >> I just soft-linked all the relevant configuration files from the Nutch
> >> installation to the Hadoop ones, and now I can use the nutch script
> >> transparently.
> >>
> >> On Wed, Jul 21, 2010 at 3:55 AM, Brian Tingle <[email protected]> wrote:
> >> > I wasted a lot of time trying to figure this out before I realized
> >> > there is an ant target: you can run 'ant job' and you get a file
> >> > 'nutch.job' that you can move to the hadoop cluster, and then you
> >> > can do something like 'hadoop -job nutch.job path.to.nutch.Class
> >> > blah blah blah'.
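> >> >
> >> > (A concrete sketch of that flow, using the standard 'hadoop jar' form;
> >> > the urls/crawl arguments are illustrative, and the exact name and
> >> > location of the generated job file may differ:)
> >> >
> >> >   ant job   # run in the Nutch source tree; produces the nutch.job file
> >> >   hadoop jar nutch.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3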
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Alex Luya [mailto:[email protected]]
> >> > Sent: Tue 7/20/2010 6:09 PM
> >> > To: [email protected]
> >> > Subject: Hello, how can I just get Nutch working on this running Hadoop
> >> > cluster without a bunch of compilation and configuration work?
> >> >
> >> > Hello:
> >> > According to this tutorial,
> >> > http://wiki.apache.org/nutch/NutchHadoopTutorial, Hadoop is already
> >> > shipped with Nutch, and the user can just use it. But I already have a
> >> > Hadoop cluster running now; how can I get Nutch working on this
> >> > running Hadoop cluster without a bunch of compilation and
> >> > configuration work?
> >> > (I am using Hadoop 0.20.2 and I want to use Nutch v1.1, so few
> >> > tutorials and instructions are available on the web.)