It looks to me like the version of Hadoop you are running does not match
the one your Nutch build expects. Nutch trunk is on Hadoop 0.19. You
might want to try building Nutch from SVN and then retrying.
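
A quick way to spot this kind of mismatch is to compare the release line
of the installed Hadoop with the one the Nutch build targets. The
following is only a minimal sketch: the helper names major_minor and
same_release_line are made up for illustration, and the commented usage
assumes the first line printed by "hadoop version" looks like
"Hadoop 0.19.0".

```shell
#!/bin/sh
# Hypothetical helper: reduce a version such as "0.19.0" to its
# "major.minor" part ("0.19").
major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Hypothetical helper: succeed only when both versions share the same
# major.minor release line.
same_release_line() {
  [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Usage sketch (needs a Hadoop install on PATH; the output format of
# "hadoop version" is an assumption, see above):
#   installed=$(hadoop version | head -n 1 | awk '{print $2}')
#   same_release_line "$installed" "0.19" \
#     || echo "Hadoop $installed may not match this Nutch build"
```

Matching release lines do not guarantee compatibility, but a difference
here is a likely cause of a NoSuchMethodError like the one below.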
Dennis
Shirley Cohen wrote:
Hi,
I found a workaround to this problem. I was able to run the fetcher with
the nutch*.job command using the latest working nightly build from
12-28-2008.
Shirley
Shirley Cohen wrote:
Hi,
I'm new to Nutch and am trying to run it on an existing Hadoop 0.19.0
install. I'm using the command "hadoop jar
nutch-2008-12-02_04-01-57.job", as suggested by Dennis Kubes in an
earlier post. I've been able to inject URLs and generate segments
successfully using the following commands:
hadoop dfs -put dmoz dmoz
bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.crawl.Injector crawl/crawldb dmoz
bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.crawl.Generator crawl/crawldb crawl/segments
However, when I try to run the fetcher using the command:
bin/hadoop jar nutch-2008-12-02_04-01-57.job org.apache.nutch.fetcher.Fetcher crawl/segments/20090104094558
I get the following error:
09/01/04 10:20:31 INFO fetcher.Fetcher: Fetcher: starting
09/01/04 10:20:31 INFO fetcher.Fetcher: Fetcher: segment: crawl/segments/20090104094558
****calling init JobTracker*****
java.lang.NoSuchMethodError: org.apache.nutch.fetcher.Fetcher$InputFormat.listPaths(Lorg/apache/hadoop/mapred/JobConf;)[Lorg/apache/hadoop/fs/Path;
at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:61)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:783)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1128)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:530)
at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:565)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:537)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Note: The subdirectory "20090104094558" was created by the generator.
I'm running the 0.9 release of Nutch downloaded from:
http://mirrors.24-7-solutions.net/pub/apache/lucene/nutch/
Does anyone know what is going on?
Thanks in advance,
Shirley