Hi,
I found a workaround for this problem. I was able to run the fetcher with
the "hadoop jar nutch*.job" command by using the job file from the latest
working nightly build, dated 12-28-2008.
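In case it helps anyone else who hits the same NoSuchMethodError: as far as I can tell, it means the job file was compiled against a Hadoop API that differs from the one installed, which would be consistent with listPaths no longer being available in Hadoop 0.19.0. Below is a minimal sketch for checking whether a class on the classpath still exposes a method with a given name. The class name MethodCheck and the helper hasMethod are my own illustration and are not part of Nutch or Hadoop.

```java
import java.lang.reflect.Method;

// Hypothetical diagnostic helper; not part of Nutch or Hadoop.
final class MethodCheck {

    /**
     * Returns true if the named class can be loaded and it, or one of its
     * superclasses, declares a method with the given name (any signature,
     * any visibility). Interfaces are not inspected.
     */
    static boolean hasMethod(String className, String methodName) {
        try {
            // Load without running static initializers.
            Class<?> cls = Class.forName(
                    className, false, MethodCheck.class.getClassLoader());
            for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
                for (Method m : c.getDeclaredMethods()) {
                    if (m.getName().equals(methodName)) {
                        return true;
                    }
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of its dependencies) is not on the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: MethodCheck <class-name> <method-name>");
            System.exit(2);
        }
        System.out.println(args[0] + "#" + args[1] + " present: "
                + hasMethod(args[0], args[1]));
    }
}
```

Assuming the Hadoop core jar is on the classpath (the jar location depends on your install), running it with the arguments org.apache.hadoop.mapred.FileInputFormat and listPaths should show whether the installed Hadoop still has the method that the older Nutch job file expects.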
Shirley
Shirley Cohen wrote:
Hi,
I'm new to Nutch and am trying to run it on an existing Hadoop 0.19.0
install. I'm using the command "hadoop jar
nutch-2008-12-02_04-01-57.job", as suggested by Dennis Kubes in an
earlier post. I've been able to inject URLs and generate segments
successfully using the following commands:
hadoop dfs -put dmoz dmoz
bin/hadoop jar nutch-2008-12-02_04-01-57.job \
    org.apache.nutch.crawl.Injector crawl/crawldb dmoz
bin/hadoop jar nutch-2008-12-02_04-01-57.job \
    org.apache.nutch.crawl.Generator crawl/crawldb crawl/segments
However, when I try to run the fetcher using the command:
bin/hadoop jar nutch-2008-12-02_04-01-57.job \
    org.apache.nutch.fetcher.Fetcher crawl/segments/20090104094558
I get the following error:
09/01/04 10:20:31 INFO fetcher.Fetcher: Fetcher: starting
09/01/04 10:20:31 INFO fetcher.Fetcher: Fetcher: segment: crawl/segments/20090104094558
****calling init JobTracker*****
java.lang.NoSuchMethodError: org.apache.nutch.fetcher.Fetcher$InputFormat.listPaths(Lorg/apache/hadoop/mapred/JobConf;)[Lorg/apache/hadoop/fs/Path;
    at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:61)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:783)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1128)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:530)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:565)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:537)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Note: The subdirectory "20090104094558" was created by the generator.
I'm running the 0.9 release of Nutch, downloaded from:
http://mirrors.24-7-solutions.net/pub/apache/lucene/nutch/
Does anyone know what is going on?
Thanks in advance,
Shirley