Hello, I'm trying to upgrade from Nutch 0.9 to Nutch 1.0, and I've solved all of the issues that I seem to be having, except for one.
When I run a web crawl, everything fetches fine until it gets to dedup, at which point I get this stack trace:

2010-02-25 14:31:46,592 WARN mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
	at org.apache.hadoop.io.Text.encode(Text.java:388)
	at org.apache.hadoop.io.Text.set(Text.java:178)
	at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:191)
	at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:157)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
2010-02-25 14:31:47,328 FATAL indexer.DeleteDuplicates - DeleteDuplicates: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1250)
	at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:448)
	at org.apache.nutch.indexer.DeleteDuplicates.run(DeleteDuplicates.java:515)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:499)

I'm running on a 1.5 JVM (I can't upgrade to 1.6). I've tried with a version of Hadoop that's old enough to run on 1.5 (0.18.3), and with a version of Hadoop (0.20.2) that a co-worker modified to build and run on 1.5. Is it possible that I can't upgrade Nutch until I can upgrade my JVM? Or maybe it's something else? If there's any more information you need, let me know.

Thanks,
Eddie

PS. Sorry if this gets sent twice; I tried to send it before I subscribed to this list.
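PPS. In case it helps to see what I think is going on: judging from the top frames, DDRecordReader seems to hand a null string to Text.set, and Text.encode then dereferences it. Here's a minimal stand-in (my own class, not Hadoop code) that reproduces that kind of NPE without Hadoop:

```java
public class NullTextDemo {
    // Hypothetical stand-in for the failing path: Text.set(String)
    // passes the string to Text.encode, which dereferences it, so a
    // null field value fails at Text.encode, the top frame of the
    // trace above.
    static int encodeLikeText(String s) {
        return s.length(); // throws NullPointerException when s is null
    }

    public static void main(String[] args) {
        try {
            encodeLikeText(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in Text.encode(Text.java:388)");
        }
    }
}
```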