Hi,

I hit an exception when running Nutch 2.1 with MySQL. The command line is:

bin/nutch crawl urls -depth 1 -topN 5

Steps to reproduce the issue:

1. Start Nutch.
2. Stop it while it is executing.
3. Start Nutch again.

The problem can be recovered from by cleaning up the table 'webpage'.

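For anyone who wants to try the same recovery, here is a minimal sketch of one way to clean up the 'webpage' table. It assumes "clean up" means emptying the table with TRUNCATE. The database name (nutch), the user (root), and the helper function names are placeholders for illustration, not values from this report, so adjust them to your own setup.

```shell
#!/bin/sh
# Hypothetical helper for the workaround: empty the 'webpage' table
# so the next crawl starts from a clean state.
# DB_NAME and DB_USER are placeholder defaults, not values from the report.
DB_NAME="${DB_NAME:-nutch}"
DB_USER="${DB_USER:-root}"

clear_webpage_sql() {
    # 'webpage' is the table named in the report.
    printf 'TRUNCATE TABLE webpage;\n'
}

clear_webpage_table() {
    # -p prompts for the password; -e runs the statement and exits.
    mysql -u "$DB_USER" -p "$DB_NAME" -e "$(clear_webpage_sql)"
}

# Usage (destructive: discards all stored crawl state):
#   clear_webpage_table && bin/nutch crawl urls -depth 1 -topN 5
```

Note that this throws away everything fetched so far, so it is only a workaround and not a fix for the underlying exception.
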
========================= Error in the console =====================================
Skipping http://blog.foofactory.fi/2007/03/perfomance-history-for-nutch.html; different batch id (null)
Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0004
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
	at org.apache.nutch.parse.ParserJob.run(ParserJob.java:251)
	at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
	at org.apache.nutch.crawl.Crawler.run(Crawler.java:171)
	at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

========================= Error in the logs/hadoop.log =====================================
2012-12-12 22:26:33,379 INFO parse.ParserJob - Skipping http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html; different batch id (null)
2012-12-12 22:26:33,379 INFO parse.ParserJob - Skipping http://blog.foofactory.fi/2007/03/perfomance-history-for-nutch.html; different batch id (null)
2012-12-12 22:26:33,380 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2012-12-12 22:26:33,381 WARN mapred.LocalJobRunner - job_local_0004
java.io.IOException: java.io.EOFException
	at org.apache.gora.sql.query.SqlResult.nextInner(SqlResult.java:58)
	at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
	at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:111)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.io.EOFException
	at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.readRaw(BinaryDecoder.java:818)
	at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:340)
	at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:265)
	at org.apache.gora.mapreduce.FakeResolvingDecoder.readString(FakeResolvingDecoder.java:131)
	at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:280)
	at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:191)
	at org.apache.gora.avro.PersistentDatumReader.readMap(PersistentDatumReader.java:182)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:83)
	at org.apache.gora.avro.PersistentDatumReader.read(PersistentDatumReader.java:102)
	at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:259)
	at org.apache.gora.sql.store.SqlStore.readField(SqlStore.java:565)
	at org.apache.gora.sql.store.SqlStore.readObject(SqlStore.java:486)
	at org.apache.gora.sql.query.SqlResult.nextInner(SqlResult.java:54)
	... 8 more

Thanks.

Regards,
Rui

