Hi,

There are currently multiple issues with using MySQL as the backend for the Nutch 2.x series.
Also, it's no longer recommended to use the 'crawl' command. Please see
https://issues.apache.org/jira/browse/NUTCH-1087.

Best,
Kiran.

On Wed, Dec 12, 2012 at 9:47 AM, 高睿 <[email protected]> wrote:

> Hi,
>
> I found an exception when running Nutch 2.1 with MySQL. The command line
> is: bin/nutch crawl urls -depth 1 -topN 5
> Here are the steps to reproduce the issue:
> 1. start nutch
> 2. stop it while it is executing
> 3. start nutch again
> The problem can be worked around by clearing the table 'webpage'.
>
> ========================= Error in the console =====================================
> Skipping http://blog.foofactory.fi/2007/03/perfomance-history-for-nutch.html; different batch id (null)
> Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0004
>     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
>     at org.apache.nutch.parse.ParserJob.run(ParserJob.java:251)
>     at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:171)
>     at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>
> ========================= Error in the logs/hadoop.log =====================================
> 2012-12-12 22:26:33,379 INFO parse.ParserJob - Skipping http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html; different batch id (null)
> 2012-12-12 22:26:33,379 INFO parse.ParserJob - Skipping http://blog.foofactory.fi/2007/03/perfomance-history-for-nutch.html; different batch id (null)
> 2012-12-12 22:26:33,380 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> 2012-12-12 22:26:33,381 WARN mapred.LocalJobRunner - job_local_0004
> java.io.IOException: java.io.EOFException
>     at org.apache.gora.sql.query.SqlResult.nextInner(SqlResult.java:58)
>     at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
>     at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:111)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: java.io.EOFException
>     at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.readRaw(BinaryDecoder.java:818)
>     at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:340)
>     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:265)
>     at org.apache.gora.mapreduce.FakeResolvingDecoder.readString(FakeResolvingDecoder.java:131)
>     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:280)
>     at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:191)
>     at org.apache.gora.avro.PersistentDatumReader.readMap(PersistentDatumReader.java:182)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:83)
>     at org.apache.gora.avro.PersistentDatumReader.read(PersistentDatumReader.java:102)
>     at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:259)
>     at org.apache.gora.sql.store.SqlStore.readField(SqlStore.java:565)
>     at org.apache.gora.sql.store.SqlStore.readObject(SqlStore.java:486)
>     at org.apache.gora.sql.query.SqlResult.nextInner(SqlResult.java:54)
>     ... 8 more
>
> Thanks.
>
> Regards,
> Rui
>

--
Kiran Chitturi

