Hi,

There are currently multiple known issues with using MySQL as the backend
for the Nutch 2.x series.

Also, using the 'crawl' command is no longer recommended.

Please check here (https://issues.apache.org/jira/browse/NUTCH-1087).
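
As a rough illustration, a single round (roughly what 'crawl urls -depth 1
-topN 5' covers) could be run as separate steps instead. This is only a
sketch: it assumes the usual Nutch 2.x per-step commands (inject, generate,
fetch, parse, updatedb), so please check the usage output of bin/nutch for
your version. The NUTCH and DRY_RUN variables and the run helper are made up
for illustration and are not part of Nutch. By default the script only prints
the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch: one crawl round as individual steps instead of 'bin/nutch crawl'.
# NUTCH and DRY_RUN are hypothetical helper variables, not Nutch settings.
NUTCH="${NUTCH:-bin/nutch}"

run() {
  # Print the command; execute it only when DRY_RUN is set to something other than 1.
  echo "+ $*"
  [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

set -e                          # stop at the first failing step
run "$NUTCH" inject urls        # seed the store from the urls/ directory
run "$NUTCH" generate -topN 5   # select up to 5 URLs into a new batch
run "$NUTCH" fetch -all         # fetch the generated URLs
run "$NUTCH" parse -all         # parse the fetched pages
run "$NUTCH" updatedb           # write new links and status back to the store
```

Running the steps one at a time also makes it easier to see which job fails.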

Best,
Kiran.

On Wed, Dec 12, 2012 at 9:47 AM, 高睿 <[email protected]> wrote:

> Hi,
>
> I get an exception when running Nutch 2.1 with MySQL. The command line is:
> bin/nutch crawl urls -depth 1 -topN 5
> Steps to reproduce the issue:
> 1. Start Nutch.
> 2. Stop it while it is executing.
> 3. Start Nutch again.
> The problem can be worked around by clearing the 'webpage' table.
>
> ========================= Error in the console =====================================
> Skipping http://blog.foofactory.fi/2007/03/perfomance-history-for-nutch.html; different batch id (null)
> Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0004
>         at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
>         at org.apache.nutch.parse.ParserJob.run(ParserJob.java:251)
>         at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
>         at org.apache.nutch.crawl.Crawler.run(Crawler.java:171)
>         at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
>
> ========================= Error in the logs/hadoop.log =====================================
> 2012-12-12 22:26:33,379 INFO  parse.ParserJob - Skipping http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html; different batch id (null)
> 2012-12-12 22:26:33,379 INFO  parse.ParserJob - Skipping http://blog.foofactory.fi/2007/03/perfomance-history-for-nutch.html; different batch id (null)
> 2012-12-12 22:26:33,380 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
> 2012-12-12 22:26:33,381 WARN  mapred.LocalJobRunner - job_local_0004
> java.io.IOException: java.io.EOFException
>         at org.apache.gora.sql.query.SqlResult.nextInner(SqlResult.java:58)
>         at org.apache.gora.query.impl.ResultBase.next(ResultBase.java:112)
>         at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:111)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: java.io.EOFException
>         at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.readRaw(BinaryDecoder.java:818)
>         at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:340)
>         at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:265)
>         at org.apache.gora.mapreduce.FakeResolvingDecoder.readString(FakeResolvingDecoder.java:131)
>         at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:280)
>         at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:191)
>         at org.apache.gora.avro.PersistentDatumReader.readMap(PersistentDatumReader.java:182)
>         at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:83)
>         at org.apache.gora.avro.PersistentDatumReader.read(PersistentDatumReader.java:102)
>         at org.apache.gora.util.IOUtils.deserialize(IOUtils.java:259)
>         at org.apache.gora.sql.store.SqlStore.readField(SqlStore.java:565)
>         at org.apache.gora.sql.store.SqlStore.readObject(SqlStore.java:486)
>         at org.apache.gora.sql.query.SqlResult.nextInner(SqlResult.java:54)
>         ... 8 more
>
> Thanks.
>
> Regards,
> Rui
>



-- 
Kiran Chitturi
