Does io.skip.checksum.errors = true help?
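
For reference, a minimal sketch of how that property could be set. The
property is a Hadoop setting that defaults to false; when true, sequence
file entries that hit a checksum error are skipped instead of raising an
exception, so the skipped entries are lost. Putting it in
conf/nutch-site.xml is an assumption about this setup; any configuration
file the job actually loads should work.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside <configuration> -->
<property>
  <name>io.skip.checksum.errors</name>
  <!-- Skip sequence file entries with bad checksums instead of failing -->
  <value>true</value>
</property>
```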

On Monday 19 December 2011 10:17:54 Danicela nutch wrote:
> Hi,
> 
>  During segment updates, a persistent crawldb checksum error appeared :
> 
>  2011-12-18 02:06:30,703 WARN mapred.LocalJobRunner - job_local_0001
>  org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/nutch/nutch@beetween/runs/fr1/crawldb/current/part-00000/data at 1337333760
> 
>  Last time this problem occurred, I removed both .crc files in the crawldb
> and it worked.
> 
>  But now, removing the crcs brings another persistent error :
> 
>  2011-12-19 08:47:21,918 WARN mapred.LocalJobRunner - job_local_0001
>  A record version mismatch occured. Expecting v2, found v66
>  at org.apache.nutch.protocol.ProtocolStatus.readFields(ProtocolStatus.java:168)
>  at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:167)
>  at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:272)
>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>  at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
>  at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
>  at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
>  at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
>  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>  2011-12-19 08:47:22,861 FATAL crawl.CrawlDb - CrawlDb update: java.io.IOException: Job failed!
>  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>  at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:94)
>  at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:189)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>  at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:150)
> 
> What can I do ?
> 
>  Thanks.

-- 
Markus Jelsma - CTO - Openindex
