Hi,

 During segment updates, a persistent crawldb checksum error appeared:

 2011-12-18 02:06:30,703 WARN mapred.LocalJobRunner - job_local_0001
 org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/nutch/nutch@beetween/runs/fr1/crawldb/current/part-00000/data at 1337333760

 The last time this problem occurred, I removed both .crc files in the crawldb and it
worked.

 But now, removing the .crc files produces another persistent error:

 2011-12-19 08:47:21,918 WARN mapred.LocalJobRunner - job_local_0001
 A record version mismatch occured. Expecting v2, found v66
 at org.apache.nutch.protocol.ProtocolStatus.readFields(ProtocolStatus.java:168)
 at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:167)
 at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:272)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
 at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
 at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
 at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
 at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:103)
 at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:78)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
 2011-12-19 08:47:22,861 FATAL crawl.CrawlDb - CrawlDb update: java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
 at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:94)
 at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:189)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:150)

 What can I do?

 Thanks.
