I also experienced the same thing [checksum error] :( I couldn't avoid deleting the segment and doing the fetch again... Deleting the .crc files, or other files inside the segment, didn't help much. Thanks.
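In case it helps anyone hitting this: below is a rough probe you could try before giving up on a segment (an untested sketch; it assumes the crawl runs on the local filesystem and uses the old SequenceFile.Reader API that Hadoop 1.x / Nutch 1.8 ships with, and SegmentProbe is just a name I made up). It counts how many records of the broken part file are readable before the ChecksumException hits, so you can at least tell whether the corruption is near the start or the end of the file:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/** Untested sketch: count the records of a segment part file that are
 *  readable before the checksum error is thrown. */
public class SegmentProbe {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    // fs.setVerifyChecksum(false); // uncomment to read past the bad
    // chunk -- but the data inside that chunk may be garbage

    // e.g. TestCrawl/segments/20140504110143/crawl_parse/part-00000
    Path part = new Path(args[0]);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

    long count = 0;
    try {
      while (reader.next(key, val)) {
        count++;
      }
      System.out.println("File is fully readable: " + count + " records");
    } catch (IOException e) {
      // ChecksumException is a subclass of IOException
      System.out.println("Failed after " + count + " records: " + e);
    } finally {
      reader.close();
    }
  }
}

In my case nothing salvaged the segment and I had to refetch anyway, but at least this shows quickly whether the file is worth trying to recover.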
On Tue, May 6, 2014 at 2:55 AM, Sebastian Nagel <[email protected]> wrote:

> > Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error:
> > file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000
>
> It may be caused by a broken disk or memory.
>
> Sebastian
>
> On 05/04/2014 01:46 PM, BlackIce wrote:
> > I get this error now when doing crawls at 120k each run:
> >
> > 2014-05-04 11:56:44,549 INFO  crawl.CrawlDb - CrawlDb update: starting at 2014-05-04 11:56:44
> > 2014-05-04 11:56:44,549 INFO  crawl.CrawlDb - CrawlDb update: db: TestCrawl/crawldb
> > 2014-05-04 11:56:44,549 INFO  crawl.CrawlDb - CrawlDb update: segments: [TestCrawl/segments/20140504110143]
> > 2014-05-04 11:56:44,550 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
> > 2014-05-04 11:56:44,550 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: false
> > 2014-05-04 11:56:44,550 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: false
> > 2014-05-04 11:56:44,550 INFO  crawl.CrawlDb - CrawlDb update: 404 purging: false
> > 2014-05-04 11:56:44,550 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
> > 2014-05-04 11:57:49,615 ERROR mapred.MapTask - IO error in map input file file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000
> > 2014-05-04 11:58:36,732 WARN  mapred.LocalJobRunner - job_local385844795_0001
> > java.lang.Exception: java.io.IOException: IO error in map input file file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> > Caused by: java.io.IOException: IO error in map input file file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000
> >         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
> >         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
> >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000 at 55756800
> >         at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
> >         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
> >         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
> >         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
> >         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
> >         at java.io.DataInputStream.readFully(DataInputStream.java:195)
> >         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
> >         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
> >         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1992)
> >         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2124)
> >         at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
> >         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
> >         ... 10 more
> > 2014-05-04 11:58:36,797 ERROR crawl.CrawlDb - CrawlDb update: java.io.IOException: Job failed!
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
> >         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:105)
> >         at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:207)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:166)

--
wassalam,
[bayu]

