> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error:
> file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000
It may be caused by a broken disk or memory.

Sebastian

On 05/04/2014 01:46 PM, BlackIce wrote:
> I get this error now when doing crawls at 120k each run:
>
> 2014-05-04 11:56:44,549 INFO crawl.CrawlDb - CrawlDb update: starting at 2014-05-04 11:56:44
> 2014-05-04 11:56:44,549 INFO crawl.CrawlDb - CrawlDb update: db: TestCrawl/crawldb
> 2014-05-04 11:56:44,549 INFO crawl.CrawlDb - CrawlDb update: segments: [TestCrawl/segments/20140504110143]
> 2014-05-04 11:56:44,550 INFO crawl.CrawlDb - CrawlDb update: additions allowed: true
> 2014-05-04 11:56:44,550 INFO crawl.CrawlDb - CrawlDb update: URL normalizing: false
> 2014-05-04 11:56:44,550 INFO crawl.CrawlDb - CrawlDb update: URL filtering: false
> 2014-05-04 11:56:44,550 INFO crawl.CrawlDb - CrawlDb update: 404 purging: false
> 2014-05-04 11:56:44,550 INFO crawl.CrawlDb - CrawlDb update: Merging segment data into db.
> 2014-05-04 11:57:49,615 ERROR mapred.MapTask - IO error in map input file file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000
> 2014-05-04 11:58:36,732 WARN mapred.LocalJobRunner - job_local385844795_0001
> java.lang.Exception: java.io.IOException: IO error in map input file file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.io.IOException: IO error in map input file file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/hduser/nutch-1.8/runtime/local/TestCrawl/segments/20140504110143/crawl_parse/part-00000 at 55756800
>         at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
>         at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>         at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
>         at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
>         at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1992)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2124)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
>         ... 10 more
> 2014-05-04 11:58:36,797 ERROR crawl.CrawlDb - CrawlDb update:
> java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:105)
>         at org.apache.nutch.crawl.CrawlDb.run(CrawlDb.java:207)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.CrawlDb.main(CrawlDb.java:166)
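If you want to check how much of that part file is still readable before re-fetching the segment, Hadoop's SequenceFile reader can skip over chunks whose CRC does not verify via the standard property io.skip.checksum.errors. Here is a minimal sketch; the class name SegmentCheck and the path argument are made up for illustration, and every skipped chunk means lost records, so treat this as damage control, not a repair:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical helper, not part of Nutch: counts the entries of a segment
// part file that can still be read, skipping past corrupt checksum chunks.
public class SegmentCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Standard Hadoop switch: on a ChecksumException the SequenceFile
    // reader seeks to the next sync point instead of rethrowing.
    conf.setBoolean("io.skip.checksum.errors", true);

    // e.g. TestCrawl/segments/20140504110143/crawl_parse/part-00000
    Path part = new Path(args[0]);
    FileSystem fs = FileSystem.getLocal(conf);

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

    long n = 0;
    while (reader.next(key, val)) {
      n++;
    }
    reader.close();
    System.out.println(n + " records readable from " + part);
  }
}

Setting the same property in the job configuration should also let the updatedb pass run to completion, at the cost of silently dropping whatever records sit in the bad chunk. And since a ChecksumException can come from bits flipped on disk or from bad RAM in the read path, a memory test is worthwhile: if repeated reads of the same file fail at different offsets, memory is the more likely culprit.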

