OS: Linux
Nutch version: 0.9
Running Nutch from Eclipse: org.apache.nutch.fetcher.Fetcher2 with the program arguments "crawl/segments/20070801150452 -threads 10" and the VM arguments "-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log -Xmx1024M".
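For reference, the Eclipse run configuration above corresponds roughly to the following command line. This is only a sketch: it assumes a local Nutch checkout with the bin/nutch launcher script available, and the environment variable names are those used by the stock launcher script, not something taken from this report.

```shell
# Rough command-line equivalent of the Eclipse run configuration above
# (hypothetical setup; assumes a Nutch checkout with bin/nutch on hand)
export NUTCH_HEAPSIZE=1024    # corresponds to the -Xmx1024M VM argument
export NUTCH_OPTS="-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log"
bin/nutch org.apache.nutch.fetcher.Fetcher2 crawl/segments/20070801150452 -threads 10
```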
Nutch stops crawling after a few hours of successful crawling.

--snip---
2007-08-02 04:04:46,937 WARN fs.FileSystem - Moving bad file /tmp/hadoop-eb/mapred/local/reduce_areo58/map_0.out to /tmp/bad_files/map_0.out.-783779377
2007-08-02 04:04:46,940 INFO fs.FSInputChecker - Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmp/hadoop-eb/mapred/local/reduce_areo58/map_0.out at 212930560
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.verifySum(ChecksumFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:211)
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
    at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
    at java.io.DataInputStream.readFully(DataInputStream.java:176)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
    at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
    at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
    at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
    at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:180)
    at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:149)
    at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
2007-08-02 04:04:46,940 WARN mapred.LocalJobRunner - job_gk05gy java.lang.NullPointerException
    at org.apache.hadoop.fs.FSDataInputStream$Buffer.seek(FSDataInputStream.java:74)
    at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:121)
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.readBuffer(ChecksumFileSystem.java:221)
    at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.read(ChecksumFileSystem.java:167)
    at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
    at java.io.DataInputStream.readFully(DataInputStream.java:176)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:427)
    at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$700(SequenceFile.java:414)
    at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:1665)
    at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:2579)
    at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2351)
    at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:180)
    at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:149)
    at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
--snap---

The number of crawled sites is always different: sometimes 10,000, sometimes 40,000. Any ideas?

Thanks,
Eric
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers