SegmentMerger bug
-----------------

                 Key: NUTCH-814
                 URL: https://issues.apache.org/jira/browse/NUTCH-814
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.1
            Reporter: Dennis Kubes
            Assignee: Andrzej Bialecki
             Fix For: 1.1

Dennis reported:

{quote}
In the SegmentMerger.java file, at about line 150, we have this:

    final SequenceFile.Reader reader =
        new SequenceFile.Reader(FileSystem.get(job), fSplit.getPath(), job);

Then, at about line 166 in the record reader, we have this:

    boolean res = reader.next(key, w);

If I am reading that right, that would mean that the map task would loop over all records for a given file and not just a given split.
{quote}

Right, this should instead use SequenceFileRecordReader, which already has the logic to handle splits. Patch coming shortly - thanks for spotting this!

This could be the reason for the "out of disk space" errors that many users reported.
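
To illustrate why ignoring split boundaries matters, here is a minimal sketch in plain Java. It is a simplified model, not the Nutch or Hadoop code: the class name SplitReadSketch, both method names, and the record offsets are hypothetical, and a record is assumed to belong to a split when its start offset falls inside the split's byte range. The real SequenceFileRecordReader works with sync markers rather than exact record offsets, so treat this only as a picture of the effect.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitReadSketch {

    /**
     * Models the reported behaviour: the reader is opened on the whole file
     * and next() is called until end of file, so the split range is ignored.
     */
    public static List<Long> readIgnoringSplit(long[] recordOffsets, long start, long length) {
        List<Long> read = new ArrayList<>();
        for (long offset : recordOffsets) {
            read.add(offset); // every record in the file is returned
        }
        return read;
    }

    /**
     * Models a split-aware reader (simplifying assumption): only records whose
     * start offset lies in [start, start + length) are returned.
     */
    public static List<Long> readWithinSplit(long[] recordOffsets, long start, long length) {
        long end = start + length;
        List<Long> read = new ArrayList<>();
        for (long offset : recordOffsets) {
            if (offset >= start && offset < end) {
                read.add(offset);
            }
        }
        return read;
    }

    public static void main(String[] args) {
        // Hypothetical file: four records starting at these byte offsets.
        long[] recordOffsets = {0L, 100L, 200L, 300L};
        // The same file divided into two splits, each given as {start, length}.
        long[][] splits = {{0L, 200L}, {200L, 200L}};

        int ignoring = 0;
        int honoring = 0;
        for (long[] split : splits) {
            ignoring += readIgnoringSplit(recordOffsets, split[0], split[1]).size();
            honoring += readWithinSplit(recordOffsets, split[0], split[1]).size();
        }
        System.out.println("records read ignoring splits: " + ignoring); // 8
        System.out.println("records read honoring splits: " + honoring); // 4
    }
}
```

In this model, a file divided into N splits is read N times in full when the split range is ignored, so the map output grows by a factor of N. That would be consistent with the disk space problems mentioned above.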