SegmentMerger bug

                 Key: NUTCH-814
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.1
            Reporter: Dennis Kubes
            Assignee: Andrzej Bialecki 
             Fix For: 1.1

Dennis reported:

In the file about line 150 we have this:

       final SequenceFile.Reader reader =
         new SequenceFile.Reader(FileSystem.get(job), fSplit.getPath(),

Then about line 166 in the record reader we have this:

boolean res =, w);

If I am reading that right, that would mean that the map task would loop
over all records for a given file and not just a given split.

Right, this should instead use SequenceFileRecordReader, which already has the
logic to handle splits. Patch coming shortly - thanks for spotting this! This
could be the reason for "out of disk space" errors that many users reported.
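
To illustrate why ignoring the split bounds matters, here is a minimal sketch in plain Java. It is a simplified model, not the Hadoop API and not the actual patch: the class name, method names, record offsets and split sizes are all hypothetical, and it does not model sync markers. It only shows that a reader which ignores its split emits the whole file once per split, while a split-aware reader emits each record once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of split handling; not Hadoop code.
public class SplitReadSketch {

    // Buggy model: the split bounds are ignored, so every task sees the whole file.
    static List<Long> readIgnoringSplit(long[] recordOffsets, long start, long length) {
        List<Long> out = new ArrayList<>();
        for (long off : recordOffsets) {
            out.add(off);
        }
        return out;
    }

    // Split-aware model: only records starting in [start, start + length) are returned.
    static List<Long> readWithinSplit(long[] recordOffsets, long start, long length) {
        long end = start + length;
        List<Long> out = new ArrayList<>();
        for (long off : recordOffsets) {
            if (off >= start && off < end) {
                out.add(off);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed byte offsets at which four records start in one file.
        long[] offsets = {0, 40, 80, 120};
        // Two assumed splits of 64 bytes each: {start, length}.
        long[][] splits = {{0, 64}, {64, 64}};

        int ignoring = 0;
        int honouring = 0;
        for (long[] s : splits) {
            ignoring += readIgnoringSplit(offsets, s[0], s[1]).size();
            honouring += readWithinSplit(offsets, s[0], s[1]).size();
        }
        System.out.println("ignoring splits: " + ignoring + " records emitted");
        System.out.println("honouring splits: " + honouring + " records emitted");
    }
}
```

In this toy model the output volume grows with the number of splits per file when the bounds are ignored, which would be consistent with the disk space symptoms mentioned above.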
