Jens Rabe created MAPREDUCE-6155:
------------------------------------

             Summary: MapFiles are not always correctly detected by 
SequenceFileInputFormat
                 Key: MAPREDUCE-6155
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6155
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.5.1
            Reporter: Jens Rabe


MapFiles are not always correctly detected by SequenceFileInputFormat.

This is because the listStatus method only recognizes a MapFile when the path 
it checks is a directory, in which case it replaces that path with the path of 
the data file.

This fails in two cases: when the data file does not exist, i.e., the input 
path is a directory that does not belong to a MapFile; and when recursion is 
turned on and the input format comes across a file (not a directory) that is 
in fact part of a MapFile.

The listStatus method should be changed to detect these cases correctly:
* If the current candidate is a file and its name is "index" or "data", check 
that its counterpart file exists, that the key types of both files match, and 
that the value type of the index file is LongWritable.
* If the current candidate is a directory, treat it as a MapFile only if it 
contains both an index and a data file, both are SequenceFiles, their key 
types match, and the index value type is LongWritable.
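
The two checks proposed above could be sketched roughly as follows. This is 
not the actual listStatus code and not a patch: to keep it self-contained, 
paths are plain strings and SequenceFile headers come from a hypothetical 
lookup map (path -> Header), where a missing entry means "does not exist or is 
not a SequenceFile". The names MapFileDetector, Header, isMapFileDir and 
owningMapFile are made up for illustration. In a real implementation the 
header information would presumably be read with SequenceFile.Reader 
(getKeyClassName / getValueClassName).

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; models the proposed checks without Hadoop classes. */
final class MapFileDetector {
    // Same literal values as MapFile.INDEX_FILE_NAME and MapFile.DATA_FILE_NAME.
    static final String INDEX = "index";
    static final String DATA = "data";
    static final String LONG_WRITABLE = "org.apache.hadoop.io.LongWritable";

    /** Hypothetical stand-in for the key/value class names in a SequenceFile header. */
    static final class Header {
        final String keyClass;
        final String valueClass;

        Header(String keyClass, String valueClass) {
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    /**
     * Directory case: a MapFile only if both "index" and "data" are present as
     * SequenceFiles, their key types match, and the index value type is LongWritable.
     */
    static boolean isMapFileDir(String dir, Map<String, Header> headers) {
        Header index = headers.get(dir + "/" + INDEX);
        Header data = headers.get(dir + "/" + DATA);
        return index != null
                && data != null
                && index.keyClass.equals(data.keyClass)
                && LONG_WRITABLE.equals(index.valueClass);
    }

    /**
     * File case: if the file is named "index" or "data" and its parent directory
     * passes the directory check, return that directory; otherwise return empty.
     */
    static Optional<String> owningMapFile(String file, Map<String, Header> headers) {
        int slash = file.lastIndexOf('/');
        if (slash < 0) {
            return Optional.empty();
        }
        String name = file.substring(slash + 1);
        if (!name.equals(INDEX) && !name.equals(DATA)) {
            return Optional.empty();
        }
        String dir = file.substring(0, slash);
        return isMapFileDir(dir, headers) ? Optional.of(dir) : Optional.empty();
    }
}
```

With such a check, a directory holding only a plain SequenceFile named "data" 
would no longer be mistaken for a MapFile, and a file named "index" or "data" 
encountered during recursion could be mapped back to its MapFile directory 
instead of being read as an independent input.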



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
