Rushabh S Shah created MAPREDUCE-6996:
-----------------------------------------

             Summary: FileInputFormat#getBlockIndex should include file name in the exception.
                 Key: MAPREDUCE-6996
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6996
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Rushabh S Shah
            Priority: Minor


{code:title=FileInputFormat.java|borderStyle=solid}
// Some comments here
  protected int getBlockIndex(BlockLocation[] blkLocations,
                              long offset) {
    ...
    ...
    BlockLocation last = blkLocations[blkLocations.length - 1];
    long fileLength = last.getOffset() + last.getLength() - 1;
    throw new IllegalArgumentException("Offset " + offset +
                                       " is outside of file (0.." +
                                       fileLength + ")");
  }
{code}
When the file is still open for writing, {{last.getLength()}} and 
{{last.getOffset()}} are both zero, so {{fileLength}} evaluates to -1 and we 
see the following exception stack trace.
{noformat}
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:288)
Caused by: java.lang.IllegalArgumentException: Offset 0 is outside of file (0..-1)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:453)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:413)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
... 18 more
{noformat}
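To make the odd range {{(0..-1)}} concrete, here is a minimal standalone sketch of the arithmetic. It is not the Hadoop source: {{Block}} is a hypothetical stand-in for {{BlockLocation}} so the example compiles without Hadoop on the classpath, the search loop stands in for the elided part of the method and is an assumption, and {{messageFor}} is a helper invented for the demo. Only the tail of {{getBlockIndex}} mirrors the snippet quoted in this ticket.

```java
// Standalone sketch, assuming a simplified Block type in place of
// org.apache.hadoop.fs.BlockLocation (hypothetical, for illustration only).
final class BlockIndexDemo {

  static final class Block {
    final long offset;
    final long length;

    Block(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  // The loop is an assumed stand-in for the elided code; the last three
  // statements follow the snippet quoted in the ticket.
  static int getBlockIndex(Block[] blocks, long offset) {
    for (int i = 0; i < blocks.length; i++) {
      long start = blocks[i].offset;
      if (start <= offset && offset < start + blocks[i].length) {
        return i;
      }
    }
    Block last = blocks[blocks.length - 1];
    long fileLength = last.offset + last.length - 1;
    throw new IllegalArgumentException(
        "Offset " + offset + " is outside of file (0.." + fileLength + ")");
  }

  // Demo helper: returns the exception message, or null if a block was found.
  static String messageFor(Block[] blocks, long offset) {
    try {
      getBlockIndex(blocks, offset);
      return null;
    } catch (IllegalArgumentException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // A file open for writing: the last block reports offset 0 and length 0.
    Block[] openFile = { new Block(0, 0) };
    // prints "Offset 0 is outside of file (0..-1)"
    System.out.println(messageFor(openFile, 0));
  }
}
```

With offset 0 and length 0, no offset can fall inside the block, and {{fileLength}} becomes 0 + 0 - 1 = -1, which matches the message in the stack trace.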
It's difficult to tell from this message which file was open, so I am 
creating this ticket to include the file name in the exception.
Since {{FileInputFormat#getBlockIndex}} is protected, we can't change its 
signature to add the file name as an argument without breaking subclasses 
that override or call it.
The only way I can think of to fix this is: 
{code:title=FileInputFormat.java|borderStyle=solid}
  public InputSplit[] getSplits(JobConf job, int numSplits)
    throws IOException {
    ...
    ...
    for (FileStatus file: files) {
      Path path = file.getPath();
      long length = file.getLen();
      if (length != 0) {
        FileSystem fs = path.getFileSystem(job);
        BlockLocation[] blkLocations;
        if (file instanceof LocatedFileStatus) {
          blkLocations = ((LocatedFileStatus) file).getBlockLocations();
        } else {
          blkLocations = fs.getFileBlockLocations(file, 0, length);
        }
        if (isSplitable(fs, path)) {
          long blockSize = file.getBlockSize();
          long splitSize = computeSplitSize(goalSize, minSize, blockSize);

          long bytesRemaining = length;
          while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
            String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
                length-bytesRemaining, splitSize, clusterMap);
            splits.add(makeSplit(path, length-bytesRemaining, splitSize,
                splitHosts[0], splitHosts[1]));
            bytesRemaining -= splitSize;
          }

          if (bytesRemaining != 0) {
            String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
                length - bytesRemaining, bytesRemaining, clusterMap);
            splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
                splitHosts[0], splitHosts[1]));
          }
        } else {
          String[][] splitHosts =
              getSplitHostsAndCachedHosts(blkLocations, 0, length, clusterMap);
          splits.add(makeSplit(path, 0, length, splitHosts[0], splitHosts[1]));
        }
      } else { 
        //Create empty hosts array for zero length files
        splits.add(makeSplit(path, 0, length, new String[0]));
      }
    }
{code}
The idea is to wrap the above code chunk in a try-catch block, catch 
{{IllegalArgumentException}}, and check whether its message is 
{{Offset 0 is outside of file (0..-1)}}.
If it is, add the file name to the message and rethrow the 
{{IllegalArgumentException}}.
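
A rough sketch of the catch-and-rethrow idea, written as a standalone helper so it can be tried without Hadoop. The class {{SplitErrorContext}}, the method {{withFileName}}, the constant name, and the {{" for file "}} suffix are all hypothetical names chosen for illustration; in an actual patch the try-catch would presumably sit directly inside the {{for (FileStatus file: files)}} loop and use {{file.getPath()}}.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Hadoop: runs the per-file work and, if it
// fails with the message described in this ticket, rethrows with the file name.
final class SplitErrorContext {

  // The exact message the ticket proposes to check for.
  static final String OPEN_FILE_MESSAGE = "Offset 0 is outside of file (0..-1)";

  static <T> T withFileName(String path, Supplier<T> perFileWork) {
    try {
      return perFileWork.get();
    } catch (IllegalArgumentException e) {
      if (OPEN_FILE_MESSAGE.equals(e.getMessage())) {
        // Keep the original exception as the cause so the stack trace survives.
        throw new IllegalArgumentException(
            e.getMessage() + " for file " + path, e);
      }
      // Any other IllegalArgumentException is passed through unchanged.
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      withFileName("/tmp/open-file", () -> {
        throw new IllegalArgumentException(OPEN_FILE_MESSAGE);
      });
    } catch (IllegalArgumentException e) {
      // prints "Offset 0 is outside of file (0..-1) for file /tmp/open-file"
      System.out.println(e.getMessage());
    }
  }
}
```

Matching on the exact message string is brittle: if the wording in {{getBlockIndex}} ever changes, the check silently stops matching. That trade-off is the price of leaving the protected method's signature untouched.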


