Rushabh S Shah created MAPREDUCE-6996: -----------------------------------------
Summary: FileInputFormat#getBlockIndex should include file name in the exception.
Key: MAPREDUCE-6996
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6996
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Rushabh S Shah
Priority: Minor

{code:title=FileInputFormat.java|borderStyle=solid}
protected int getBlockIndex(BlockLocation[] blkLocations, long offset) {
  ...
  ...
  BlockLocation last = blkLocations[blkLocations.length - 1];
  long fileLength = last.getOffset() + last.getLength() - 1;
  throw new IllegalArgumentException("Offset " + offset +
                                     " is outside of file (0.." +
                                     fileLength + ")");
}
{code}

When the file is open for writing, {{last.getLength()}} and {{last.getOffset()}} are both zero, so {{fileLength}} evaluates to -1 and we see the following exception stack trace:

{noformat}
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:288)
Caused by: java.lang.IllegalArgumentException: Offset 0 is outside of file (0..-1)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:453)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:413)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
	... 18 more
{noformat}

It's difficult to tell which file was open, so I'm creating this ticket to include the file name in the exception.
Since {{FileInputFormat#getBlockIndex}} is protected, we can't change its signature to add the file name as an argument.
The only fix I can think of is:

{code:title=FileInputFormat.java|borderStyle=solid}
public InputSplit[] getSplits(JobConf job, int numSplits)
    throws IOException {
  ...
  ...
  for (FileStatus file: files) {
    Path path = file.getPath();
    long length = file.getLen();
    if (length != 0) {
      FileSystem fs = path.getFileSystem(job);
      BlockLocation[] blkLocations;
      if (file instanceof LocatedFileStatus) {
        blkLocations = ((LocatedFileStatus) file).getBlockLocations();
      } else {
        blkLocations = fs.getFileBlockLocations(file, 0, length);
      }
      if (isSplitable(fs, path)) {
        long blockSize = file.getBlockSize();
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);

        long bytesRemaining = length;
        while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              length-bytesRemaining, splitSize, clusterMap);
          splits.add(makeSplit(path, length-bytesRemaining, splitSize,
              splitHosts[0], splitHosts[1]));
          bytesRemaining -= splitSize;
        }

        if (bytesRemaining != 0) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              length - bytesRemaining, bytesRemaining, clusterMap);
          splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
              splitHosts[0], splitHosts[1]));
        }
      } else {
        String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations, 0, length, clusterMap);
        splits.add(makeSplit(path, 0, length, splitHosts[0], splitHosts[1]));
      }
    } else {
      //Create empty hosts array for zero length files
      splits.add(makeSplit(path, 0, length, new String[0]));
    }
  }
{code}

Wrap the above code chunk in a try-catch block that catches {{IllegalArgumentException}} and checks whether its message is {{Offset 0 is outside of file (0..-1)}}. If it is, add the file name to the message and rethrow the {{IllegalArgumentException}}.
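A minimal, self-contained sketch of the proposed catch-and-rethrow, written outside Hadoop so it runs on its own. {{Block}}, {{firstBlockFor}} and the example paths are hypothetical stand-ins, not Hadoop APIs. The search loop in {{getBlockIndex}} is an assumption about the elided part of that method; only the final {{throw}} follows the code quoted in this ticket. A file still open for writing is modeled as a single block with offset 0 and length 0.

```java
public class SplitNameSketch {

    // Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation:
    // only a block's offset and length are modeled.
    static final class Block {
        final long offset;
        final long length;

        Block(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Simplified re-creation of getBlockIndex. The loop is an assumption about
    // the elided code; the fallback throw mirrors the snippet in the ticket.
    static int getBlockIndex(Block[] blkLocations, long offset) {
        for (int i = 0; i < blkLocations.length; i++) {
            Block b = blkLocations[i];
            if (b.offset <= offset && offset < b.offset + b.length) {
                return i;
            }
        }
        Block last = blkLocations[blkLocations.length - 1];
        long fileLength = last.offset + last.length - 1; // -1 for a 0-length block at 0
        throw new IllegalArgumentException("Offset " + offset
            + " is outside of file (0.." + fileLength + ")");
    }

    // Hypothetical per-file step inside getSplits: the path is known here, so the
    // exception for an open file is rethrown with the file name appended.
    static int firstBlockFor(String path, Block[] blkLocations) {
        try {
            return getBlockIndex(blkLocations, 0);
        } catch (IllegalArgumentException e) {
            if ("Offset 0 is outside of file (0..-1)".equals(e.getMessage())) {
                // Keep the original exception as the cause for the stack trace.
                throw new IllegalArgumentException(
                    e.getMessage() + " for file " + path, e);
            }
            throw e; // any other message is passed through unchanged
        }
    }

    public static void main(String[] args) {
        // Hypothetical file that is still open for writing.
        Block[] openFile = { new Block(0, 0) };
        try {
            firstBlockFor("/data/part-00000", openFile);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running {{main}} prints {{Offset 0 is outside of file (0..-1) for file /data/part-00000}}. Matching the exact message, as proposed above, leaves every other {{IllegalArgumentException}} untouched; the trade-off is that the check silently stops matching if the message wording in {{getBlockIndex}} ever changes.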