shanyu zhao created YARN-1219: --------------------------------- Summary: FSDownload changes file suffix making FileUtil.unTar() throw exception Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: shanyu zhao
While running a Hive join operation on Yarn, I saw exception as described below. This is caused by FSDownload copy the files into a temp file and change the suffix into ".tmp" before unpacking it. In unpack(), it uses FileUtil.unTar() which will determine if the file is "gzipped" by looking at the file suffix: {code} boolean gzipped = inFile.toString().endsWith("gz"); {code} To fix this problem, we can remove the ".tmp" in the temp file name. Here is the detailed exception: org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240) at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676) at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625) at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira