[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry
[ https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054562#comment-14054562 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5956:

Here's what I think are the potential solutions and their problems:
# YARN informs the AM that it is the last retry as part of AM start-up or the register API.
# YARN informs the AM that this is the last retry as part of AM unregister.
# YARN has a way to run a separate cleanup container after it knows for sure that the application has exhausted all its attempts.

(1) is not really possible. At best, the RM can say that this 'mayBeTheLastAttempt'. So the AM cannot really assume that this is the last retry, and so cannot do things like cleaning the staging directory.

(2) is fine enough for the successful code path. In fact, we already have a way of telling the AM that unregister succeeded and that this indeed is the last retry. We don't need a new API. If the RM crashed or failed over before that, the app will have a new retry anyway. The downside of this approach is that there are many cases where the app's last retry may have crashed (say, OOM) and so doesn't clean up stale files. In fact, any solution that relies on such RM-AM communication will not really solve those corner cases.

(3) is an acknowledgement of the fact that a solution to the problem of cleaning up stale files is not possible without explicit help from the RM. The more I think about it, the more it appears to me that this is the right solution. Filing a ticket, but this will take a while, so we may have to just do (2) for the time being.

MapReduce AM should not use maxAttempts to determine if this is the last retry

Key: MAPREDUCE-5956
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5956
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: applicationmaster, mrv2
Reporter: Vinod Kumar Vavilapalli
Assignee: Wangda Tan
Priority: Blocker

Found this while reviewing YARN-2074. The problem is that after YARN-2074, we don't count AM preemption towards AM failures on the RM side, but the MapReduce AM itself checks the attempt id against the max-attempt count to determine if this is the last attempt.
{code}
public void computeIsLastAMRetry() {
  isLastAMRetry = appAttemptID.getAttemptId() >= maxAppAttempts;
}
{code}
This causes issues w.r.t. deletion of the staging directory etc.

-- This message was sent by Atlassian JIRA (v6.2#6252)
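To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use any Hadoop classes: `LastRetrySketch`, both method names, and the numbers in `main` are hypothetical, and the RM-side rule is only an assumption about how attempts are counted once preempted attempts stop counting as failures.

```java
// Hypothetical model of the problem described in the issue; not Hadoop code.
final class LastRetrySketch {

    /** The AM-side check from the issue: compares the attempt id with the configured maximum. */
    static boolean isLastRetryByAttemptId(int attemptId, int maxAppAttempts) {
        return attemptId >= maxAppAttempts;
    }

    /**
     * Assumed RM-side view after YARN-2074: only counted failures matter, so the
     * running attempt is the last one only if one more failure would hit the limit.
     */
    static boolean isLastRetryByCountedFailures(int countedFailures, int maxAppAttempts) {
        return countedFailures + 1 >= maxAppAttempts;
    }

    public static void main(String[] args) {
        int maxAppAttempts = 2;
        int attemptId = 2;        // the second attempt is running
        int countedFailures = 0;  // the first attempt was preempted, so it is not counted

        // The AM believes it is the last retry and may delete the staging directory...
        System.out.println("AM thinks last retry: "
            + isLastRetryByAttemptId(attemptId, maxAppAttempts));
        // ...while the RM would still start another attempt if this one fails.
        System.out.println("RM would retry again: "
            + !isLastRetryByCountedFailures(countedFailures, maxAppAttempts));
    }
}
```

In this scenario both lines report true, which is the conflict the issue describes: the AM cleans up as if no attempt will follow, yet a later attempt may still be launched.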
[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry
[ https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054565#comment-14054565 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5956:

bq. Filing a ticket, but this will take a while and so we may have to just do (2) for the time being..

Filed YARN-2261.
[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054848#comment-14054848 ]

Hudson commented on MAPREDUCE-5517:

SUCCESS: Integrated in Hadoop-Yarn-trunk #607 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/607/])
MAPREDUCE-5517. Fixed MapReduce ApplicationMaster to not validate reduce side resource configuration for deciding uber-mode on map-only jobs. Contributed by Siqi Li. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java

enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

Key: MAPREDUCE-5517
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Minor
Fix For: 2.5.0
Attachments: MAPREDUCE_5517_v3.patch.txt, MAPREDUCE_5517_v4.patch.txt, MAPREDUCE_5517_v5.patch, MAPREDUCE_5517_v6.patch

Since there is no reducer, the memory allocated to the reducer is irrelevant to enabling uber mode for a job.
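The idea behind this fix can be sketched in a few lines. This is not the code from JobImpl.java: `UberMemorySketch`, `fitsInAm`, and the parameter names are hypothetical, and the sketch only assumes that the memory check compares the largest task memory requirement against the AM container size.

```java
// Hypothetical sketch of the memory part of an uber-mode decision; not the JobImpl code.
final class UberMemorySketch {

    /**
     * Returns true if the tasks would fit into the AM container.
     * For a map-only job the reduce memory setting is ignored, since no reducer will run.
     */
    static boolean fitsInAm(int numReduceTasks, long mapMemoryMb,
                            long reduceMemoryMb, long amMemoryMb) {
        long requiredMb = (numReduceTasks == 0)
            ? mapMemoryMb
            : Math.max(mapMemoryMb, reduceMemoryMb);
        return requiredMb <= amMemoryMb;
    }

    public static void main(String[] args) {
        // Map-only job: a large reduce memory setting no longer blocks uber mode.
        System.out.println(fitsInAm(0, 1024, 4096, 1536));
        // Job with a reducer: the reduce memory setting still has to fit.
        System.out.println(fitsInAm(1, 1024, 4096, 1536));
    }
}
```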
[jira] [Commented] (MAPREDUCE-5868) TestPipeApplication causing nightly build to fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054849#comment-14054849 ]

Hudson commented on MAPREDUCE-5868:

SUCCESS: Integrated in Hadoop-Yarn-trunk #607 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/607/])
MAPREDUCE-5868. Fixed an issue with TestPipeApplication that was causing the nightly builds to fail. Contributed by Akira Ajisaka. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608579)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java

TestPipeApplication causing nightly build to fail

Key: MAPREDUCE-5868
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5868
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: trunk
Reporter: Jason Lowe
Assignee: Akira AJISAKA
Fix For: 2.5.0
Attachments: MAPREDUCE-5868.2.patch, MAPREDUCE-5868.3.patch, MAPREDUCE-5868.4.patch, TestPipeApplication.stack, jstack.log, mapreduce-5868-v1.txt

TestPipeApplication appears to be timing out, which causes the nightly build to fail.
[jira] [Commented] (MAPREDUCE-5866) TestFixedLengthInputFormat fails in windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054850#comment-14054850 ]

Hudson commented on MAPREDUCE-5866:

SUCCESS: Integrated in Hadoop-Yarn-trunk #607 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/607/])
MAPREDUCE-5866. TestFixedLengthInputFormat fails in windows. Contributed by Varun Vasudev. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608585)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFixedLengthInputFormat.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFixedLengthInputFormat.java

TestFixedLengthInputFormat fails in windows

Key: MAPREDUCE-5866
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5866
Project: Hadoop Map/Reduce
Issue Type: Test
Components: client, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Fix For: 3.0.0, 2.6.0
Attachments: apache-mapreduce-5866.1.patch, apache-yarn-1992.0.patch

The org.apache.hadoop.mapred.TestFixedLengthInputFormat and org.apache.hadoop.mapreduce.lib.input.TestFixedLengthInputFormat tests fail on Windows.
[jira] [Commented] (MAPREDUCE-5868) TestPipeApplication causing nightly build to fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054958#comment-14054958 ]

Hudson commented on MAPREDUCE-5868:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1798 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1798/])
MAPREDUCE-5868. Fixed an issue with TestPipeApplication that was causing the nightly builds to fail. Contributed by Akira Ajisaka. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608579)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java
[jira] [Commented] (MAPREDUCE-5866) TestFixedLengthInputFormat fails in windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054959#comment-14054959 ]

Hudson commented on MAPREDUCE-5866:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1798 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1798/])
MAPREDUCE-5866. TestFixedLengthInputFormat fails in windows. Contributed by Varun Vasudev. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608585)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFixedLengthInputFormat.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFixedLengthInputFormat.java
[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054957#comment-14054957 ]

Hudson commented on MAPREDUCE-5517:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1798 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1798/])
MAPREDUCE-5517. Fixed MapReduce ApplicationMaster to not validate reduce side resource configuration for deciding uber-mode on map-only jobs. Contributed by Siqi Li. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
[jira] [Commented] (MAPREDUCE-5866) TestFixedLengthInputFormat fails in windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055019#comment-14055019 ]

Hudson commented on MAPREDUCE-5866:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1825 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1825/])
MAPREDUCE-5866. TestFixedLengthInputFormat fails in windows. Contributed by Varun Vasudev. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608585)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFixedLengthInputFormat.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFixedLengthInputFormat.java
[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055017#comment-14055017 ]

Hudson commented on MAPREDUCE-5517:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1825 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1825/])
MAPREDUCE-5517. Fixed MapReduce ApplicationMaster to not validate reduce side resource configuration for deciding uber-mode on map-only jobs. Contributed by Siqi Li. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
[jira] [Commented] (MAPREDUCE-5868) TestPipeApplication causing nightly build to fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055018#comment-14055018 ]

Hudson commented on MAPREDUCE-5868:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1825 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1825/])
MAPREDUCE-5868. Fixed an issue with TestPipeApplication that was causing the nightly builds to fail. Contributed by Akira Ajisaka. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608579)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java
[jira] [Updated] (MAPREDUCE-5885) build/test/test.mapred.spill causes release audit warnings
[ https://issues.apache.org/jira/browse/MAPREDUCE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen He updated MAPREDUCE-5885:

Attachment: MAPREDUCE-5885.patch

retrigger QA

build/test/test.mapred.spill causes release audit warnings

Key: MAPREDUCE-5885
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5885
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: trunk
Reporter: Jason Lowe
Assignee: Chen He
Attachments: MAPREDUCE-5885.patch, MAPREDUCE-5885.patch, MAPREDUCE-5885.patch

Multiple unit tests are creating files under hadoop-mapreduce-client-jobclient/build/test/test.mapred.spill, which are causing release audit warnings during Jenkins patch precommit builds. In addition to being in a poor location for test output and not cleaning up after the test, there are multiple tests using this location, which will cause conflicts if tests are run in parallel.
[jira] [Assigned] (MAPREDUCE-5962) Support CRC32C in IFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Thomas reassigned MAPREDUCE-5962:

Assignee: James Thomas

Support CRC32C in IFile

Key: MAPREDUCE-5962
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5962
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: performance, task
Affects Versions: 2.5.0
Reporter: Todd Lipcon
Assignee: James Thomas

Currently, the IFile format used by the MR shuffle checksums all data using the zlib CRC32 polynomial. If we allow use of CRC32C instead, we can get a large reduction in CPU usage by leveraging the native hardware CRC32C implementation (approx half a second of CPU time savings per GB checksummed).
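As a small illustration that CRC32 and CRC32C are different checksums over the same bytes, the sketch below uses the JDK's own `java.util.zip.CRC32` and `java.util.zip.CRC32C` (the latter requires Java 9 or later). It says nothing about how IFile or Hadoop's checksum classes are implemented; `ChecksumSketch` and `checksum` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Illustration only: uses the JDK checksum classes, not Hadoop's IFile code.
final class ChecksumSketch {

    /** Feeds the whole buffer to the given checksum and returns its value. */
    static long checksum(Checksum algorithm, byte[] data) {
        algorithm.reset();
        algorithm.update(data, 0, data.length);
        return algorithm.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the conventional check input for CRC algorithms.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08x%n", checksum(new CRC32(), data));   // cbf43926
        System.out.printf("CRC32C = %08x%n", checksum(new CRC32C(), data));  // e3069283
    }
}
```

Because both classes implement `Checksum`, a caller can switch polynomials without changing the code that feeds the data, which is roughly the kind of flexibility the issue asks for.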
[jira] [Commented] (MAPREDUCE-5885) build/test/test.mapred.spill causes release audit warnings
[ https://issues.apache.org/jira/browse/MAPREDUCE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055249#comment-14055249 ]

Hadoop QA commented on MAPREDUCE-5885:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654608/MAPREDUCE-5885.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4721//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4721//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry
[ https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055396#comment-14055396 ]

Hitesh Shah commented on MAPREDUCE-5956:

[~vinodkv] By definition, if an AM calls unregister, it is telling the RM that this is its last attempt and the app should not be retried. Are you now saying that all attempts should call unregisterAttempt(), which will tell the app whether it is the final attempt and should call a final unregister()? If not, I think something else is needed, as an AM will only call unregister() on an error if it thinks it is the last attempt.
[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry
[ https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055422#comment-14055422 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5956:

(AMRMClient|AMRMClientAsync).unregisterApplicationMaster() is a blocking call. If any attempt calls this API, and it succeeds, this AM is the last retry - the AM can go ahead and do its cleanup. All other attempts (which either don't call this API or which failed before the API returned) do not need to do any cleanup - of course there are corner cases where this is not sufficient. For that and all the failing cases, the only comprehensive solution I can think of is YARN-2261.
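The ordering described in this comment (unregister first, clean up only if that call returns successfully) could look roughly like the sketch below. The `RmClient` interface is a hypothetical stand-in and is not the YARN `AMRMClient` API; `UnregisterThenCleanupSketch` and `finish` are also made-up names, and the only assumption taken from the comment is that unregistering is a blocking call that either succeeds or fails.

```java
// Hypothetical stand-ins; RmClient is NOT the YARN AMRMClient API.
final class UnregisterThenCleanupSketch {

    /** Stand-in for a blocking unregister call; throws if the RM did not acknowledge it. */
    interface RmClient {
        void unregister() throws Exception;
    }

    /** Runs cleanup only after unregister has returned; reports whether cleanup ran. */
    static boolean finish(RmClient rm, Runnable cleanupStagingDir) {
        try {
            rm.unregister();      // blocks until the RM acknowledges
        } catch (Exception e) {
            // Unregister did not succeed, so a later attempt may still run:
            // leave the staging files in place for it.
            return false;
        }
        cleanupStagingDir.run();  // unregister succeeded, so this was the last attempt
        return true;
    }

    public static void main(String[] args) {
        boolean cleaned = finish(() -> { }, () -> System.out.println("deleting staging dir"));
        System.out.println("cleaned up: " + cleaned);
    }
}
```

An attempt that crashes before reaching `finish` never cleans up at all, which is the gap the comment leaves to YARN-2261.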
[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gian Merlino updated MAPREDUCE-2094:

Attachment: MAPREDUCE-2094-FileInputFormat-docs.patch

We just hit this bug too. I'd really prefer a safer default (like return false or something like Niels's patch), but if that is not doable, better docs would have helped. The current docs imply that the default implementation handles splittable vs non-splittable files, even though it doesn't. I've attached some wording that I think is clearer.

org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.

Key: MAPREDUCE-2094
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
Attachments: MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-FileInputFormat-docs.patch

When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 36 times in its entirety, thus producing garbage results and taking up a lot more CPU time than needed.

It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply "return true;". This is a very unsafe default and is in contradiction with the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively does "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec."

For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does "return false;".

Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable):
# Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes.
# Force developers to think about it and make this method abstract.
# Use a safe default (i.e. return false).
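The first option, deciding splittability from the file's compression, can be sketched without any Hadoop dependency. The version below is only an illustration: it guesses the codec from the file extension, whereas a real input format would ask Hadoop's codec machinery. `SplitabilitySketch` and both extension sets are assumptions made for this example, with gzip treated as unsplittable (as in the report above) and bzip2 as splittable.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical, Hadoop-free sketch of a compression-aware isSplitable check.
final class SplitabilitySketch {

    // Assumed for illustration: extensions that indicate a stream-compressed file.
    private static final Set<String> COMPRESSED = Set.of(".gz", ".bz2", ".deflate");
    // Assumed for illustration: compressed formats that can still be split.
    private static final Set<String> SPLITTABLE_COMPRESSED = Set.of(".bz2");

    /** Uncompressed files are splittable; compressed ones only if the format allows it. */
    static boolean isSplitable(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = (dot < 0) ? "" : fileName.substring(dot).toLowerCase(Locale.ROOT);
        if (!COMPRESSED.contains(ext)) {
            return true;
        }
        return SPLITTABLE_COMPRESSED.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("part-00000.txt")); // uncompressed
        System.out.println(isSplitable("logs.gz"));        // gzip: read as a single split
        System.out.println(isSplitable("logs.bz2"));       // bzip2: can be split
    }
}
```

With a check like this, the large gzipped file from the report would be handed to a single mapper instead of being re-read once per split.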
[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gian Merlino updated MAPREDUCE-2094:

Attachment: (was: MAPREDUCE-2094-FileInputFormat-docs.patch)

org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.

Key: MAPREDUCE-2094
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
Attachments: MAPREDUCE-2094-2011-05-19.patch
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055613#comment-14055613 ]

Chris Douglas commented on MAPREDUCE-5890:

Yes, I'm OK with the current patch. This approach won't scale to another feature, but it can be preserved in a refactoring. My only remaining ask (fine to add during commit) is that {{CryptoUtils}} be annotated with {{@Private}} and {{@Unstable}}, so it's clearly marked as an implementation detail. If it could be package-private that would be even better, though I haven't checked to see if there's anything else in the {{o.a.h.mapreduce.task.crypto}} package.

Support for encrypting Intermediate data and spills in local filesystem

Key: MAPREDUCE-5890
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
Labels: encryption
Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz

For some sensitive data, encryption while in flight (network) is not sufficient; the data must also be encrypted while at rest. HADOOP-10150 and HDFS-6134 bring encryption at rest for data in the filesystem using the Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest.
[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gian Merlino updated MAPREDUCE-2094: Attachment: MAPREDUCE-2094-FileInputFormat-docs-v2.patch I just attached a different patch that also adjusts the class-level javadocs in addition to the methods. org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. --- Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Attachments: MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we found that a large Gzipped input file would be processed several times. A file of nearly 1GiB would be processed around 36 times in its entirety, producing garbage results and taking up a lot more CPU time than needed. It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply {{return true;}}. This is a very unsafe default and contradicts the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively does: "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec."
For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does {{return false;}}. Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable): # Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes. # Force developers to think about it and make this method abstract. # Use a safe default (i.e. return false) -- This message was sent by Atlassian JIRA (v6.2#6252)
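The first option in the list above (decide from the compression in use instead of always answering true) can be sketched without any Hadoop dependency. This is only an illustration under stated assumptions: the class {{SplitabilitySketch}}, its suffix table and the string-based signature are hypothetical stand-ins, not the FileInputFormat API, and a real fix would presumably consult the configured compression codecs rather than file name suffixes. The only facts relied on are that gzip streams cannot be split at arbitrary offsets while bzip2 streams can.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class SplitabilitySketch {
    // Hypothetical lookup table: file suffix -> can this compression format be split?
    private static final Map<String, Boolean> SPLITTABLE_BY_SUFFIX = new LinkedHashMap<>();
    static {
        SPLITTABLE_BY_SUFFIX.put(".gz", false);      // gzip: must be read from the start
        SPLITTABLE_BY_SUFFIX.put(".deflate", false); // raw deflate: same limitation
        SPLITTABLE_BY_SUFFIX.put(".bz2", true);      // bzip2: block-oriented, splittable
    }

    /**
     * Returns false for files whose suffix names an unsplittable compression
     * format, and true for splittable formats and for files with no known
     * compression suffix (treated here as uncompressed input).
     */
    public static boolean isSplitable(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Boolean> entry : SPLITTABLE_BY_SUFFIX.entrySet()) {
            if (lower.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"input/big.gz", "input/big.bz2", "input/plain.txt"}) {
            System.out.println(name + " splitable: " + isSplitable(name));
        }
    }
}
```

With a check of this kind as the default, a Gzipped file would be handed to a single reader as a whole instead of being split and re-read from the beginning by every split, which is the behaviour the report describes.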
[jira] [Updated] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated MAPREDUCE-5890: --- Attachment: MAPREDUCE-5890.13.patch Uploaded an updated patch. Thanks [~chris.douglas] for all the feedback! I've marked the class {{Private}} and {{Unstable}}, but can't make the class itself package-private since it exposes public static methods used in a number of places. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055718#comment-14055718 ] Chris Douglas commented on MAPREDUCE-5890: -- Sorry, I meant that if {{o.a.h.mapreduce.task.crypto}} only has {{CryptoUtils}} in it, then maybe the new package isn't necessary. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5963) ShuffleHandler DB schema should be versioned with compatible/incompatible changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5963: -- Issue Type: Sub-task (was: New Feature) Parent: MAPREDUCE-4150 ShuffleHandler DB schema should be versioned with compatible/incompatible changes - Key: MAPREDUCE-5963 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5963 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du ShuffleHandler persists job shuffle info into a DB, whose schema should be versioned with compatible/incompatible changes to support rolling upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (MAPREDUCE-5963) ShuffleHandler DB schema should be versioned with compatible/incompatible changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du moved YARN-2265 to MAPREDUCE-5963: - Component/s: (was: nodemanager) Target Version/s: 2.6.0 (was: 2.6.0) Affects Version/s: (was: 2.4.1) 2.4.1 Key: MAPREDUCE-5963 (was: YARN-2265) Project: Hadoop Map/Reduce (was: Hadoop YARN) ShuffleHandler DB schema should be versioned with compatible/incompatible changes - Key: MAPREDUCE-5963 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5963 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Junping Du Assignee: Junping Du ShuffleHandler persists job shuffle info into a DB, whose schema should be versioned with compatible/incompatible changes to support rolling upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
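One way to read "versioned with compatible/incompatible changes" is a major.minor scheme, where a minor bump marks a compatible schema change and a major bump marks an incompatible one. The sketch below illustrates that reading only. It is an assumption, not what the issue or any patch specifies, and every name in it ({{SchemaVersionSketch}}, {{Version}}, {{Action}}, {{check}}) is hypothetical.

```java
public class SchemaVersionSketch {
    /** Hypothetical schema version: major = incompatible change, minor = compatible change. */
    public static final class Version {
        final int major;
        final int minor;

        public Version(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    /** What a service could do on start-up after reading the version stored in its DB. */
    public enum Action { USE_AS_IS, STORE_NEW_VERSION, REFUSE_TO_START }

    /** Compares the version found in the DB with the version the running code expects. */
    public static Action check(Version stored, Version current) {
        if (stored.major != current.major) {
            // Incompatible change: the old state cannot be read safely.
            return Action.REFUSE_TO_START;
        }
        if (stored.minor != current.minor) {
            // Compatible change: keep the state and record the new version.
            return Action.STORE_NEW_VERSION;
        }
        return Action.USE_AS_IS;
    }

    public static void main(String[] args) {
        Version current = new Version(1, 1);
        System.out.println("stored 1.1 -> " + check(new Version(1, 1), current));
        System.out.println("stored 1.0 -> " + check(new Version(1, 0), current));
        System.out.println("stored 2.0 -> " + check(new Version(2, 0), current));
    }
}
```

Under this assumption, a rolling upgrade that only bumps the minor version can keep the persisted shuffle state, while a major bump is detected up front instead of failing later on unreadable data.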