[jira] [Updated] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-112: --- Attachment: yarn-112-20130409.patch Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112-20130408.1.patch, yarn-112-20130408.patch, yarn-112-20130409.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
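For readers unfamiliar with the failure mode above: the race is between two localizers renaming their private download directories onto the same shared destination. The sketch below is not the FSDownload fix in the attached patches; it is a minimal, hypothetical illustration of the defensive pattern the report points at — download into a private temporary directory, then treat "destination already exists" on publish as the other container having won the race. All class and method names here are made up.
{code:java}
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class LocalizePublishSketch {

    /**
     * Publish a privately downloaded resource directory by renaming it onto
     * the shared final location. If another localizer already published the
     * same resource, reuse its copy instead of failing the container.
     */
    static Path publish(Path privateTmpDir, Path finalDir) throws IOException {
        try {
            // Try an atomic rename; on most filesystems this fails cleanly
            // if finalDir was already populated by a concurrent localizer.
            return Files.move(privateTmpDir, finalDir, StandardCopyOption.ATOMIC_MOVE);
        } catch (FileAlreadyExistsException | DirectoryNotEmptyException e) {
            // The other container won the race; discard our copy and use theirs.
            deleteRecursively(privateTmpDir);
            return finalDir;
        }
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Delete children before parents.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
{code}
Each localizer would call publish(tmp, dst); only one of two concurrent callers actually performs the move, and the loser reuses the winner's copy instead of failing the container.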
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626412#comment-13626412 ] Hadoop QA commented on YARN-112: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577749/yarn-112-20130409.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/693//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/693//console This message is automatically generated. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112-20130408.1.patch, yarn-112-20130408.patch, yarn-112-20130409.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626466#comment-13626466 ] Hudson commented on YARN-524: - Integrated in Hadoop-Yarn-trunk #178 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/178/]) YARN-524. Moving CHANGES.txt entry to the correct section. (Revision 1465876) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465876 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt TestYarnVersionInfo failing if generated properties doesn't include an SVN URL -- Key: YARN-524 URL: https://issues.apache.org/jira/browse/YARN-524 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 3.0.0 Environment: OS/X with branch off github Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 3.0.0 Attachments: YARN-524.patch {{TestYarnVersionInfo}} fails if the {{YarnVersionInfo.getUrl()}} call returns {{Unknown}} when that is the value inserted into the property file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626468#comment-13626468 ] Hudson commented on YARN-479: - Integrated in Hadoop-Yarn-trunk #178 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/178/]) YARN-479. NM retry behavior for connection to RM should be similar for lost heartbeats (Jian He via bikas) (Revision 1465731) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1465731 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java NM retry behavior for connection to RM should be similar for lost heartbeats Key: YARN-479 URL: https://issues.apache.org/jira/browse/YARN-479 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.0.5-beta Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, YARN-479.8.patch, YARN-479.9.patch Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626469#comment-13626469 ] Hudson commented on YARN-99: Integrated in Hadoop-Yarn-trunk #178 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/178/]) YARN-99. Modify private distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1465853) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1465853 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/ResourceLocalizationSpec.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/impl/pb/ResourceLocalizationSpecPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/LocalizerHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/LocalizerHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerBuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/TestPBRecordImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/MockLocalizerHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java Jobs fail during resource localization when private distributed-cache hits unix directory limits Key: YARN-99 URL: https://issues.apache.org/jira/browse/YARN-99 Project: 
Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, yarn-99-20130403.patch, yarn-99-20130408.1.patch, yarn-99-20130408.patch If we have multiple jobs which use the distributed cache with many small files, the per-directory limit is reached before the cache size limit, and no further directories can be created in the file cache. The jobs start failing with the below exception.
{code:xml}
java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
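The committed fix above adds a LocalCacheDirectoryManager that spreads localized resources over a tree of sub-directories so that no single directory hits the Unix per-directory entry limit. The following is a minimal, hypothetical sketch of that idea — not the committed class; the cap, fan-out, and names are assumptions — where each resource gets a relative sub-directory and a new one is opened once the current directory reaches its budget.
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical allocator that hands out relative sub-directory paths so
 * that no single directory ever holds more than CAP entries.
 */
public class SubDirAllocator {
    private static final int CAP = 8192;   // assumed per-directory budget
    private static final int FANOUT = 36;  // children per directory level: 0-9, a-z

    private int counter = 0;

    /** Returns the relative directory ("" for the root) for the next resource. */
    public synchronized String nextSubDir() {
        int slot = counter++ / CAP;          // which directory, in breadth-first order
        return pathForSlot(slot);
    }

    /** Maps a breadth-first slot number to a path like "", "0", "z", "0/0", ... */
    private static String pathForSlot(int slot) {
        if (slot == 0) {
            return "";
        }
        Deque<Character> parts = new ArrayDeque<>();
        int n = slot;
        while (n > 0) {
            n--;                                            // bijective base-FANOUT numbering
            parts.push(Character.forDigit(n % FANOUT, FANOUT));
            n /= FANOUT;
        }
        StringBuilder sb = new StringBuilder();
        for (char c : parts) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
{code}
Because the numbering is bijective, every directory (including intermediate levels) stays under the budget, and the depth only grows logarithmically with the number of cached resources.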
[jira] [Commented] (YARN-557) TestUnmanagedAMLauncher fails on Windows
[ https://issues.apache.org/jira/browse/YARN-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626555#comment-13626555 ] Hudson commented on YARN-557: - Integrated in Hadoop-Hdfs-trunk #1367 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1367/]) YARN-557. Fix TestUnmanagedAMLauncher failure on Windows. Contributed by Chris Nauroth. (Revision 1465869) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1465869 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java TestUnmanagedAMLauncher fails on Windows Key: YARN-557 URL: https://issues.apache.org/jira/browse/YARN-557 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: YARN-557.1.patch {{TestUnmanagedAMLauncher}} fails on Windows due to attempting to run a Unix-specific command in distributed shell and use of a Unix-specific environment variable to determine username for the {{ContainerLaunchContext}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626559#comment-13626559 ] Hudson commented on YARN-479: - Integrated in Hadoop-Hdfs-trunk #1367 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1367/]) YARN-479. NM retry behavior for connection to RM should be similar for lost heartbeats (Jian He via bikas) (Revision 1465731) Result = FAILURE bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1465731 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java NM retry behavior for connection to RM should be similar for lost heartbeats Key: YARN-479 URL: https://issues.apache.org/jira/browse/YARN-479 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.0.5-beta Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, YARN-479.8.patch, YARN-479.9.patch Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626560#comment-13626560 ] Hudson commented on YARN-99: Integrated in Hadoop-Hdfs-trunk #1367 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1367/]) YARN-99. Modify private distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1465853) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1465853 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/ResourceLocalizationSpec.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/impl/pb/ResourceLocalizationSpecPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/LocalizerHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/LocalizerHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerBuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/TestPBRecordImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/MockLocalizerHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java Jobs fail during resource localization when private distributed-cache hits unix directory limits Key: YARN-99 URL: https://issues.apache.org/jira/browse/YARN-99 Project: 
Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, yarn-99-20130403.patch, yarn-99-20130408.1.patch, yarn-99-20130408.patch If we have multiple jobs which use the distributed cache with many small files, the per-directory limit is reached before the cache size limit, and no further directories can be created in the file cache. The jobs start failing with the below exception.
{code:xml}
java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
[jira] [Commented] (YARN-557) TestUnmanagedAMLauncher fails on Windows
[ https://issues.apache.org/jira/browse/YARN-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626579#comment-13626579 ] Hudson commented on YARN-557: - Integrated in Hadoop-Mapreduce-trunk #1394 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/]) YARN-557. Fix TestUnmanagedAMLauncher failure on Windows. Contributed by Chris Nauroth. (Revision 1465869) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1465869 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java TestUnmanagedAMLauncher fails on Windows Key: YARN-557 URL: https://issues.apache.org/jira/browse/YARN-557 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: YARN-557.1.patch {{TestUnmanagedAMLauncher}} fails on Windows due to attempting to run a Unix-specific command in distributed shell and use of a Unix-specific environment variable to determine username for the {{ContainerLaunchContext}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626581#comment-13626581 ] Hudson commented on YARN-524: - Integrated in Hadoop-Mapreduce-trunk #1394 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/]) YARN-524. Moving CHANGES.txt entry to the correct section. (Revision 1465876) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1465876 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt TestYarnVersionInfo failing if generated properties doesn't include an SVN URL -- Key: YARN-524 URL: https://issues.apache.org/jira/browse/YARN-524 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 3.0.0 Environment: OS/X with branch off github Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 3.0.0 Attachments: YARN-524.patch {{TestYarnVersionInfo}} fails if the {{YarnVersionInfo.getUrl()}} call returns {{Unknown}} when that is the value inserted into the property file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626583#comment-13626583 ] Hudson commented on YARN-479: - Integrated in Hadoop-Mapreduce-trunk #1394 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/]) YARN-479. NM retry behavior for connection to RM should be similar for lost heartbeats (Jian He via bikas) (Revision 1465731) Result = FAILURE bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1465731 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java NM retry behavior for connection to RM should be similar for lost heartbeats Key: YARN-479 URL: https://issues.apache.org/jira/browse/YARN-479 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Fix For: 2.0.5-beta Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, YARN-479.4.patch, YARN-479.5.patch, YARN-479.6.patch, YARN-479.7.patch, YARN-479.8.patch, YARN-479.9.patch Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-99) Jobs fail during resource localization when private distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626584#comment-13626584 ] Hudson commented on YARN-99: Integrated in Hadoop-Mapreduce-trunk #1394 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1394/]) YARN-99. Modify private distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1465853) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1465853 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/ResourceLocalizationSpec.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/impl/pb/ResourceLocalizationSpecPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/LocalizerHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/LocalizerHeartbeatResponsePBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/NodeManagerBuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/api/protocolrecords/impl/pb/TestPBRecordImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/MockLocalizerHeartbeatResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java Jobs fail during resource localization when private distributed-cache hits unix directory limits Key: YARN-99 URL: 
https://issues.apache.org/jira/browse/YARN-99 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Devaraj K Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-99-20130324.patch, yarn-99-20130403.1.patch, yarn-99-20130403.patch, yarn-99-20130408.1.patch, yarn-99-20130408.patch If we have multiple jobs which use the distributed cache with many small files, the per-directory limit is reached before the cache size limit, and no further directories can be created in the file cache. The jobs start failing with the below exception.
{code:xml}
java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at
[jira] [Resolved] (YARN-122) CompositeService should clone the Configurations it passes to children
[ https://issues.apache.org/jira/browse/YARN-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-122. - Resolution: Invalid Resolving as INVALID as some composite services (e.g. {{MiniYARNCluster}}) use the shared config object as a way of sharing information, such as allocated ports. CompositeService should clone the Configurations it passes to children -- Key: YARN-122 URL: https://issues.apache.org/jira/browse/YARN-122 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Priority: Minor Original Estimate: 0.5h Remaining Estimate: 0.5h {{CompositeService.init(Configuration)}} saves the configuration passed in *and* passes the same instance down to all managed services. This means a change in the configuration of one child could propagate to all the others. Unless this is desired, the configuration should be cloned for each child. Fast and easy fix; tests can be added to those coming in MAPREDUCE-4014 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
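The trade-off behind the INVALID resolution can be shown with a few lines of Hadoop {{Configuration}} code: cloning isolates each child, while the shared instance lets one child publish information (such as an allocated port) back to its siblings, which is what {{MiniYARNCluster}} relies on. A minimal sketch follows; the property key is made up.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class ConfSharingDemo {
    public static void main(String[] args) {
        Configuration parent = new Configuration(false); // skip default resources

        // Shared instance: a child's change is visible to every other child.
        Configuration sharedChild = parent;
        sharedChild.set("my.allocated.port", "8042");             // made-up key
        System.out.println(parent.get("my.allocated.port"));      // 8042 - propagated

        // Cloned instance: the child is isolated from its siblings and parent.
        Configuration clonedChild = new Configuration(parent);    // copy constructor
        clonedChild.set("my.allocated.port", "9999");
        System.out.println(parent.get("my.allocated.port"));      // still 8042
    }
}
{code}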
[jira] [Updated] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
[ https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-535: Attachment: YARN-535.patch this patch verifies that the conf to write has picked up the hostnames, then saves the config XML to a string *before* trying to write it to the classpath'd resource TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs Key: YARN-535 URL: https://issues.apache.org/jira/browse/YARN-535 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Environment: OS/X laptop, HFS+ filesystem Reporter: Steve Loughran Priority: Minor Attachments: YARN-535.patch the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. As {{Configuration.writeXml()}} does a reread of all resources, this will break if the (open-for-writing) resource is already visible as an empty file. This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks later test runs -because it is not overwritten by later incremental builds, due to timestamps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
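The shape of the workaround the patch description gives — serialize first, then write — can be sketched independently of the test: build the whole XML in memory so that {{Configuration.writeXml()}}'s re-read of resources never observes the target file in a half-written (empty) state. This is a hypothetical stand-alone sketch, not the attached YARN-535.patch; the key set in main() is a made-up stand-in for the hostnames the test asserts on.
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;

public class SafeConfWriter {

    /**
     * Serialize the configuration fully in memory before touching the target
     * file, so writeXml()'s re-read of resources never sees an empty
     * yarn-site.xml on the classpath.
     */
    static void writeConfSafely(Configuration conf, Path target) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        conf.writeXml(buffer);                       // may re-read resources; file untouched
        Files.write(target, buffer.toByteArray());   // single write of the final content
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("test.hostname.key", "localhost");  // made-up key for illustration
        writeConfSafely(conf, Paths.get("target/test-classes/yarn-site.xml"));
    }
}
{code}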
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-530-2.patch This patch * includes the static service notification tests of YARN-119 * catches when a service replaces or clones the original configuration it was started with, and updates the copy stored in the base service class. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-2.patch This is an aggregate patch for jenkins and testing; YARN-530 is the subset for real review * contains a code-review of all {{innerInit()}}, {{innerStart()}} ops * contain a code review of all {{innerStop()}} operations to harden against null values (MAPREDUCE-3502). * incorporates YARN-535 to fix a race condition in one test Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-2.patch, YARN-117.patch Having played the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers. h2. AbstractService state change doesn't defend against race conditions. There's no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph of lifecycle operations Helper methods to move things through lifecycles. 
init-then-start is common, stop-if-service!=null is another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. state transition failures are something that registered service listeners may wish to be informed of. When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service,Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing implementations of the interface. h2. Service listener failures not handled Is this an error or not? Log and ignore may not be what is desired. *Proposed:* during {{stop()}} any exception
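A minimal sketch of the lifecycle hardening proposed above: final public transitions, a single locked state check, protected inner hooks, and {{stop()}} legal from every state. It is hypothetical and deliberately much smaller than the {{AbstractService}} in the YARN-530 patch — no listeners, no {{Configuration}}, no failure callbacks.
{code:java}
/**
 * Template-method lifecycle sketch: state checks happen once in the final
 * public methods, under a lock; subclasses only override the inner hooks.
 */
public abstract class SketchService {
    public enum State { NOTINITED, INITED, STARTED, STOPPED }

    private final Object stateLock = new Object();
    private State state = State.NOTINITED;

    public final void init() {
        synchronized (stateLock) {
            if (state != State.NOTINITED) {
                return;                       // duplicate init is a no-op, not corruption
            }
            innerInit();
            state = State.INITED;
        }
    }

    public final void start() {
        synchronized (stateLock) {
            if (state != State.INITED) {
                throw new IllegalStateException("cannot start from " + state);
            }
            innerStart();
            state = State.STARTED;
        }
    }

    /** stop() is legal from every state and must tolerate null fields. */
    public final void stop() {
        synchronized (stateLock) {
            if (state == State.STOPPED) {
                return;
            }
            innerStop();
            state = State.STOPPED;
        }
    }

    protected void innerInit()  {}
    protected void innerStart() {}
    protected void innerStop()  {}
}
{code}
Making the public methods final is exactly the compatibility concern raised above: any out-of-tree subclass that currently overrides init()/start()/stop() would need to move its logic into the inner hooks.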
[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
[ https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626867#comment-13626867 ] Hadoop QA commented on YARN-535: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577836/YARN-535.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/694//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/694//console This message is automatically generated. TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs Key: YARN-535 URL: https://issues.apache.org/jira/browse/YARN-535 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Environment: OS/X laptop, HFS+ filesystem Reporter: Steve Loughran Priority: Minor Attachments: YARN-535.patch the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. As {{Configuration.writeXml()}} does a reread of all resources, this will break if the (open-for-writing) resource is already visible as an empty file. This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks later test runs -because it is not overwritten by later incremental builds, due to timestamps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-558) Add ability to completely remove nodemanager from resourcemanager.
Garth Goodson created YARN-558: -- Summary: Add ability to completely remove nodemanager from resourcemanager. Key: YARN-558 URL: https://issues.apache.org/jira/browse/YARN-558 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Garth Goodson Priority: Minor I would like to add the ability to completely remove a nodemanager from the resourcemanager's state. I run a cloud service where I want to dynamically bring up nodes to act as nodemanagers and then bring them down again when not needed. These nodes have dynamically assigned IPs, thus the alternative of decommissioning them via an excludes file leads to a large (unbounded) list of decommissioned nodes that may never be commissioned again. I would like the ability to move a node from the decommissioned state to being completely removed from the resource manager. I have thought of two ways of implementing this: 1) Add an optional timeout between a node entering the decommissioned state and its removal from the resourcemanager's state. 2) Add an explicit RPC to remove a decommissioned node. Any additional thoughts/discussion are welcome. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
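A sketch of how option 1 might look, purely as an illustration of the proposal: after a node is decommissioned, schedule its complete removal from the tracked-node map once a grace period expires, cancelling the removal if the node re-registers. None of these class or method names correspond to existing ResourceManager code.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class DecommissionReaper {
    private final long graceMillis;
    private final Map<String, Object> trackedNodes;   // stand-in for RM node state
    private final Map<String, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();
    private final ScheduledExecutorService reaper =
            Executors.newSingleThreadScheduledExecutor();

    public DecommissionReaper(Map<String, Object> trackedNodes, long graceMillis) {
        this.trackedNodes = trackedNodes;
        this.graceMillis = graceMillis;
    }

    /** Called when a node enters the decommissioned state. */
    public void onDecommissioned(String nodeId) {
        pending.put(nodeId, reaper.schedule(() -> {
            trackedNodes.remove(nodeId);               // forget the node entirely
            pending.remove(nodeId);
        }, graceMillis, TimeUnit.MILLISECONDS));
    }

    /** Called if the node comes back before the grace period expires. */
    public void onReregistered(String nodeId) {
        ScheduledFuture<?> f = pending.remove(nodeId);
        if (f != null) {
            f.cancel(false);                            // node returned; keep it
        }
    }
}
{code}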
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626880#comment-13626880 ] Hadoop QA commented on YARN-530: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577837/YARN-530-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 33 warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/695//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/695//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/695//console This message is automatically generated. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626895#comment-13626895 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577838/YARN-117-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 19 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 33 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/696//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/696//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/696//console This message is automatically generated. Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-2.patch, YARN-117.patch Having played the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. 
MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626930#comment-13626930 ] Sandy Ryza commented on YARN-366: - bq. Let's make it a class-name Sounds good. bq. I don't like how we are passing configuration around in the ContainerManagerImpl constructor, we have Service.init(Configuration) exactly for this purpose. This is what I tried first, but I came across the following problem, stemming from the fact that both WebServer and ContainerManager have a a ContainersMonitor: * ContainerManager needs to pass its dispatcher to its ContainersMonitor when it instantiates it. * To address this, we can instantiate ContainersMonitor in ContainerManager's init instead of its constructor. * The way that composite services work, parent service inits tend to happen before child service inits. * In NodeManager's init method, it instantiates a WebServer, which takes a ContainersMonitor as an argument. * Because of the above, the NodeManager's ContainerManager's ContainersMonitor will not be instantiated yet. To work around this, we'd need to do something like move the hierarchy around a little or set the WebServer's ContainersMonitor separately from the WebServer's instantiation. What do you think? Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
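The tracing idea in the issue description — annotate each event with the event that was being handled when it was created, then walk the parent pointers when an exception is caught — can be sketched without any of the dispatcher machinery. Hypothetical names throughout; this is not the attached patch.
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class TracingSketch {

    /** An event that remembers the event being handled when it was dispatched. */
    static class TracedEvent {
        final String name;
        final TracedEvent parent;

        TracedEvent(String name, TracedEvent parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Walks the parent pointers to build a readable "event stack". */
    static String causalChain(TracedEvent leaf) {
        Deque<String> chain = new ArrayDeque<>();
        for (TracedEvent e = leaf; e != null; e = e.parent) {
            chain.push(e.name);         // push so the root ends up first
        }
        return String.join(" -> ", chain);
    }

    public static void main(String[] args) {
        TracedEvent appSubmitted = new TracedEvent("APP_SUBMITTED", null);
        TracedEvent containerAllocated =
                new TracedEvent("CONTAINER_ALLOCATED", appSubmitted);
        TracedEvent launchFailed =
                new TracedEvent("LAUNCH_FAILED", containerAllocated);

        // What a tracing dispatcher could log alongside the caught exception:
        System.out.println(causalChain(launchFailed));
        // APP_SUBMITTED -> CONTAINER_ALLOCATED -> LAUNCH_FAILED
    }
}
{code}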
[jira] [Commented] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
[ https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626943#comment-13626943 ] Chris Nauroth commented on YARN-535: Hi, Steve. The patch looks good. I verified that the test passes, running on Mac and Windows. Do you also want to include {{TestDistributedShell#setup}} in the patch? I expect the change would be nearly identical. TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs Key: YARN-535 URL: https://issues.apache.org/jira/browse/YARN-535 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Environment: OS/X laptop, HFS+ filesystem Reporter: Steve Loughran Priority: Minor Attachments: YARN-535.patch the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. As {{Configuration.writeXml()}} does a reread of all resources, this will break if the (open-for-writing) resource is already visible as an empty file. This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks later test runs -because it is not overwritten by later incremental builds, due to timestamps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs
[ https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth reassigned YARN-535: -- Assignee: Steve Loughran TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs Key: YARN-535 URL: https://issues.apache.org/jira/browse/YARN-535 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 3.0.0 Environment: OS/X laptop, HFS+ filesystem Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Attachments: YARN-535.patch the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. As {{Configuration.writeXml()}} does a reread of all resources, this will break if the (open-for-writing) resource is already visible as an empty file. This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks later test runs -because it is not overwritten by later incremental builds, due to timestamps. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-546) mapred.fairscheduler.eventlog.enabled removed from Hadoop 2.0
[ https://issues.apache.org/jira/browse/YARN-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lohit Vijayarenu updated YARN-546: -- Attachment: YARN-546.1.patch It looks like not much is logged in the event log, and the majority of entries seem to be just node updates. If no one votes against having this removed, then here is a patch for review. mapred.fairscheduler.eventlog.enabled removed from Hadoop 2.0 - Key: YARN-546 URL: https://issues.apache.org/jira/browse/YARN-546 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Attachments: YARN-546.1.patch Hadoop 1.0 supported an option to turn on/off FairScheduler event logging using mapred.fairscheduler.eventlog.enabled. In Hadoop 2.0, it looks like this option has been removed (or not ported?), which causes event logging to be enabled by default, with no way to turn it off. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
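For context, the Hadoop 1.x option gated event logging behind a single boolean read from the scheduler configuration; a YARN-side equivalent would be a guard of roughly this shape. Only the 1.x {{mapred.fairscheduler.eventlog.enabled}} key is real — the 2.x property name and class below are invented for illustration.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class EventLogGuard {
    // Hypothetical 2.x key; only the 1.x mapred.fairscheduler.eventlog.enabled key exists.
    static final String EVENTLOG_ENABLED = "yarn.scheduler.fair.eventlog.enabled";

    private final boolean enabled;

    EventLogGuard(Configuration conf) {
        // Default to off so a busy RM is not writing node-update records unless asked.
        this.enabled = conf.getBoolean(EVENTLOG_ENABLED, false);
    }

    void logNodeUpdate(String nodeId) {
        if (!enabled) {
            return;            // event logging disabled: drop the record
        }
        System.out.println("NODE_UPDATE " + nodeId);
    }
}
{code}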
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626999#comment-13626999 ] Vinod Kumar Vavilapalli commented on YARN-112: -- The patch passes eclipse:eclipse on my laptop; I believe the failure is due to an unclean .m2 on the build machines (HADOOP-9251). Checking this in. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112-20130408.1.patch, yarn-112-20130408.patch, yarn-112-20130409.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627015#comment-13627015 ] Hudson commented on YARN-112: - Integrated in Hadoop-trunk-Commit #3584 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3584/]) YARN-112. Fixed a race condition during localization that fails containers. Contributed by Omkar Vinit Joshi. MAPREDUCE-5138. Fix LocalDistributedCacheManager after YARN-112. Contributed by Omkar Vinit Joshi. (Revision 1466196) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1466196 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112-20130408.1.patch, yarn-112-20130408.patch, yarn-112-20130409.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-534) AM max attempts is not checked when RM restart and try to recover attempts
[ https://issues.apache.org/jira/browse/YARN-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627018#comment-13627018 ] Vinod Kumar Vavilapalli commented on YARN-534: -- The latest patch looks good, +1. The eclipse:eclipse problem is due to an unclean .m2 on the build machine; the patch passes on my laptop. Checking this in. AM max attempts is not checked when RM restart and try to recover attempts -- Key: YARN-534 URL: https://issues.apache.org/jira/browse/YARN-534 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-534.1.patch, YARN-534.2.patch, YARN-534.3.patch, YARN-534.4.patch Currently, AM max attempts is only checked when the current attempt fails, to decide whether to create a new attempt. If the RM restarts before the max attempt fails, it will not clean the state store; when the RM comes back, it will retry the attempt again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
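A minimal sketch of the recovery-time check being discussed (illustrative names, not the YARN-534 patch): on RM restart, an application whose stored attempt count has already reached the configured maximum should not be given a fresh attempt.
{code}
// Hedged sketch: apply the same max-attempts rule during recovery that the
// live path applies when an attempt fails.
public class RecoveryAttemptCheck {
  public static boolean shouldStartNewAttempt(int attemptsAlreadyRun, int maxAppAttempts) {
    return attemptsAlreadyRun < maxAppAttempts;
  }

  public static void main(String[] args) {
    // Two attempts already recorded in the state store, maximum of two:
    // recovery should fail the app instead of launching a third attempt.
    System.out.println(shouldStartNewAttempt(2, 2)); // prints false
  }
}
{code}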
[jira] [Created] (YARN-559) Make all YARN API and libraries available through an api jar
Bikas Saha created YARN-559: --- Summary: Make all YARN API and libraries available through an api jar Key: YARN-559 URL: https://issues.apache.org/jira/browse/YARN-559 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha This should be the dependency for interacting with YARN and would prevent unnecessary leakage of other internal stuff. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-560) If AM fails due to overrunning resource limits, error not visible through UI sometimes
[ https://issues.apache.org/jira/browse/YARN-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah moved MAPREDUCE-3949 to YARN-560: - Affects Version/s: (was: 0.23.2) (was: 0.24.0) 0.23.3 2.0.0-alpha Key: YARN-560 (was: MAPREDUCE-3949) Project: Hadoop YARN (was: Hadoop Map/Reduce) If AM fails due to overrunning resource limits, error not visible through UI sometimes -- Key: YARN-560 URL: https://issues.apache.org/jira/browse/YARN-560 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.0-alpha, 0.23.3 Reporter: Todd Lipcon Assignee: Ravi Prakash Priority: Minor Attachments: MAPREDUCE-3949.patch I had a case where an MR AM eclipsed the configured memory limit. This caused the AM's container to get killed, but nowhere accessible through the web UI showed these diagnostics. I had to go view the NM's logs via ssh before I could figure out what had happened to my application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-560) If AM fails due to overrunning resource limits, error not visible through UI sometimes
[ https://issues.apache.org/jira/browse/YARN-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-560: - Labels: usability (was: ) If AM fails due to overrunning resource limits, error not visible through UI sometimes -- Key: YARN-560 URL: https://issues.apache.org/jira/browse/YARN-560 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Todd Lipcon Assignee: Ravi Prakash Priority: Minor Labels: usability Attachments: MAPREDUCE-3949.patch I had a case where an MR AM eclipsed the configured memory limit. This caused the AM's container to get killed, but nowhere accessible through the web UI showed these diagnostics. I had to go view the NM's logs via ssh before I could figure out what had happened to my application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627203#comment-13627203 ] Lohit Vijayarenu commented on YARN-451: --- I tried to see if adding the total number of containers was a trivial change, but it looks like there is no notion of an application's maximum resource available to the ResourceManager. This might be the reason why the RM page did not have the information. Looking into RMAppImpl shows that this kind of information is not passed from the Client/AM to the RM during application initialization either. The closest thing to a notion of job weight I could see was resource demand, but that seems to change based on how an application requests containers. For example, FairScheduler seems to recalculate fair share based on how much resource demand is passed by applications. One option I can think of is to add an additional protobuf field which specifies the total number of containers/resources an application might use. This would be an optional field, used only by MapReduce for now, and the client could set this value based on the number of mappers/reducers. I am not sure if this is the right approach; are there any other, simpler ideas people can suggest? Add more metrics to RM page --- Key: YARN-451 URL: https://issues.apache.org/jira/browse/YARN-451 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Priority: Minor The ResourceManager web UI shows the list of RUNNING applications, but it does not tell which applications are requesting more resources compared to others. With a cluster running hundreds of applications at once, it would be useful to have some kind of metric to show high-resource-usage applications vs. low-resource-usage ones. At a minimum, showing the number of containers is a good option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
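A hedged illustration of the optional-hint idea in the comment above; nothing here is an existing YARN API. The client would pass an expected total container count (for MapReduce, derived from the number of mappers and reducers), and the RM page could render it next to the live container count.
{code}
// Illustrative only: an optional per-application hint and how the RM web page
// might display it. Not part of any real YARN protobuf or web UI code.
public class AppResourceHint {
  private final Integer expectedTotalContainers; // null when the client gave no hint

  public AppResourceHint(Integer expectedTotalContainers) {
    this.expectedTotalContainers = expectedTotalContainers;
  }

  /** Text the RM web page could show for this application. */
  public String render(int runningContainers) {
    if (expectedTotalContainers == null) {
      return runningContainers + " containers running";
    }
    return runningContainers + " of ~" + expectedTotalContainers + " containers";
  }

  public static void main(String[] args) {
    System.out.println(new AppResourceHint(500).render(120)); // "120 of ~500 containers"
    System.out.println(new AppResourceHint(null).render(120)); // "120 containers running"
  }
}
{code}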
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627215#comment-13627215 ] Steve Loughran commented on YARN-117: - one test failing, looks like a service isn't shutting down reliably, so the next test in the same JVM can't bind to a port. {code} org.apache.hadoop.yarn.YarnException: java.net.BindException: Problem binding to [localhost:12345] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139) at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:63) at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:52) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.innerStart(ContainerManagerImpl.java:230) at org.apache.hadoop.yarn.service.AbstractService.start(AbstractService.java:175) at org.apache.hadoop.yarn.service.CompositeService.innerStart(CompositeService.java:76) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.innerStart(NodeManager.java:209) at org.apache.hadoop.yarn.service.AbstractService.start(AbstractService.java:175) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeReboot(TestNodeStatusUpdater.java:726) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) Caused by: java.net.BindException: Problem binding to [localhost:12345] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:716) at org.apache.hadoop.ipc.Server.bind(Server.java:415) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:518) at org.apache.hadoop.ipc.Server.init(Server.java:1963) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:986) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.init(ProtobufRpcEngine.java:427) at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:402) at
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117-3.patch Updated patch # fix javadoc warnings (enum @links that the IDE felt were valid, but javadoc did not) # added more rigorous shutdown in the test teardown, and made sure the overridden {{NodeManager}} overrides {{innerStop()}} and not {{stop()}} Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.patch Having played with the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non-null. Before anyone points out that the {{stop()}} operations assume that all fields are valid, and will NPE if called before {{start()}}: MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take-up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}}, {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}}, {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers). h2. AbstractService state change doesn't defend against race conditions. There are no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph lifecycle operations Helper methods to move things through lifecycles. init-start is common; stop-if-service!=null is another.
Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. state transition failures are something that registered service listeners may wish to be informed of. When a state transition fails, a {{RuntimeException}} can be thrown -and the service listeners are not informed, as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targetedState, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing implementations of the interface. h2. Service listener failures not handled Is this an error or not? Log and ignore may not be what is desired. *Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood
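A condensed sketch of the lifecycle pattern described in this issue: {{final}} {{init()}}/{{start()}}/{{stop()}} methods own the state checks and locking, while subclasses override only the inner methods, and {{stop()}} is legal from every state. This is a simplified illustration, not the actual YARN-117/YARN-530 code.
{code}
// Hedged sketch of the final-lifecycle-method pattern; class and enum names
// are illustrative.
public abstract class SketchService {
  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final synchronized void init() {
    if (state != State.NOTINITED) {
      return; // duplicate init() is a no-op rather than an out-of-order surprise
    }
    innerInit();
    state = State.INITED;
  }

  public final synchronized void start() {
    if (state == State.STARTED) {
      return;
    }
    if (state != State.INITED) {
      throw new IllegalStateException("Cannot start from " + state);
    }
    innerStart();
    state = State.STARTED;
  }

  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return;
    }
    // stop() is valid from every state, so innerStop() must tolerate null fields.
    innerStop();
    state = State.STOPPED;
  }

  protected void innerInit() {}
  protected void innerStart() {}
  protected void innerStop() {}
}
{code}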
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-530-3.patch Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-495) Change NM behavior of reboot to resync
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-495: - Attachment: YARN-495.3.patch Fixed merge conflicts. Change NM behavior of reboot to resync -- Key: YARN-495 URL: https://issues.apache.org/jira/browse/YARN-495 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-495.1.patch, YARN-495.2.patch, YARN-495.3.patch When a reboot command is sent from the RM, the node manager doesn't clean up the containers while it is stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627240#comment-13627240 ] Hadoop QA commented on YARN-530: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577911/YARN-530-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/697//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/697//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/697//console This message is automatically generated. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
[ https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated YARN-503: - Attachment: YARN-503.patch Fixed findbugs issues. Upon review of those issues, I realized I was trying too hard to prevent tight, unlikely race conditions. I simplified the design further, which allowed for the removal of virtually all of the synchronization that could lead to tricky deadlocks. Updated tests to ensure apps and/or tokens don't leak. DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false -- Key: YARN-503 URL: https://issues.apache.org/jira/browse/YARN-503 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Daryn Sharp Attachments: YARN-503.patch, YARN-503.patch The first Job/App to register a token is the one which DelegationTokenRenewer associates with a specific Token. An attempt to remove/cancel these shared tokens by subsequent jobs doesn't work - since the JobId will not match. As a result, even if subsequent jobs have MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be cancelled when those jobs complete. Tokens will eventually be removed from the RM / JT when the service that issued them considers them to have expired, or via an explicit cancelDelegationTokens call (not implemented yet in 23). A side effect of this is that the same delegation token will end up being renewed multiple times (a separate TimerTask for each job which uses the token). DelegationTokenRenewer could maintain a reference count/list of jobIds for shared tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
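A hedged sketch of the reference-counting idea mentioned at the end of the description above: a shared token becomes cancellable only when the last application using it has finished. Illustrative only; this is not the attached YARN-503.patch.
{code}
// Track which applications are using each shared token, and report tokens
// that have no remaining users when an application finishes.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SharedTokenTracker<TOKEN, APPID> {
  private final Map<TOKEN, Set<APPID>> appsPerToken = new HashMap<TOKEN, Set<APPID>>();

  public synchronized void register(TOKEN token, APPID app) {
    Set<APPID> apps = appsPerToken.get(token);
    if (apps == null) {
      apps = new HashSet<APPID>();
      appsPerToken.put(token, apps);
    }
    apps.add(app);
  }

  /** @return tokens with no remaining applications, i.e. now safe to cancel */
  public synchronized Set<TOKEN> applicationFinished(APPID app) {
    Set<TOKEN> cancellable = new HashSet<TOKEN>();
    for (Map.Entry<TOKEN, Set<APPID>> entry : appsPerToken.entrySet()) {
      if (entry.getValue().remove(app) && entry.getValue().isEmpty()) {
        cancellable.add(entry.getKey());
      }
    }
    // Drop the empty entries so the tracker itself does not leak tokens.
    appsPerToken.keySet().removeAll(cancellable);
    return cancellable;
  }
}
{code}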
[jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
[ https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627271#comment-13627271 ] Hadoop QA commented on YARN-503: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577915/YARN-503.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/698//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/698//console This message is automatically generated. DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false -- Key: YARN-503 URL: https://issues.apache.org/jira/browse/YARN-503 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Daryn Sharp Attachments: YARN-503.patch, YARN-503.patch The first Job/App to register a token is the one which DelegationTokenRenewer associates with a a specific Token. An attempt to remove/cancel these shared tokens by subsequent jobs doesn't work - since the JobId will not match. As a result, Even if subsequent jobs have MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be cancelled when those jobs complete. Tokens will eventually be removed from the RM / JT when the service that issued them considers them to have expired or via an explicit cancelDelegationTokens call (not implemented yet in 23). A side affect of this is that the same delegation token will end up being renewed multiple times (a separate TimerTask for each job which uses the token). DelegationTokenRenewer could maintain a reference count/list of jobIds for shared tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-518) Fair Scheduler's document link could be added to the hadoop 2.x main doc page
[ https://issues.apache.org/jira/browse/YARN-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-518: Attachment: YARN-518.patch Fair Scheduler's document link could be added to the hadoop 2.x main doc page - Key: YARN-518 URL: https://issues.apache.org/jira/browse/YARN-518 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.0.3-alpha Reporter: Dapeng Sun Attachments: YARN-518.patch Currently the doc page for Fair Scheduler looks good and it’s here, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html. It would be better to add the document link to the YARN section in the Hadoop 2.x main doc page, so that users can easily find the doc to experimentally try Fair Scheduler as Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-518) Fair Scheduler's document link could be added to the hadoop 2.x main doc page
[ https://issues.apache.org/jira/browse/YARN-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627282#comment-13627282 ] Sandy Ryza commented on YARN-518: - Uploaded a simple patch that adds a link to the fair scheduler to the index page Fair Scheduler's document link could be added to the hadoop 2.x main doc page - Key: YARN-518 URL: https://issues.apache.org/jira/browse/YARN-518 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.0.3-alpha Reporter: Dapeng Sun Assignee: Sandy Ryza Attachments: YARN-518.patch Currently the doc page for Fair Scheduler looks good and it’s here, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html. It would be better to add the document link to the YARN section in the Hadoop 2.x main doc page, so that users can easily find the doc to experimentally try Fair Scheduler as Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-518) Fair Scheduler's document link could be added to the hadoop 2.x main doc page
[ https://issues.apache.org/jira/browse/YARN-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627290#comment-13627290 ] Hadoop QA commented on YARN-518: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577924/YARN-518.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/699//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/699//console This message is automatically generated. Fair Scheduler's document link could be added to the hadoop 2.x main doc page - Key: YARN-518 URL: https://issues.apache.org/jira/browse/YARN-518 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 2.0.3-alpha Reporter: Dapeng Sun Assignee: Sandy Ryza Attachments: YARN-518.patch Currently the doc page for Fair Scheduler looks good and it’s here, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html. It would be better to add the document link to the YARN section in the Hadoop 2.x main doc page, so that users can easily find the doc to experimentally try Fair Scheduler as Capacity Scheduler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
Hitesh Shah created YARN-561: Summary: Nodemanager should set some key information into the environment of every container that it launches. Key: YARN-561 URL: https://issues.apache.org/jira/browse/YARN-561 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
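For illustration, what a launched container could do if the NodeManager exported this information; the environment variable names below are hypothetical, not a published YARN contract.
{code}
// Hypothetical variable names for the pieces of information listed above.
public class ContainerEnvInfo {
  public static void main(String[] args) {
    String containerId = System.getenv("CONTAINER_ID");
    String nmHost = System.getenv("NM_HOST");
    String nmPort = System.getenv("NM_PORT");

    System.out.println("Running as container " + containerId
        + " on NodeManager " + nmHost + ":" + nmPort);
  }
}
{code}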
[jira] [Updated] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-561: - Issue Type: Sub-task (was: Bug) Parent: YARN-386 Nodemanager should set some key information into the environment of every container that it launches. - Key: YARN-561 URL: https://issues.apache.org/jira/browse/YARN-561 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Labels: usability Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627467#comment-13627467 ] Vinod Kumar Vavilapalli commented on YARN-486: -- Please also fix long lines in ContainerRemoteLaunchEvent. Filed MAPREDUCE-5139 for tracking MR-only changes. Let's continue the patch here and post the final MR changes there. Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land - Key: YARN-486 URL: https://issues.apache.org/jira/browse/YARN-486 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-486.1.patch, YARN-486.2.patch, YARN-486.3.patch, YARN-486.4.patch Currently, id, resource request, etc. need to be copied over from Container to ContainerLaunchContext. This can be brittle. It also leads to duplication of information (such as Resource from CLC and Resource from Container, and Container.tokens). Sending Container directly to startContainer solves these problems. It also keeps the CLC clean by only having in it the things that are set by the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-514: - Attachment: YARN-514.1.patch In this patch, I've changed RMStateStore#storeApplication from a blocking API to a non-blocking API. Therefore, it is no longer necessary to invoke the API in ClientRMService#submitApplication. Instead, I defined a new state, named SAVING, between the NEW and SUBMITTED states of RMApp. TestRMAppTransitions was modified to test the additional state transition, and to test whether the application is stored before SUBMITTED and removed after FINISHED. An additional issue is that the mapping between YARN and MapReduce states needs to be updated due to the newly added state. This will be filed and solved in a separate MR jira. Delayed store operations should not result in RM unavailability for app submission -- Key: YARN-514 URL: https://issues.apache.org/jira/browse/YARN-514 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: YARN-514.1.patch Currently, app submission is the only store operation performed synchronously, because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in the RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
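A minimal sketch of the non-blocking flow described above, reusing the NEW, SAVING and SUBMITTED state names from the comment; everything else is illustrative and not the attached YARN-514.1.patch.
{code}
// The client-facing submit path returns immediately; the slow store write
// happens on a separate thread, and only its completion moves the app from
// SAVING to SUBMITTED.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncAppStoreSketch {
  enum AppState { NEW, SAVING, SUBMITTED }

  private volatile AppState state = AppState.NEW;
  private final ExecutorService storeDispatcher = Executors.newSingleThreadExecutor();

  /** Called on the client thread; does not wait for the store operation. */
  public void submitApplication(final String appId) {
    state = AppState.SAVING;
    storeDispatcher.submit(new Runnable() {
      public void run() {
        persistApplication(appId); // placeholder for RMStateStore-style persistence
        state = AppState.SUBMITTED; // only after the write succeeds
      }
    });
  }

  private void persistApplication(String appId) {
    // Intentionally empty in this sketch.
  }

  public AppState getState() {
    return state;
  }
}
{code}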
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627489#comment-13627489 ] Hadoop QA commented on YARN-514: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577953/YARN-514.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/700//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/700//console This message is automatically generated. Delayed store operations should not result in RM unavailability for app submission -- Key: YARN-514 URL: https://issues.apache.org/jira/browse/YARN-514 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: YARN-514.1.patch Currently, app submission is the only store operation performed synchronously because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira