[jira] [Commented] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692695#comment-13692695 ] Karthik Kambatla commented on YARN-884: --- The test TestAMAuthorization fails on trunk as well. Don't think the patch can affect the test in any way. > AM expiry interval should be set to smaller of {am, > nm}.liveness-monitor.expiry-interval-ms > --- > > Key: YARN-884 > URL: https://issues.apache.org/jira/browse/YARN-884 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: configuration > Attachments: yarn-884-1.patch > > > As the AM can't outlive the NM on which it is running, it is a good idea to > disallow setting the am.liveness-monitor.expiry-interval-ms to a value higher > than nm.liveness-monitor.expiry-interval-ms -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692689#comment-13692689 ] Hadoop QA commented on YARN-884: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589529/yarn-884-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1395//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1395//console This message is automatically generated. 
> AM expiry interval should be set to smaller of {am, > nm}.liveness-monitor.expiry-interval-ms > --- > > Key: YARN-884 > URL: https://issues.apache.org/jira/browse/YARN-884 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: configuration > Attachments: yarn-884-1.patch
[jira] [Commented] (YARN-885) TestBinaryTokenFile (and others) fail
[ https://issues.apache.org/jira/browse/YARN-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692687#comment-13692687 ] Kam Kasravi commented on YARN-885: -- Changing ContainerLocalizer.runLocalization so that the local context uses the same tokens as the user context seems to fix this problem. > TestBinaryTokenFile (and others) fail > - > > Key: YARN-885 > URL: https://issues.apache.org/jira/browse/YARN-885 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.4-alpha >Reporter: Kam Kasravi > > Seeing the following stack trace and the unit test goes into an infinite loop: > 2013-06-24 17:03:58,316 ERROR [LocalizerRunner for > container_1372118631537_0001_01_01] security.UserGroupInformation > (UserGroupInformation.java:doAs(1480)) - PriviledgedActionException > as:kamkasravi (auth:SIMPLE) cause:java.io.IOException: Server asks us to fall > back to SIMPLE auth, but this client is configured to only allow secure > connections. > 2013-06-24 17:03:58,317 WARN [LocalizerRunner for > container_1372118631537_0001_01_01] ipc.Client (Client.java:run(579)) - > Exception encountered while connecting to the server : java.io.IOException: > Server asks us to fall back to SIMPLE auth, but this client is configured to > only allow secure connections. > 2013-06-24 17:03:58,318 ERROR [LocalizerRunner for > container_1372118631537_0001_01_01] security.UserGroupInformation > (UserGroupInformation.java:doAs(1480)) - PriviledgedActionException > as:kamkasravi (auth:SIMPLE) cause:java.io.IOException: java.io.IOException: > Server asks us to fall back to SIMPLE auth, but this client is configured to > only allow secure connections. 
> java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) > at > org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:56) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:247) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:859) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-874) Tracking YARN/MR test failures after HADOOP-9421 and YARN-827
[ https://issues.apache.org/jira/browse/YARN-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692675#comment-13692675 ] Hadoop QA commented on YARN-874: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589527/YARN-874.2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1394//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1394//console This message is automatically generated. > Tracking YARN/MR test failures after HADOOP-9421 and YARN-827 > - > > Key: YARN-874 > URL: https://issues.apache.org/jira/browse/YARN-874 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-874.1.txt, YARN-874.2.txt, YARN-874.txt > > > HADOOP-9421 and YARN-827 broke some YARN/MR tests. Tracking those.. -- This message is automatically generated by JIRA. 
[jira] [Commented] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692668#comment-13692668 ] Omkar Vinit Joshi commented on YARN-884: [~kkambatl] makes sense... > AM expiry interval should be set to smaller of {am, > nm}.liveness-monitor.expiry-interval-ms > --- > > Key: YARN-884 > URL: https://issues.apache.org/jira/browse/YARN-884 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: configuration > Attachments: yarn-884-1.patch > > > As the AM can't outlive the NM on which it is running, it is a good idea to > disallow setting the am.liveness-monitor.expiry-interval-ms to a value higher > than nm.liveness-monitor.expiry-interval-ms -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692666#comment-13692666 ] Sandy Ryza commented on YARN-763: - Can we move all of this into the switch statement, replace break with return, and get rid of the stop variable? Unless the thinking is that returning from a method in the middle is bad, I think this would be a lot cleaner. > AMRMClientAsync should stop heartbeating after receiving shutdown from RM > - > > Key: YARN-763 > URL: https://issues.apache.org/jira/browse/YARN-763 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-763.1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
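A plain-Java sketch of the shape being suggested (all names below are invented for illustration; the real logic lives in AMRMClientAsync's heartbeat thread):

```java
// Hypothetical sketch of the suggested refactor: each switch case returns
// directly, so the 'stop' flag and the trailing break statements disappear.
// All names are invented for illustration.
public class HeartbeatSketch {

  public enum Command { SHUTDOWN, RESYNC, NORMAL }

  // Returns true if the heartbeat loop should continue, false to stop.
  public static boolean processResponse(Command command) {
    switch (command) {
      case SHUTDOWN:
        return false;   // replaces 'stop = true; break;'
      case RESYNC:
        return false;
      default:
        return true;    // normal allocate response: keep heartbeating
    }
  }

  public static void main(String[] args) {
    System.out.println(processResponse(Command.SHUTDOWN)); // prints false
    System.out.println(processResponse(Command.NORMAL));   // prints true
  }
}
```

With the returns in place, the caller's loop condition simply becomes `while (processResponse(next()))`, and no mutable flag is shared across iterations.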
[jira] [Commented] (YARN-874) Tracking YARN/MR test failures after HADOOP-9421 and YARN-827
[ https://issues.apache.org/jira/browse/YARN-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692664#comment-13692664 ] Omkar Vinit Joshi commented on YARN-874: Tested YARN-872-2 on a local cluster... with the patch it is running now. > Tracking YARN/MR test failures after HADOOP-9421 and YARN-827 > - > > Key: YARN-874 > URL: https://issues.apache.org/jira/browse/YARN-874 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-874.1.txt, YARN-874.2.txt, YARN-874.txt > > > HADOOP-9421 and YARN-827 broke some YARN/MR tests. Tracking those..
[jira] [Updated] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-884: -- Attachment: yarn-884-1.patch Uploading a straight-forward patch. > AM expiry interval should be set to smaller of {am, > nm}.liveness-monitor.expiry-interval-ms > --- > > Key: YARN-884 > URL: https://issues.apache.org/jira/browse/YARN-884 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: configuration > Attachments: yarn-884-1.patch > > > As the AM can't outlive the NM on which it is running, it is a good idea to > disallow setting the am.liveness-monitor.expiry-interval-ms to a value higher > than nm.liveness-monitor.expiry-interval-ms -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692641#comment-13692641 ] Hadoop QA commented on YARN-763: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589525/YARN-763.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1152 javac compiler warnings (more than the trunk's current 1151 warnings). {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1393//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1393//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1393//console This message is automatically generated. > AMRMClientAsync should stop heartbeating after receiving shutdown from RM > - > > Key: YARN-763 > URL: https://issues.apache.org/jira/browse/YARN-763 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-763.1.patch > > -- This message is automatically generated by JIRA. 
[jira] [Updated] (YARN-874) Tracking YARN/MR test failures after HADOOP-9421 and YARN-827
[ https://issues.apache.org/jira/browse/YARN-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-874: - Attachment: YARN-874.2.txt Updated patch with a new test validating the common changes. > Tracking YARN/MR test failures after HADOOP-9421 and YARN-827 > - > > Key: YARN-874 > URL: https://issues.apache.org/jira/browse/YARN-874 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Blocker > Attachments: YARN-874.1.txt, YARN-874.2.txt, YARN-874.txt > > > HADOOP-9421 and YARN-827 broke some YARN/MR tests. Tracking those..
[jira] [Resolved] (YARN-758) Fair scheduler has some bug that causes TestRMRestart to fail
[ https://issues.apache.org/jira/browse/YARN-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved YARN-758. - Resolution: Not A Problem > Fair scheduler has some bug that causes TestRMRestart to fail > - > > Key: YARN-758 > URL: https://issues.apache.org/jira/browse/YARN-758 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Sandy Ryza > > YARN-757 got fixed by changing the scheduler from Fair to default (which is > capacity). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692639#comment-13692639 ] Karthik Kambatla commented on YARN-884: --- If AM_EXPIRY < NM_EXPIRY, # the user has explicitly set AM_EXPIRY to be smaller than NM_EXPIRY # I agree it is possible that the RM might expire the first attempt and start another attempt, in case the NM fails to connect to the RM for a time 't' such that AM_EXPIRY < t < NM_EXPIRY. However, the user has asked for a shorter expiry interval for a reason. If AM_EXPIRY > NM_EXPIRY, # When NM dies, the AMs on it also would have died. However, IIUC, the RM wouldn't schedule another attempt until AM_EXPIRY is met. Correct me if I am wrong. > AM expiry interval should be set to smaller of {am, > nm}.liveness-monitor.expiry-interval-ms > --- > > Key: YARN-884 > URL: https://issues.apache.org/jira/browse/YARN-884 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: configuration > > As the AM can't outlive the NM on which it is running, it is a good idea to > disallow setting the am.liveness-monitor.expiry-interval-ms to a value higher > than nm.liveness-monitor.expiry-interval-ms -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
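Either way, the change the summary proposes reduces to using the smaller of the two configured intervals as the effective AM expiry. A minimal sketch of that clamping (class and method names invented here; the actual change is in the attached yarn-884-1.patch):

```java
// Hypothetical sketch of clamping the AM expiry interval to the NM expiry
// interval: since an AM cannot outlive the NM it runs on, an AM expiry
// larger than the NM expiry never takes effect anyway. Names are invented
// for illustration.
public class ExpirySketch {

  public static long effectiveAmExpiryMs(long amExpiryMs, long nmExpiryMs) {
    return Math.min(amExpiryMs, nmExpiryMs);
  }

  public static void main(String[] args) {
    // am.liveness-monitor.expiry-interval-ms = 600000 (10 min) and
    // nm.liveness-monitor.expiry-interval-ms = 300000 (5 min):
    // the effective AM expiry is clamped to 300000.
    System.out.println(effectiveAmExpiryMs(600000L, 300000L)); // prints 300000
  }
}
```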
[jira] [Commented] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not
[ https://issues.apache.org/jira/browse/YARN-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692632#comment-13692632 ] Xuan Gong commented on YARN-808: How about we expose the current attempt id with its status, as well as the previous attempt id with its status, if they exist? > ApplicationReport does not clearly tell that the attempt is running or not > -- > > Key: YARN-808 > URL: https://issues.apache.org/jira/browse/YARN-808 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > > When an app attempt fails and is being retried, ApplicationReport immediately > gives the new attemptId and non-null values of host etc. There is no way for > clients to know that the attempt is running other than connecting to it and > timing out on an invalid host. A solution would be to expose the attempt state or > return a null value for host instead of "N/A"
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.1.patch > AMRMClientAsync should stop heartbeating after receiving shutdown from RM > - > > Key: YARN-763 > URL: https://issues.apache.org/jira/browse/YARN-763 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-763.1.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692601#comment-13692601 ] Xuan Gong commented on YARN-873: At the command line, if we type yarn application -status $UnKnowAppId, it will output: Application with id $UnKnowAppId doesn't exist in RM. > YARNClient.getApplicationReport(unknownAppId) returns a null report > --- > > Key: YARN-873 > URL: https://issues.apache.org/jira/browse/YARN-873 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha >Assignee: Xuan Gong > > How can the client find out that the app does not exist?
[jira] [Created] (YARN-885) TestBinaryTokenFile (and others) fail
Kam Kasravi created YARN-885: Summary: TestBinaryTokenFile (and others) fail Key: YARN-885 URL: https://issues.apache.org/jira/browse/YARN-885 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Kam Kasravi Seeing the following stack trace and the unit test goes into an infinite loop: 2013-06-24 17:03:58,316 ERROR [LocalizerRunner for container_1372118631537_0001_01_01] security.UserGroupInformation (UserGroupInformation.java:doAs(1480)) - PriviledgedActionException as:kamkasravi (auth:SIMPLE) cause:java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections. 2013-06-24 17:03:58,317 WARN [LocalizerRunner for container_1372118631537_0001_01_01] ipc.Client (Client.java:run(579)) - Exception encountered while connecting to the server : java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections. 2013-06-24 17:03:58,318 ERROR [LocalizerRunner for container_1372118631537_0001_01_01] security.UserGroupInformation (UserGroupInformation.java:doAs(1480)) - PriviledgedActionException as:kamkasravi (auth:SIMPLE) cause:java.io.IOException: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections. 
java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:56) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:247) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:181) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:859) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692572#comment-13692572 ] Chris Douglas commented on YARN-569: {{TestAMAuthorization}} also fails on trunk, YARN-878 > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, > preemption.2.patch, YARN-569.10.patch, YARN-569.1.patch, YARN-569.2.patch, > YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, > YARN-569.8.patch, YARN-569.9.patch, YARN-569.patch, YARN-569.patch > > > There is a tension between the fast-paced, reactive role of the > CapacityScheduler, which needs to respond quickly to > applications' resource requests and node updates, and the more introspective, > time-based considerations > needed to observe and correct for capacity balance. To this purpose, rather than > hacking the delicate > mechanisms of the CapacityScheduler directly, we opted to add support for preemption by > means of a "Capacity Monitor", > which can be run optionally as a separate service (much like the > NMLivelinessMonitor). 
> The capacity monitor (similarly to equivalent functionality in the fair > scheduler) operates on intervals > (e.g., every 3 seconds), observes the state of the assignment of resources to > queues from the capacity scheduler, > performs off-line computation to determine whether preemption is needed, and how > best to "edit" the current schedule to > improve capacity, and generates events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the > policy to use them as desired to achieve the rebalancing goals. > Note that due to the "lag" in the effect of these actions, the policy should > operate at the macroscopic level (e.g., preempt tens of containers > from a queue) and not try to tightly and consistently micromanage > container allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are by design pluggable; in the following we present an > initial policy (ProportionalCapacityPreemptionPolicy) we have been > experimenting with. 
The ProportionalCapacityPreemptionPolicy behaves as > follows: > # it gathers from the scheduler the state of the queues, in particular their > current capacity, guaranteed capacity, and pending requests (*) > # if there are pending requests from queues that are under capacity, it > computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule > and achieve capacity balance (accounting for natural completion rates, and > respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the > last one in the FIFO order) > # it removes reservations from the most recently assigned app until the amount > of resources to reclaim is obtained, or until no more reservations exist > # (if not enough) it issues preemptions for containers from the same > applications (reverse chronological order, last assigned container first), > again until necessary or until no containers except the AM container are left > # (if not enough) it moves on to unreserve and preempt from the next > application > # containers that have been asked to preempt are tracked across executions. > If a container is among the ones to be preempted for more than a certain > time, it is moved into the list of containers to be forcibly > killed. > Notes: > (*) at the moment, in order to avoid double-counting of the requests, we only > look at the "ANY" part of pending resource requests, which means we might not > preempt on behalf of AMs that ask only for specific locations but not ANY. > (**) The ideal balanced state is one in which each queue has at least its > guaranteed capacity, and the spare capacity is distributed among queues (that > want some) as a weighted fair share, where the weighting is based on the > guaranteed capacity of a queue and the function runs to a fixed point. 
> Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as > read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # wh
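The ideal balanced state described in note (**) above can be sketched as a small weighted fair-share computation. This is a simplified illustrative model, not the actual ProportionalCapacityPreemptionPolicy code: queues are reduced to arrays of guaranteed capacity and demand, and all names are invented.

```java
// Simplified model of the "ideal balanced state" computation (note ** in
// the description): every queue first keeps min(guaranteed, demand); any
// spare capacity is then split among still-hungry queues in proportion to
// their guaranteed capacity, iterating to a fixed point. This is an
// illustrative sketch, not the actual ProportionalCapacityPreemptionPolicy.
public class PreemptionSketch {

  public static double[] idealAssignment(double total, double[] guaranteed, double[] demand) {
    int n = guaranteed.length;
    double[] ideal = new double[n];
    double assigned = 0;
    for (int i = 0; i < n; i++) {
      ideal[i] = Math.min(guaranteed[i], demand[i]);
      assigned += ideal[i];
    }
    double spare = total - assigned;
    // Distribute spare capacity among queues that still want more,
    // weighted by their guaranteed capacity, until a fixed point.
    while (spare > 1e-9) {
      double weightSum = 0;
      for (int i = 0; i < n; i++) {
        if (ideal[i] < demand[i]) weightSum += guaranteed[i];
      }
      if (weightSum < 1e-9) break; // no queue wants more capacity
      double granted = 0;
      for (int i = 0; i < n; i++) {
        if (ideal[i] >= demand[i]) continue;
        double share = spare * guaranteed[i] / weightSum;
        double grant = Math.min(share, demand[i] - ideal[i]);
        ideal[i] += grant;
        granted += grant;
      }
      if (granted < 1e-9) break;
      spare -= granted;
    }
    return ideal;
  }

  public static void main(String[] args) {
    // Two queues, each guaranteed 50 of a 100-capacity cluster; queue 0
    // demands 80, queue 1 demands 30. Queue 1 is capped at its demand and
    // the 20 spare flows to queue 0: ideal = {70, 30}.
    double[] ideal = idealAssignment(100, new double[] {50, 50}, new double[] {80, 30});
    System.out.println(ideal[0] + " " + ideal[1]); // prints 70.0 30.0
  }
}
```

The policy would then issue de-reservations and preemptions against over-capacity queues until the actual assignment converges toward this ideal, within the per-round bounds the description mentions.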
[jira] [Commented] (YARN-883) Expose Fair Scheduler-specific queue metrics
[ https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692581#comment-13692581 ] Hadoop QA commented on YARN-883: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589510/YARN-883-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1392//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1392//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1392//console This message is automatically generated. 
> Expose Fair Scheduler-specific queue metrics > > > Key: YARN-883 > URL: https://issues.apache.org/jira/browse/YARN-883 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-883-1.patch, YARN-883.patch > > > When the Fair Scheduler is enabled, QueueMetrics should include fair share, > minimum share, and maximum share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692542#comment-13692542 ] Hadoop QA commented on YARN-569: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589506/YARN-569.10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1391//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1391//console This message is automatically generated. 
> CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, > preemption.2.patch, YARN-569.10.patch, YARN-569.1.patch, YARN-569.2.patch, > YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, > YARN-569.8.patch, YARN-569.9.patch, YARN-569.patch, YARN-569.patch > > > There is a tension between the fast-paced, reactive role of the > CapacityScheduler, which needs to respond quickly to applications' resource > requests and node updates, and the more introspective, time-based > considerations needed to observe and correct for capacity balance. To this > purpose, instead of hacking the delicate mechanisms of the CapacityScheduler > directly, we opted to add support for preemption by means of a "Capacity > Monitor", which can be run optionally as a separate service (much like the > NMLivelinessMonitor). > The capacity monitor (similarly to equivalent functionality in the fair > scheduler) runs on an interval (e.g., every 3 seconds), observes the state > of the assignment of resources to queues by the capacity scheduler, performs > an off-line computation to determine whether preemption is needed and how > best to "edit" the current schedule to improve capacity, and generates > events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the > policy to use them as desired to achieve the rebalancing goals. 
> Note that due to the "lag" in the effect of these actions, the policy should > operate at the macroscopic level (e.g., preempt tens of containers from a > queue) and not try to tightly and consistently micromanage container > allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are pluggable by design; in the following we present an > initial policy (ProportionalCapacityPreemptionPolicy) we have been > experimenting with. The ProportionalCapacityPreemptionPolicy behaves as > follows: > # it gathers from the scheduler the state of the queues, in particular their > current capacity, guaranteed capacity, and pending requests (*) > # if there are pending requests from queues that are under capacity, it > computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule > and achieve capacity balance (accounting for natural completion rates, and > respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the > last one in the FIFO order) > # it removes reservations from
[jira] [Updated] (YARN-883) Expose Fair Scheduler-specific queue metrics
[ https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-883: Attachment: YARN-883-1.patch > Expose Fair Scheduler-specific queue metrics > > > Key: YARN-883 > URL: https://issues.apache.org/jira/browse/YARN-883 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-883-1.patch, YARN-883.patch > > > When the Fair Scheduler is enabled, QueueMetrics should include fair share, > minimum share, and maximum share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text
[ https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692526#comment-13692526 ] Hadoop QA commented on YARN-649: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589505/YARN-649-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.containermanager.application.TestApplication org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1390//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1390//console This message is automatically generated. > Make container logs available over HTTP in plain text > - > > Key: YARN-649 > URL: https://issues.apache.org/jira/browse/YARN-649 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.0.4-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649.patch, > YARN-752-1.patch > > > It would be good to make container logs available over the REST API for > MAPREDUCE-4362 and so that they can be accessed programmatically in general. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692525#comment-13692525 ] Omkar Vinit Joshi commented on YARN-884: Probably these two are unrelated. If the NM goes down, then obviously any AM running on it goes down too, but vice versa is not true. In a work-preserving environment we would like to restart/resume the AM, which will not be possible if we configure the AM liveness interval as the smallest of {am,nm}. For example, the NM might be having trouble connecting to the RM and may end up heartbeating with the RM just before the RM decides to start a new application attempt, marking the earlier one as failed; even if the AM heartbeats immediately after that, it would be wasted, right? I think we need am = largest of {am,nm}. Thoughts? > AM expiry interval should be set to smaller of {am, > nm}.liveness-monitor.expiry-interval-ms > --- > > Key: YARN-884 > URL: https://issues.apache.org/jira/browse/YARN-884 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.0.4-alpha >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: configuration > > As the AM can't outlive the NM on which it is running, it is a good idea to > disallow setting the am.liveness-monitor.expiry-interval-ms to a value higher > than nm.liveness-monitor.expiry-interval-ms -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
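For reference, the clamping proposed in the issue summary (note that Omkar's comment above argues for the opposite, taking the larger value) amounts to a one-line helper. This is an illustrative sketch with hypothetical names, not code from the attached patch:

```java
// Sketch of the cap proposed in the YARN-884 summary: since an AM cannot
// outlive the NM it runs on, the effective AM expiry interval is clamped to
// the NM liveness expiry interval. Hypothetical helper for illustration only.
public class EffectiveExpiry {
    public static long effectiveAmExpiryMs(long amExpiryMs, long nmExpiryMs) {
        // smaller of {am, nm}.liveness-monitor.expiry-interval-ms
        return Math.min(amExpiryMs, nmExpiryMs);
    }

    public static void main(String[] args) {
        // e.g. am configured to 900s but nm to 600s: the AM monitor uses 600s
        System.out.println(effectiveAmExpiryMs(900_000L, 600_000L)); // prints 600000
    }
}
```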
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-569: --- Attachment: YARN-569.10.patch > CapacityScheduler: support for preemption (using a capacity monitor) > > > Key: YARN-569 > URL: https://issues.apache.org/jira/browse/YARN-569 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, > preemption.2.patch, YARN-569.10.patch, YARN-569.1.patch, YARN-569.2.patch, > YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, > YARN-569.8.patch, YARN-569.9.patch, YARN-569.patch, YARN-569.patch > > > There is a tension between the fast-paced, reactive role of the > CapacityScheduler, which needs to respond quickly to applications' resource > requests and node updates, and the more introspective, time-based > considerations needed to observe and correct for capacity balance. To this > purpose, instead of hacking the delicate mechanisms of the CapacityScheduler > directly, we opted to add support for preemption by means of a "Capacity > Monitor", which can be run optionally as a separate service (much like the > NMLivelinessMonitor). > The capacity monitor (similarly to equivalent functionality in the fair > scheduler) runs on an interval (e.g., every 3 seconds), observes the state > of the assignment of resources to queues by the capacity scheduler, performs > an off-line computation to determine whether preemption is needed and how > best to "edit" the current schedule to improve capacity, and generates > events that produce four possible actions: > # Container de-reservations > # Resource-based preemptions > # Container-based preemptions > # Container killing > The actions listed above are progressively more costly, and it is up to the > policy to use them as desired to achieve the rebalancing goals. 
> Note that due to the "lag" in the effect of these actions, the policy should > operate at the macroscopic level (e.g., preempt tens of containers from a > queue) and not try to tightly and consistently micromanage container > allocations. > - Preemption policy (ProportionalCapacityPreemptionPolicy): > - > Preemption policies are pluggable by design; in the following we present an > initial policy (ProportionalCapacityPreemptionPolicy) we have been > experimenting with. The ProportionalCapacityPreemptionPolicy behaves as > follows: > # it gathers from the scheduler the state of the queues, in particular their > current capacity, guaranteed capacity, and pending requests (*) > # if there are pending requests from queues that are under capacity, it > computes a new ideal balanced state (**) > # it computes the set of preemptions needed to repair the current schedule > and achieve capacity balance (accounting for natural completion rates, and > respecting bounds on the amount of preemption we allow for each round) > # it selects which applications to preempt from each over-capacity queue (the > last one in the FIFO order) > # it removes reservations from the most recently assigned app until the amount > of resource to reclaim is obtained, or until no more reservations exist > # (if not enough) it issues preemptions for containers from the same > applications (reverse chronological order, last assigned container first), > again until the target is met or until no containers except the AM container are left, > # (if not enough) it moves on to unreserve and preempt from the next > application. > # containers that have been asked to preempt are tracked across executions. > If a container is among the ones to be preempted for more than a certain > time, it is moved into the list of containers to be forcibly > killed. 
> Notes: > (*) at the moment, in order to avoid double-counting of the requests, we only > look at the "ANY" part of pending resource requests, which means we might not > preempt on behalf of AMs that ask only for specific locations but not any. > (**) The ideal balanced state is one in which each queue has at least its > guaranteed capacity, and the spare capacity is distributed among queues (that > want some) as a weighted fair share, where the weighting is based on the > guaranteed capacity of a queue and the function runs to a fixed point. > Tunables of the ProportionalCapacityPreemptionPolicy: > # observe-only mode (i.e., log the actions it would take, but behave as > read-only) > # how frequently to run the policy > # how long to wait between preemption and kill of a container > # which fraction of the containers I would like to obtain should I preempt (has to do with
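The balancing computation in the policy's first steps (gather queue state, find unmet guaranteed capacity backed by real demand, spread the reclaim amount across over-capacity queues) can be sketched as a toy, single-round version. All names are illustrative; the real ProportionalCapacityPreemptionPolicy also honors the per-round bounds, kill delays, and natural-completion discounts described above:

```java
import java.util.HashMap;
import java.util.Map;

// Toy single-round sketch of the proportional preemption computation.
// Each queue is described by {usedCapacity, guaranteedCapacity, pendingDemand}.
public class ProportionalPreemptionSketch {

    /** Returns, per over-capacity queue, how much capacity to preempt. */
    public static Map<String, Integer> preemptionTargets(Map<String, int[]> queues) {
        int needed = 0;        // unmet guarantees that real pending demand would fill
        int surplusTotal = 0;  // capacity currently held above guarantees
        for (int[] q : queues.values()) {
            int used = q[0], guaranteed = q[1], pending = q[2];
            if (used < guaranteed) {
                needed += Math.min(guaranteed - used, pending);
            } else {
                surplusTotal += used - guaranteed;
            }
        }
        Map<String, Integer> targets = new HashMap<>();
        int toReclaim = Math.min(needed, surplusTotal);
        if (toReclaim == 0) {
            return targets;  // nothing to preempt this round
        }
        for (Map.Entry<String, int[]> e : queues.entrySet()) {
            int surplus = e.getValue()[0] - e.getValue()[1];
            if (surplus > 0) {
                // preempt in proportion to how far over its guarantee the queue is
                targets.put(e.getKey(), toReclaim * surplus / surplusTotal);
            }
        }
        return targets;
    }
}
```

Selecting *which* applications and containers to preempt inside each queue (reverse FIFO, reservations first, AM container last) is the later, ordering-sensitive part the sketch omits.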
[jira] [Updated] (YARN-649) Make container logs available over HTTP in plain text
[ https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-649: Attachment: YARN-649-3.patch > Make container logs available over HTTP in plain text > - > > Key: YARN-649 > URL: https://issues.apache.org/jira/browse/YARN-649 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.0.4-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649.patch, > YARN-752-1.patch > > > It would be good to make container logs available over the REST API for > MAPREDUCE-4362 and so that they can be accessed programmatically in general. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text
[ https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692501#comment-13692501 ] Sandy Ryza commented on YARN-649: - Uploading a patch that takes Vinod's comments into account. It * Fixes the SecureIOUtils hole (doh!) * Makes separate ContainerLogsUtils#getContainerLogFile and getContainerLogDirs * Throws appropriate error codes instead of just returning a string * Uses StreamingOutput to avoid unbounded buffering * Marks the API as evolving I still need to add documentation. Regarding logs for old jobs, is there a reason that the implementation choice would change the API? > Make container logs available over HTTP in plain text > - > > Key: YARN-649 > URL: https://issues.apache.org/jira/browse/YARN-649 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.0.4-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649.patch, > YARN-752-1.patch > > > It would be good to make container logs available over the REST API for > MAPREDUCE-4362 and so that they can be accessed programmatically in general. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
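The "StreamingOutput to avoid unbounded buffering" point boils down to copying the log to the response in fixed-size chunks instead of materializing the whole file in memory. A minimal sketch of that pattern in plain java.io (the actual patch wraps this in JAX-RS StreamingOutput; class and method names here are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch of the "stream, don't buffer" idea behind serving container logs:
// only chunkSize bytes are ever held in memory, however large the log is.
public class LogStreamer {
    public static long streamLog(InputStream log, OutputStream response,
                                 int chunkSize) throws IOException {
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = log.read(buf)) != -1) {
            response.write(buf, 0, n);  // each chunk goes straight to the client
            total += n;
        }
        return total;
    }

    // Helper for demonstration: round-trip a byte array through streamLog.
    public static byte[] roundTrip(byte[] data, int chunkSize) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            streamLog(new ByteArrayInputStream(data), out, chunkSize);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);  // cannot happen for in-memory streams
        }
    }

    public static void main(String[] args) {
        byte[] copy = roundTrip("container log line\n".getBytes(), 4);
        System.out.println(copy.length);
    }
}
```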
[jira] [Commented] (YARN-864) YARN NM leaking containers with CGroups
[ https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692476#comment-13692476 ] Chris Riccomini commented on YARN-864: -- Hey Jian, I re-deployed my test cluster with YARN-600, YARN-799, and your latest patch (.2.patch) from YARN-864. I simulated the timeout using kill -STOP (as described above), and your patch worked! :) I'm going to let the cluster run for 24h before declaring victory, but this looks promising. I'll follow up tomorrow, when I know more. Cheers, Chris > YARN NM leaking containers with CGroups > --- > > Key: YARN-864 > URL: https://issues.apache.org/jira/browse/YARN-864 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha > Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and > YARN-600. >Reporter: Chris Riccomini > Attachments: rm-log, YARN-864.1.patch, YARN-864.2.patch > > > Hey Guys, > I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm > seeing containers getting leaked by the NMs. I'm not quite sure what's going > on -- has anyone seen this before? I'm concerned that maybe it's a > mis-understanding on my part about how YARN's lifecycle works. > When I look in my AM logs for my app (not an MR app master), I see: > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. > This means that container container_1371141151815_0008_03_02 was killed > by YARN, either due to being released by the application master or being > 'lost' due to node failures etc. > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container > container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a > new container for the task. > The AM has been running steadily the whole time. 
Here's what the NM logs say: > {noformat} > 05:34:59,783 WARN AsyncDispatcher:109 - Interrupted Exception while stopping > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1143) > at java.lang.Thread.join(Thread.java:1196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,314 WARN ContainersMonitorImpl:463 - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0006_01_001598 > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0008_03_02 > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. 
> java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. > java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:12
[jira] [Commented] (YARN-883) Expose Fair Scheduler-specific queue metrics
[ https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692475#comment-13692475 ] Hadoop QA commented on YARN-883: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589493/YARN-883.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1389//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1389//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1389//console This message is automatically generated. 
> Expose Fair Scheduler-specific queue metrics > > > Key: YARN-883 > URL: https://issues.apache.org/jira/browse/YARN-883 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-883.patch > > > When the Fair Scheduler is enabled, QueueMetrics should include fair share, > minimum share, and maximum share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
Karthik Kambatla created YARN-884: - Summary: AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms Key: YARN-884 URL: https://issues.apache.org/jira/browse/YARN-884 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla As the AM can't outlive the NM on which it is running, it is a good idea to disallow setting the am.liveness-monitor.expiry-interval-ms to a value higher than nm.liveness-monitor.expiry-interval-ms -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-883) Expose Fair Scheduler-specific queue metrics
[ https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692436#comment-13692436 ] Sandy Ryza commented on YARN-883: - Submitted patch that adds an FSQueueMetrics, which extends QueueMetrics. Verified that the metrics show up on a pseudo-distributed cluster. > Expose Fair Scheduler-specific queue metrics > > > Key: YARN-883 > URL: https://issues.apache.org/jira/browse/YARN-883 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-883.patch > > > When the Fair Scheduler is enabled, QueueMetrics should include fair share, > minimum share, and maximum share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
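The FSQueueMetrics-extends-QueueMetrics shape Sandy describes can be illustrated with a minimal sketch: the common metrics live in the base class, and the Fair-Scheduler-specific gauges (fair, minimum, and maximum share) are added in the subclass. Field and method names below are hypothetical, not the actual Hadoop classes:

```java
// Minimal illustration of extending common queue metrics with
// Fair-Scheduler-specific share gauges (illustrative names only).
public class FSQueueMetricsSketch {
    public static class QueueMetrics {
        protected long allocatedMB;  // an example of a scheduler-agnostic metric
        public long getAllocatedMB() { return allocatedMB; }
    }

    public static class FSQueueMetrics extends QueueMetrics {
        private long fairShareMB, minShareMB, maxShareMB;
        public void setShares(long fair, long min, long max) {
            fairShareMB = fair; minShareMB = min; maxShareMB = max;
        }
        public long getFairShareMB() { return fairShareMB; }
        public long getMinShareMB()  { return minShareMB; }
        public long getMaxShareMB()  { return maxShareMB; }
    }

    public static void main(String[] args) {
        FSQueueMetrics m = new FSQueueMetrics();
        m.setShares(4096, 1024, 8192);  // values in MB, purely illustrative
        System.out.println(m.getFairShareMB()); // prints 4096
    }
}
```

Code that consumes the base QueueMetrics keeps working unchanged, which is why the subclass route avoids touching the other schedulers.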
[jira] [Updated] (YARN-883) Expose Fair Scheduler-specific queue metrics
[ https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-883: Attachment: YARN-883.patch > Expose Fair Scheduler-specific queue metrics > > > Key: YARN-883 > URL: https://issues.apache.org/jira/browse/YARN-883 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-883.patch > > > When the Fair Scheduler is enabled, QueueMetrics should include fair share, > minimum share, and maximum share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-883) Expose Fair Scheduler-specific queue metrics
[ https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza moved MAPREDUCE-5350 to YARN-883: Component/s: (was: scheduler) scheduler Affects Version/s: (was: 2.0.5-alpha) 2.0.5-alpha Key: YARN-883 (was: MAPREDUCE-5350) Project: Hadoop YARN (was: Hadoop Map/Reduce) > Expose Fair Scheduler-specific queue metrics > > > Key: YARN-883 > URL: https://issues.apache.org/jira/browse/YARN-883 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-883.patch > > > When the Fair Scheduler is enabled, QueueMetrics should include fair share, > minimum share, and maximum share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-864) YARN NM leaking containers with CGroups
[ https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692388#comment-13692388 ] Jian He commented on YARN-864: -- Hi Chris that failure was due to reboot starts even before stop fully completes. Uploaded a new patch, tested locally. let me know if that works, thx > YARN NM leaking containers with CGroups > --- > > Key: YARN-864 > URL: https://issues.apache.org/jira/browse/YARN-864 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha > Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and > YARN-600. >Reporter: Chris Riccomini > Attachments: rm-log, YARN-864.1.patch, YARN-864.2.patch > > > Hey Guys, > I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm > seeing containers getting leaked by the NMs. I'm not quite sure what's going > on -- has anyone seen this before? I'm concerned that maybe it's a > mis-understanding on my part about how YARN's lifecycle works. > When I look in my AM logs for my app (not an MR app master), I see: > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. > This means that container container_1371141151815_0008_03_02 was killed > by YARN, either due to being released by the application master or being > 'lost' due to node failures etc. > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container > container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a > new container for the task. > The AM has been running steadily the whole time. 
Here's what the NM logs say: > {noformat} > 05:34:59,783 WARN AsyncDispatcher:109 - Interrupted Exception while stopping > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1143) > at java.lang.Thread.join(Thread.java:1196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,314 WARN ContainersMonitorImpl:463 - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0006_01_001598 > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0008_03_02 > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. 
> java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. > java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230)
[jira] [Updated] (YARN-864) YARN NM leaking containers with CGroups
[ https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-864: - Attachment: YARN-864.2.patch > YARN NM leaking containers with CGroups > --- > > Key: YARN-864 > URL: https://issues.apache.org/jira/browse/YARN-864 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha > Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and > YARN-600. >Reporter: Chris Riccomini > Attachments: rm-log, YARN-864.1.patch, YARN-864.2.patch > > > Hey Guys, > I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm > seeing containers getting leaked by the NMs. I'm not quite sure what's going > on -- has anyone seen this before? I'm concerned that maybe it's a > mis-understanding on my part about how YARN's lifecycle works. > When I look in my AM logs for my app (not an MR app master), I see: > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. > This means that container container_1371141151815_0008_03_02 was killed > by YARN, either due to being released by the application master or being > 'lost' due to node failures etc. > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container > container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a > new container for the task. > The AM has been running steadily the whole time. 
Here's what the NM logs say: > {noformat} > 05:34:59,783 WARN AsyncDispatcher:109 - Interrupted Exception while stopping > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1143) > at java.lang.Thread.join(Thread.java:1196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,314 WARN ContainersMonitorImpl:463 - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0006_01_001598 > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0008_03_02 > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. 
> java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. > java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242) > at > org.apache.hadoop.yarn.server.nodemanage
[jira] [Commented] (YARN-864) YARN NM leaking containers with CGroups
[ https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692296#comment-13692296 ] Chris Riccomini commented on YARN-864: -- Hey Jian, With your patch applied, the new error (in the NM) is: {noformat} 19:33:36,741 INFO NodeStatusUpdaterImpl:365 - Node is out of sync with ResourceManager, hence rebooting. 19:33:36,764 INFO ContainersMonitorImpl:399 - Memory usage of ProcessTree 14751 for container-id container_1372091455469_0002_01_02: 779.3 MB of 1.3 GB physical memory used; 1.6 GB of 10 GB virtual memory used 19:33:37,239 INFO NodeManager:315 - Rebooting the node manager. 19:33:37,261 INFO NodeManager:229 - Containers still running on shutdown: [container_1372091455469_0002_01_02] 19:33:37,278 FATAL AsyncDispatcher:137 - Error in dispatcher thread org.apache.hadoop.metrics2.MetricsException: Metrics source JvmMetrics already exists! at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107) at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) at org.apache.hadoop.metrics2.source.JvmMetrics.create(JvmMetrics.java:79) at org.apache.hadoop.yarn.server.nodemanager.metrics.NodeManagerMetrics.create(NodeManagerMetrics.java:49) at org.apache.hadoop.yarn.server.nodemanager.metrics.NodeManagerMetrics.create(NodeManagerMetrics.java:45) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.(NodeManager.java:75) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.createNewNodeManager(NodeManager.java:357) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.reboot(NodeManager.java:316) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:348) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:619) {noformat} For the record, you can reproduce this yourself by: 1. Start a YARN RM and NM. 2. Run a YARN job on the cluster that uses at least one container. 3. Run kill -STOP on the NM. 4. Wait 65 seconds (enough for the NM to time out). 5. Run kill -CONT You will see the NM trigger a reboot since it's out of sync with the RM. > YARN NM leaking containers with CGroups > --- > > Key: YARN-864 > URL: https://issues.apache.org/jira/browse/YARN-864 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha > Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and > YARN-600. >Reporter: Chris Riccomini > Attachments: rm-log, YARN-864.1.patch > > > Hey Guys, > I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm > seeing containers getting leaked by the NMs. I'm not quite sure what's going > on -- has anyone seen this before? I'm concerned that maybe it's a > mis-understanding on my part about how YARN's lifecycle works. > When I look in my AM logs for my app (not an MR app master), I see: > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. > This means that container container_1371141151815_0008_03_02 was killed > by YARN, either due to being released by the application master or being > 'lost' due to node failures etc. > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container > container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a > new container for the task. > The AM has been running steadily the whole time. 
Here's what the NM logs say: > {noformat} > 05:34:59,783 WARN AsyncDispatcher:109 - Interrupted Exception while stopping > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1143) > at java.lang.Thread.join(Thread.java:1196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) > at > org.apache.hadoop.yarn.event.AsyncDis
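Chris's reproduction steps from the comment above can be sketched as a short script (hypothetical commands; assumes a single NodeManager process on the host, `jps` on the PATH, and an NM liveness expiry interval lowered so that 65 seconds exceeds it):

```shell
# Steps 1-2 (start the RM/NM and run a job with at least one container)
# are assumed to have been done already.
NM_PID=$(jps 2>/dev/null | awk '/NodeManager/ {print $1}')

if [ -n "$NM_PID" ]; then
  kill -STOP "$NM_PID"   # 3. freeze the NM so it stops heartbeating to the RM
  sleep 65               # 4. wait long enough for the RM to expire the node
  kill -CONT "$NM_PID"   # 5. resume; the NM learns it is out of sync and reboots
else
  echo "no NodeManager found; nothing to do"
fi
```

On resume, the NM log should show the "Node is out of sync with ResourceManager, hence rebooting" message quoted above.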
[jira] [Created] (YARN-882) Specify per user quota for private/application cache and user log files
Omkar Vinit Joshi created YARN-882: -- Summary: Specify per user quota for private/application cache and user log files Key: YARN-882 URL: https://issues.apache.org/jira/browse/YARN-882 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi At present there is no limit on the number of files / size of the files localized by a single user. Similarly there is no limit on the size of the log files created by a user via running containers. We need to restrict the user for this. For LocalizedResources; this has serious concerns in case of a secured environment where a malicious user can start one container and localize resources whose total size >= DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB. Thereafter it will either fail (if no extra space is present on disk) or the deletion service will keep removing localized files for other containers/applications. The limit for logs/localized resources should be decided by RM and sent to NM via secured containerToken. All these configurations should be per container instead of per user or per nm. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-882) Specify per user quota for private/application cache and user log files
[ https://issues.apache.org/jira/browse/YARN-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-882: --- Description: At present there is no limit on the number of files / size of the files localized by single user. Similarly there is no limit on the size of the log files created by user via running containers. We need to restrict the user for this. For LocalizedResources; this has serious concerns in case of secured environment where malicious user can start one container and localize resources whose total size >= DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB. Thereafter it will either fail (if no extra space is present on disk) or deletion service will keep removing localized files for other containers/applications. The limit for logs/localized resources should be decided by RM and sent to NM via secured containerToken. All these configurations should per container instead of per user or per nm. was: At present there is no limit on the number of files / size of the files localized by single user. Similarly there is no limit on the size of the log files created by user via running containers. We need to restrict the user for this. For LocalizedResources; this has serious concerns in case of secured environment where malicious user can start one container and localize resources whose total size >= DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB. Thereafter it will either fail (if no extra space is present on disk) or deletion service will keep removing localized files for other containers/applications. The limit for logs/localized resource should be decided by RM and sent to NM via secured containerToken. All these configurations should per container instead of per user or per nm. 
> Specify per user quota for private/application cache and user log files > --- > > Key: YARN-882 > URL: https://issues.apache.org/jira/browse/YARN-882 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > At present there is no limit on the number of files / size of the files > localized by single user. Similarly there is no limit on the size of the log > files created by user via running containers. > We need to restrict the user for this. > For LocalizedResources; this has serious concerns in case of secured > environment where malicious user can start one container and localize > resources whose total size >= DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB. > Thereafter it will either fail (if no extra space is present on disk) or > deletion service will keep removing localized files for other > containers/applications. > The limit for logs/localized resources should be decided by RM and sent to NM > via secured containerToken. All these configurations should per container > instead of per user or per nm.
[jira] [Commented] (YARN-339) TestResourceTrackerService is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692270#comment-13692270 ] Ravi Prakash commented on YARN-339: --- Hi Vinod! Nopes! I can't reproduce this anymore. Closing as fixed. Please re-open if you think the patch should still go in. Thanks Jianhe and Vinod! > TestResourceTrackerService is failing intermittently > > > Key: YARN-339 > URL: https://issues.apache.org/jira/browse/YARN-339 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0, 0.23.5 >Reporter: Ravi Prakash >Assignee: Jian He > Attachments: YARN-339.patch > > > The test after testReconnectNode() is failing usually. This might be a race > condition in Metrics2 code. > Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.127 sec <<< > FAILURE! > testDecommissionWithIncludeHosts(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 55 sec <<< ERROR! > org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics > already exists! > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:134) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:115) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.registerMetrics(ClusterMetrics.java:71) > at > org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.getMetrics(ClusterMetrics.java:58) > at > org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testDecommissionWithIncludeHosts(TestResourceTrackerService.java:74)
[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric
[ https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692241#comment-13692241 ] Hudson commented on YARN-736: - Integrated in Hadoop-trunk-Commit #4005 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4005/]) YARN-736. Add a multi-resource fair sharing metric. (sandyr via tucu) (Revision 1496153) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1496153 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/Resources.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/Schedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/ComputeFairShares.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FifoPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FakeSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestComputeFairShares.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/TestDominantResourceFairnessPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm > Add a multi-resource fair sharing metric > > > Key: YARN-736 > URL: https://issues.apache.org/jira/browse/YARN-736 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.4-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.2.0 > > Attachments: YARN-736-1.patch, YARN-736-2.patch, YARN-736-3.patch, > YARN-736-4.patch, YARN-736.patch > > > Currently, at a regular interval, the fair scheduler computes a fair memory > share for each queue and application inside it. 
This fair share is not used > for scheduling decisions, but is displayed in the web UI, exposed as a > metric, and used for preemption decisions. > With DRF and multi-resource scheduling, assigning a memory share as the fair > share metric to every queue no longer makes sense. It's not obvious what the > replacement should be, but probably something like fractional fairness within > a queue, or distance from an ideal cluster state.
[jira] [Commented] (YARN-881) Priority#compareTo method seems to be wrong.
[ https://issues.apache.org/jira/browse/YARN-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692224#comment-13692224 ] Sandy Ryza commented on YARN-881: - There are places in the code that rely on the current ordering, AppSchedulingInfo, for example. The thinking may have been that we most commonly want to traverse priorities from high to low, which is more straightforward if the higher ones are at the front of the list. > Priority#compareTo method seems to be wrong. > > > Key: YARN-881 > URL: https://issues.apache.org/jira/browse/YARN-881 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > > if lower int value means higher priority, shouldn't we "return > other.getPriority() - this.getPriority() "
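The ordering Sandy describes can be illustrated with a minimal stand-in class (SimplePriority and PriorityOrderDemo are hypothetical names for this sketch, not the actual org.apache.hadoop.yarn.api.records.Priority): when lower int value means higher priority and compareTo uses the natural ascending order on the int, a sort places the highest-priority entries at the front of the list, which is the high-to-low traversal order code like AppSchedulingInfo relies on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the Priority record under discussion; NOT the
// actual YARN class. Convention: lower int value == higher priority.
class SimplePriority implements Comparable<SimplePriority> {
    private final int priority;

    SimplePriority(int priority) { this.priority = priority; }

    int getPriority() { return priority; }

    // Natural ascending order on the int value. Sorting with this puts the
    // smallest ints (i.e. the highest priorities) at the front of the list.
    @Override
    public int compareTo(SimplePriority other) {
        return Integer.compare(this.getPriority(), other.getPriority());
    }

    @Override
    public String toString() { return "p" + priority; }
}

public class PriorityOrderDemo {
    public static void main(String[] args) {
        List<SimplePriority> ps = new ArrayList<>();
        ps.add(new SimplePriority(5));
        ps.add(new SimplePriority(1));
        ps.add(new SimplePriority(3));
        Collections.sort(ps);
        System.out.println(ps); // prints [p1, p3, p5]: highest priority first
    }
}
```

Note that the common `a - b` subtraction idiom, beyond the sign question this issue raises, can also overflow for extreme int values; `Integer.compare` sidesteps that.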
[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric
[ https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692212#comment-13692212 ] Alejandro Abdelnur commented on YARN-736: - +1 > Add a multi-resource fair sharing metric > > > Key: YARN-736 > URL: https://issues.apache.org/jira/browse/YARN-736 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.0.4-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-736-1.patch, YARN-736-2.patch, YARN-736-3.patch, > YARN-736-4.patch, YARN-736.patch > > > Currently, at a regular interval, the fair scheduler computes a fair memory > share for each queue and application inside it. This fair share is not used > for scheduling decisions, but is displayed in the web UI, exposed as a > metric, and used for preemption decisions. > With DRF and multi-resource scheduling, assigning a memory share as the fair > share metric to every queue no longer makes sense. It's not obvious what the > replacement should be, but probably something like fractional fairness within > a queue, or distance from an ideal cluster state.
[jira] [Created] (YARN-881) Priority#compareTo method seems to be wrong.
Jian He created YARN-881: Summary: Priority#compareTo method seems to be wrong. Key: YARN-881 URL: https://issues.apache.org/jira/browse/YARN-881 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He if lower int value means higher priority, shouldn't we "return other.getPriority() - this.getPriority() "
[jira] [Commented] (YARN-864) YARN NM leaking containers with CGroups
[ https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692138#comment-13692138 ] Chris Riccomini commented on YARN-864: -- Hey Jian, Awesome. I've patched and started the cluster with YARN-600, YARN-799, and YARN-864. I'll keep you posted. Cheers, Chris > YARN NM leaking containers with CGroups > --- > > Key: YARN-864 > URL: https://issues.apache.org/jira/browse/YARN-864 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha > Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and > YARN-600. >Reporter: Chris Riccomini > Attachments: rm-log, YARN-864.1.patch > > > Hey Guys, > I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm > seeing containers getting leaked by the NMs. I'm not quite sure what's going > on -- has anyone seen this before? I'm concerned that maybe it's a > mis-understanding on my part about how YARN's lifecycle works. > When I look in my AM logs for my app (not an MR app master), I see: > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. > This means that container container_1371141151815_0008_03_02 was killed > by YARN, either due to being released by the application master or being > 'lost' due to node failures etc. > 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container > container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a > new container for the task. > The AM has been running steadily the whole time. 
Here's what the NM logs say: > {noformat} > 05:34:59,783 WARN AsyncDispatcher:109 - Interrupted Exception while stopping > java.lang.InterruptedException > at java.lang.Object.wait(Native Method) > at java.lang.Thread.join(Thread.java:1143) > at java.lang.Thread.join(Thread.java:1196) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,314 WARN ContainersMonitorImpl:463 - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0006_01_001598 > 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup > at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0008_03_02 > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. 
> java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. > java.io.IOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) > at org.apache.hadoop.util.Shell.run(Shell.java:129) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) > at > org
[jira] [Commented] (YARN-871) Failed to run MR example against latest trunk
[ https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692108#comment-13692108 ] Zhijie Shen commented on YARN-871: -- [~devaraj.k], the posted exception seems to be related to HADOOP-9421 and YARN-827. YARN-874 is tracking the issue. > Failed to run MR example against latest trunk > - > > Key: YARN-871 > URL: https://issues.apache.org/jira/browse/YARN-871 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen > Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log > > > Built the latest trunk, deployed a single node cluster and ran examples, such > as > {code} > hadoop jar > hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar > teragen 10 out1 > {code} > The job failed with the following console message: > {code} > 13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001 > 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in > uber mode : false > 13/06/21 12:51:31 INFO mapreduce.Job: map 0% reduce 0% > 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with > state FAILED due to: Application application_1371844267731_0001 failed 2 > times due to AM Container for appattempt_1371844267731_0001_02 exited > with exitCode: 127 due to: > .Failing this attempt.. Failing the application. > 13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0 > {code}
[jira] [Created] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
Nishan Shetty created YARN-880: -- Summary: Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution Key: YARN-880 URL: https://issues.apache.org/jira/browse/YARN-880 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Priority: Critical Scenario: = Cluster is installed with 2 Nodemanagers Configuration: NM memory (yarn.nodemanager.resource.memory-mb): 8 gb map and reduce memory : 8 gb Appmaster memory: 2 gb If map task is reserved on the same nodemanager where appmaster of the same job is running then job execution hangs.
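The reported scenario corresponds to configuration along these lines (property names are the standard yarn-site.xml / mapred-site.xml ones; the values mirror the description and are illustrative only):

```xml
<!-- yarn-site.xml: total memory each NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>

<!-- mapred-site.xml: every task container requests an entire node... -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>8192</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
<!-- ...while the MR ApplicationMaster itself occupies 2 GB on one node -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>
```

With the 2 GB AM holding part of one node, an 8 GB map container reserved on that same node can never actually fit, so the reservation never clears and the job hangs.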
[jira] [Commented] (YARN-871) Failed to run MR example against latest trunk
[ https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691868#comment-13691868 ] Devaraj K commented on YARN-871: {code:xml} 2013-06-24 20:58:05,102 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1372087479441_0002 failed 2 times due to Error launching appattempt_1372087479441_0002_02. Got exception: java.io.IOException: Failed on local exception: java.io.IOException: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.; Host Details : local host is: "HOST-10-18-91-57/10.18.91.57"; destination host is: "HOST-10-18-91-57":12356; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1318) at org.apache.hadoop.ipc.Client.call(Client.java:1266) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy23.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainer(ContainerManagementProtocolPBClientImpl.java:110) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:110) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:228) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.IOException: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections. 
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:589) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489) at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:552) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:635) at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:258) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1367) at org.apache.hadoop.ipc.Client.call(Client.java:1285) ... 9 more Caused by: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections. at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:250) at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:464) at org.apache.hadoop.ipc.Client$Connection.access$1500(Client.java:258) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:628) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:625) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:624) ... 12 more . Failing the application. 
{code} > Failed to run MR example against latest trunk > - > > Key: YARN-871 > URL: https://issues.apache.org/jira/browse/YARN-871 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen > Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log > > > Built the latest trunk, deployed a single node cluster and ran examples, such > as > {code} > hadoop jar > hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar > teragen 10 out1 > {code} > The job failed with the following console message: > {code} > 13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001 > 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in > uber mode : false > 13/06/21 12:51:31 INFO mapreduce.Job: map 0% reduce 0% > 13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with > state FAILED due to: Application application_1371844267731_0001 failed 2 > times due to AM Container for appattempt_1371844267731_0001_02 exited > with exitCode: 127 due to: > .Failing this attempt.. Failing the application. > 13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0 > {code}