[jira] [Comment Edited] (YARN-8541) RM startup failure on recovery after user deletion
[ https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553819#comment-16553819 ] Bibin A Chundatt edited comment on YARN-8541 at 7/24/18 5:19 AM: - On recovery the application context will have the previously set queueName. On the scheduler side, if the queue doesn't exist the application will get killed. Hope I was able to clarify your doubt. was (Author: bibinchundatt): On recovery the application context will have the previously set queueName. At scheduler side if queue doesn't exists the application will get killed. Hope i hope was able to clarify your doubt. > RM startup failure on recovery after user deletion > -- > > Key: YARN-8541 > URL: https://issues.apache.org/jira/browse/YARN-8541 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.1.0 >Reporter: yimeng >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: YARN-8541.001.patch, YARN-8541.002.patch, > YARN-8541.003.patch > > > My Hadoop version is 3.1.0. I found a problem: RM startup fails on > recovery, with the following test steps: > 1. Create a user "user1" that has permission to submit apps. > 2. Use user1 to submit a job and wait for the job to finish. > 3. Delete user "user1". > 4. Restart YARN. > 5. The RM restart fails. > RM logs: > 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue > root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, > usedResources=usedCapacity=0.0, numApps=0, > numContainers=0 | CapacitySchedulerQueueManager.java:163 > 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue > mappings, override: false | UserGroupMappingPlacementRule.java:232 > 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized > CapacityScheduler with calculator=class > org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, > minimumAllocation=<>, maximumAllocation=< vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | > CapacityScheduler.java:392 > 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not > found | Configuration.java:2767 > 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS > Processing chain. Root > Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor]. > | AMSProcessingChain.java:62 > 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement > handler will be used, all scheduling requests will be rejected. | > ApplicationMasterService.java:130 > 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding > [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor] > tp top of AMS Processing chain. 
| AMSProcessingChain.java:75 > 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the > winning of election | ActiveStandbyElector.java:897 > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728) > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > ... 4 more > Caused by: org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application > application_1531624956005_0001 submitted by user super reason: No groups > found for user super > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at >
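To illustrate the recovery behavior described in the comment above, here is a minimal Java sketch: a recovered application whose queue no longer exists, or whose submitting user can no longer be resolved to any group (the "No groups found for user" failure in the stack trace), would be killed rather than aborting the RM's transition to Active. The class and method names are illustrative, not the actual CapacityScheduler code.

{code}
// Illustrative sketch only: shows the recovery-time guard the comment describes,
// not the actual CapacityScheduler implementation.
public final class RecoveryPlacementSketch {

  /** Result of trying to place a recovered application. */
  enum Placement { PLACED, KILL_APPLICATION }

  /**
   * On recovery the application context carries the previously set queue name.
   * If that queue no longer exists, or the user's groups cannot be resolved
   * (e.g. the user was deleted), kill the recovered application instead of
   * letting the exception abort the RM's transition to Active.
   */
  static Placement placeRecoveredApp(String queueName,
                                     java.util.Set<String> existingQueues,
                                     java.util.Map<String, java.util.List<String>> userGroups,
                                     String user) {
    if (!existingQueues.contains(queueName)) {
      return Placement.KILL_APPLICATION;           // queue removed since submission
    }
    java.util.List<String> groups = userGroups.get(user);
    if (groups == null || groups.isEmpty()) {
      return Placement.KILL_APPLICATION;           // "No groups found for user"
    }
    return Placement.PLACED;
  }

  public static void main(String[] args) {
    java.util.Set<String> queues = java.util.Set.of("root", "root.default");
    java.util.Map<String, java.util.List<String>> groups = java.util.Map.of();
    System.out.println(placeRecoveredApp("root.default", queues, groups, "user1"));
    // -> KILL_APPLICATION, because user1 was deleted and has no groups
  }
}
{code}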
[jira] [Commented] (YARN-8418) App local logs could be leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553817#comment-16553817 ] Bibin A Chundatt commented on YARN-8418: Thank you [~suma.shivaprasad] for looking into the patch. {quote} Wouldnt this cause leaks in case of other Exception cases (not Invalid token cases) {quote} The {{aggregatorWrapper}} thread takes care of the cleanup once the aggregator thread is complete:
{code}
// Schedule the aggregator.
Runnable aggregatorWrapper = new Runnable() {
  public void run() {
    try {
      appLogAggregator.run();
    } finally {
      appLogAggregators.remove(appId);
      closeFileSystems(userUgi);
    }
  }
};
{code}
{quote} when the scheduled aggregator runs? {quote} Once an exception occurs we set a disable flag per aggregator. So if the aggregator is disabled, the upload is skipped and only the postCleanUp is done; see {{AppLogAggregatorImpl#run()}}. > App local logs could be leaked if log aggregation fails to initialize for the app > -- > > Key: YARN-8418 > URL: https://issues.apache.org/jira/browse/YARN-8418 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-8418.001.patch, YARN-8418.002.patch, > YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch > > > If log aggregation fails during createApp directory initialization, container logs could get > leaked in the NM directory. > For a long-running application this case is possible on NM restart after token renewal, or on > application submission with an invalid delegation token. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
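A rough sketch of the disable-flag behavior the comment refers to in {{AppLogAggregatorImpl#run()}}: once initialization fails the aggregator is marked disabled, so the scheduled run skips the upload and only performs local cleanup, while the wrapper's finally block (quoted above) removes it from the map and closes the filesystems. Field and method names are simplified.

{code}
// Simplified sketch of the pattern described above; not the real AppLogAggregatorImpl.
class AppLogAggregatorSketch implements Runnable {

  private volatile boolean disabled = false;   // set when aggregation init fails

  void disableAggregation() {
    this.disabled = true;
  }

  @Override
  public void run() {
    try {
      if (!disabled) {
        uploadLogsForContainers();          // normal aggregation path
      }
    } finally {
      doAppLogAggregationPostCleanUp();     // always delete app-local log dirs on the NM
    }
  }

  private void uploadLogsForContainers() {
    // upload container logs to the remote (HDFS) aggregation directory
  }

  private void doAppLogAggregationPostCleanUp() {
    // remove the application's local log directories so they are not leaked
  }
}
{code}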
[jira] [Commented] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed
[ https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553795#comment-16553795 ] genericqa commented on YARN-7748: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 3s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}129m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-7748 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932819/YARN-7748.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d2115bfcfef0 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8688a0c | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21349/testReport/ | | Max. process+thread count | 865 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21349/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. >
[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553745#comment-16553745 ] Sunil Govindan commented on YARN-8548: -- Thanks. Patch looks good to me. > AllocationRespose proto setNMToken initBuilder not done > --- > > Key: YARN-8548 > URL: https://issues.apache.org/jira/browse/YARN-8548 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-8548-001.patch, YARN-8548-002.patch, > YARN-8548-003.patch, YARN-8548-004.patch, YARN-8548-005.patch > > > Distributed Scheduling allocate failing > {code} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1355) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy85.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
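The NullPointerException in {{AllocateResponsePBImpl.setNMTokens}} is consistent with a setter that touches the protobuf builder before it has been initialized, which is what the issue title points at. A minimal sketch of the usual Hadoop PBImpl guard (the {{maybeInitBuilder()}} idiom); the proto and builder types are simplified stand-ins, not the generated protobuf API.

{code}
// Minimal sketch of the PBImpl "maybeInitBuilder" idiom; proto/builder types are simplified.
import java.util.ArrayList;
import java.util.List;

class AllocateResponseSketch {
  static class Builder {                      // stands in for the generated proto builder
    final List<String> nmTokens = new ArrayList<>();
  }

  private Builder builder;                    // may be null until the first mutation
  private List<String> nmTokens;

  private void maybeInitBuilder() {
    if (builder == null) {
      builder = new Builder();                // initialize before any setter uses it
    }
  }

  public void setNMTokens(List<String> tokens) {
    maybeInitBuilder();                       // without this guard, builder is null -> NPE
    builder.nmTokens.clear();
    this.nmTokens = tokens;
  }
}
{code}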
[jira] [Commented] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed
[ https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553742#comment-16553742 ] Sunil Govindan commented on YARN-7748: -- Thanks. Yes, this change looks good to me. Pending jenkins. > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > failed > > > Key: YARN-7748 > URL: https://issues.apache.org/jira/browse/YARN-7748 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0 >Reporter: Haibo Chen >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-7748.001.patch, YARN-7748.002.patch, > YARN-7748.003.patch > > > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > Failing for the past 1 build (Since Failed#19244 ) > Took 0.4 sec. > *Error Message* > expected null, but > was: > *Stacktrace* > {code} > java.lang.AssertionError: expected null, but > was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed
[ https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553698#comment-16553698 ] Weiwei Yang edited comment on YARN-7748 at 7/24/18 2:29 AM: Thanks [~snemeth], I am taking this one over. [~haibochen], [~sunilg], could you please take a look at this fix? was (Author: cheersyang): Thanks [~snemeth], I am taking this one over. [~sunilg], could you please take a look at this fix? > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > failed > > > Key: YARN-7748 > URL: https://issues.apache.org/jira/browse/YARN-7748 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0 >Reporter: Haibo Chen >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-7748.001.patch, YARN-7748.002.patch, > YARN-7748.003.patch > > > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > Failing for the past 1 build (Since Failed#19244 ) > Took 0.4 sec. > *Error Message* > expected null, but > was: > *Stacktrace* > {code} > java.lang.AssertionError: expected null, but > was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334) > at > 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed
[ https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned YARN-7748: - Assignee: Weiwei Yang (was: Szilard Nemeth) > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > failed > > > Key: YARN-7748 > URL: https://issues.apache.org/jira/browse/YARN-7748 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0 >Reporter: Haibo Chen >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-7748.001.patch, YARN-7748.002.patch, > YARN-7748.003.patch > > > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > Failing for the past 1 build (Since Failed#19244 ) > Took 0.4 sec. > *Error Message* > expected null, but > was: > *Stacktrace* > {code} > java.lang.AssertionError: expected null, but > was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: 
yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed
[ https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553698#comment-16553698 ] Weiwei Yang commented on YARN-7748: --- Thanks [~snemeth], I am taking this one over. [~sunilg], could you please take a look at this fix? > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > failed > > > Key: YARN-7748 > URL: https://issues.apache.org/jira/browse/YARN-7748 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0 >Reporter: Haibo Chen >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-7748.001.patch, YARN-7748.002.patch, > YARN-7748.003.patch > > > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > Failing for the past 1 build (Since Failed#19244 ) > Took 0.4 sec. > *Error Message* > expected null, but > was: > *Stacktrace* > {code} > java.lang.AssertionError: expected null, but > was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To 
unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed
[ https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7748: -- Attachment: YARN-7748.003.patch > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > failed > > > Key: YARN-7748 > URL: https://issues.apache.org/jira/browse/YARN-7748 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0 >Reporter: Haibo Chen >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-7748.001.patch, YARN-7748.002.patch, > YARN-7748.003.patch > > > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > Failing for the past 1 build (Since Failed#19244 ) > Took 0.4 sec. > *Error Message* > expected null, but > was: > *Stacktrace* > {code} > java.lang.AssertionError: expected null, but > was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: 
yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553693#comment-16553693 ] Weiwei Yang commented on YARN-8559: --- cc [~leftnoteasy], [~sunilg] for review. > Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8559.001.patch, YARN-8559.002.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
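As a usage illustration, a small Java sketch of how a client could read the scheduler configuration once it is exposed, assuming the {{/ws/v1/cluster/scheduler-conf}} path named in this issue's summary; the RM host/port and the response handling are assumptions for illustration only.

{code}
// Sketch: fetch the mutable scheduler configuration from the RM REST API.
// The RM host/port and response handling are assumptions for illustration.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SchedulerConfFetch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/scheduler-conf");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/xml");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // scheduler configuration key/value pairs
      }
    }
  }
}
{code}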
[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8559: -- Priority: Major (was: Minor) > Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8559.001.patch, YARN-8559.002.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler in Resource Manager's /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8559: -- Summary: Expose mutable-conf scheduler in Resource Manager's /scheduler-conf endpoint (was: Expose scheduling configuration info in Resource Manager's /conf endpoint) > Expose mutable-conf scheduler in Resource Manager's /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Minor > Attachments: YARN-8559.001.patch, YARN-8559.002.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8559: -- Summary: Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint (was: Expose mutable-conf scheduler in Resource Manager's /scheduler-conf endpoint) > Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint > > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Minor > Attachments: YARN-8559.001.patch, YARN-8559.002.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes
[ https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553691#comment-16553691 ] Naganarasimha G R commented on YARN-7863: - I have done just a first-pass review; a few higher-level comments: # I think the Distributed Shell related modifications can be pulled into a separate Jira (YARN-8289) as discussed earlier. # I could not get clarity in the meeting w.r.t. Attributes and Partitions/labels. IIUC, in ResourceRequest we specify the label expression directly, but in SchedulingRequest we give it as part of the PlacementConstraint? And if so, do we club it with the attribute expression? Maybe we need to capture that explicitly here; can you share a sample request with both of them present? # Attach the doc which captured our conclusions on the expression, or capture the essence of it as the Jira's description. Other minor aspects: * PlacementConstraints.java ln 52-53: better to use NodeAttribute.PREFIX_DISTRIBUTED & PREFIX_CENTRALIZED. * PlacementConstraintParser ln 363-436: NodeConstraintsTokenizer seems to be unused anywhere? * ResourceManager ln 652-654: createNodeAttributesManager should either return NodeAttributesManagerImpl or take rmContext as a parameter instead of type casting. * .bowerrc ln 3: might not be related to this patch? * TestPlacementConstraintParser: captures a single test case; we need more examples covering the positive and negative expression validation cases. * TestPlacementConstraintParser ln 457: is this the approach we finalised? I thought it was much more expressive, like the document in PlacementConstraintParser.parsePlacementSpecForAttributes. Also, please change the status of the Jira to patch submitted so that the Jenkins build is triggered. > Modify placement constraints to support node attributes > --- > > Key: YARN-7863 > URL: https://issues.apache.org/jira/browse/YARN-7863 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-7863-YARN-3409.002.patch, YARN-7863.v0.patch > > > This Jira will track the work to *modify existing placement constraints to support > node attributes*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-8564: Description: I wish to have a queue-level application lifetime monitor in FairScheduler. In our large YARN cluster there are sometimes too many small jobs in one minor queue that may run too long, which can affect our high-priority, very important queues. It would help if we could have a queue-level application lifetime monitor and set a small lifetime on the minor queue. (was: I wish to have application lifetime monitor for queue level in FairSheduler. In our large yarn cluster, sometimes there are too many small jobs in one minor queue but may run too long, it may cause our long time job in our high priority and very important queue pending too many times. If we can have queue level application lifetime monitor in the queue level, and set small lifetime in the minor queue, i think it may can solve our problems before the global schedulering can be used in FairScheduler.) > Add queue level application lifetime monitor in FairScheduler > -- > > Key: YARN-8564 > URL: https://issues.apache.org/jira/browse/YARN-8564 > Project: Hadoop YARN > Issue Type: Wish > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: zhuqi >Priority: Major > > I wish to have a queue-level application lifetime monitor in FairScheduler. In our large > YARN cluster there are sometimes too many small jobs in one minor queue that may run too > long, which can affect our high-priority, very important queues. It would help if we could > have a queue-level application lifetime monitor and set a small lifetime on the minor queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
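A rough Java sketch of the kind of queue-level check this wish implies. The per-queue lifetime setting and the monitor wiring are hypothetical; FairScheduler has no such configuration today, so this only illustrates the behavior being requested.

{code}
// Hypothetical sketch of a per-queue lifetime check; the setting does not exist in
// FairScheduler today, this only illustrates the behaviour the reporter asks for.
import java.util.Map;

class QueueLifetimeMonitorSketch {

  /** Hypothetical per-queue setting: max app lifetime in seconds, -1 = unlimited. */
  private final Map<String, Long> maxAppLifetimeSecondsByQueue;

  QueueLifetimeMonitorSketch(Map<String, Long> limits) {
    this.maxAppLifetimeSecondsByQueue = limits;
  }

  boolean shouldKill(String queue, long appStartTimeMillis, long nowMillis) {
    long limit = maxAppLifetimeSecondsByQueue.getOrDefault(queue, -1L);
    if (limit < 0) {
      return false;                                   // no lifetime limit on this queue
    }
    long elapsedSeconds = (nowMillis - appStartTimeMillis) / 1000;
    return elapsedSeconds > limit;                    // kill small-queue apps that run too long
  }
}
{code}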
[jira] [Commented] (YARN-8380) Support bind propagation options for mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553624#comment-16553624 ] Hudson commented on YARN-8380: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14619 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14619/]) YARN-8380. Support bind propagation options for mounts in docker (eyang: rev 8688a0c7f88f2adf1a7fce695e06f3dd1f745080) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRunCommand.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV1CommandPlugin.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md > Support bind propagation options for mounts in docker runtime > - > > Key: YARN-8380 > URL: https://issues.apache.org/jira/browse/YARN-8380 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch > > > The docker run command supports bind propagation options such as shared, but > currently we are only supporting ro and rw mount modes in the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
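To illustrate the feature this commit adds, a hedged Java sketch of splitting a mount specification of the form "source:destination:options", where the options list can now carry the bind-propagation values documented for docker run (shared, rshared, slave, rslave, private, rprivate) alongside ro and rw. This is illustrative only; the actual validation lives in container-executor's docker-util.c and DockerLinuxContainerRuntime.

{code}
// Sketch: checking a docker mount spec "source:destination:options" where options may
// include bind-propagation values in addition to ro/rw. Illustrative only.
import java.util.Arrays;
import java.util.List;

class MountSpecSketch {
  static final List<String> ALLOWED = Arrays.asList(
      "ro", "rw", "shared", "rshared", "slave", "rslave", "private", "rprivate");

  static void check(String mountSpec) {
    String[] parts = mountSpec.split(":");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected source:destination:options");
    }
    for (String opt : parts[2].split(",")) {
      if (!ALLOWED.contains(opt)) {
        throw new IllegalArgumentException("Unsupported mount option: " + opt);
      }
    }
  }

  public static void main(String[] args) {
    check("/var/lib/data:/data:ro,rshared");   // ok: read-only with recursive-shared propagation
  }
}
{code}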
[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553540#comment-16553540 ] Jason Lowe commented on YARN-8330: -- One of the envisioned use-cases of ATSv2 was to record every container allocation for an application in order to do analysis like what is done in YARN-415. That requires recording every container that enters the ALLOCATED state. bq. If information is collected for end user consumption to understand their application usage, then by collecting RUNNING state might be sufficient. That does not cover the case where an application receives containers but for some reason holds onto the allocations for a while before launching them. Tez, for example, has some corner cases in its scheduler where it can hold onto container allocations for prolonged periods before launching them while it reuses other, already active containers. Holding onto unlaunched container allocations is part of its footprint on the cluster. Only showing containers that made it to the RUNNING is only telling part of the application's usage story. Fixing the yarn container -list command problem does not mean the fix has to solely be in ATSv2. If the data for a container in ATS is sufficient to distinguish allocated containers from running containers then this can, and arguably should be, filtered for the yarn container -list use-case. If the data recorded for a container can't distinguish this then we should look into fixing that so it can be. But I don't think we should pre-filter ALLOCATED containers on the RM publishing side and preclude the proper app footprint analysis use-case entirely. > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3
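If the container data published to ATSv2 is sufficient to distinguish allocated-but-unlaunched containers from running ones, the {{yarn container -list}} problem can be handled by filtering on the reader/CLI side rather than by pre-filtering at the RM publisher, as the comment above argues. A hedged sketch; the state strings are assumptions about what the reports would expose, not the exact CLI code.

{code}
// Sketch: filter container reports on the client side instead of dropping ALLOCATED
// containers at the RM publisher. The state handling is an assumption for illustration.
import java.util.List;
import java.util.stream.Collectors;

class ContainerListFilterSketch {
  static class Report {
    final String containerId;
    final String state;          // e.g. "ALLOCATED", "RUNNING", "COMPLETE"
    Report(String id, String state) { this.containerId = id; this.state = state; }
  }

  /** Keep every report for usage analysis, but hide unlaunched ones from -list. */
  static List<Report> forListCommand(List<Report> all) {
    return all.stream()
        .filter(r -> !"ALLOCATED".equals(r.state))
        .collect(Collectors.toList());
  }
}
{code}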
[jira] [Created] (YARN-8570) GPU support in SLS
Jonathan Hung created YARN-8570: --- Summary: GPU support in SLS Key: YARN-8570 URL: https://issues.apache.org/jira/browse/YARN-8570 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung Assignee: Jonathan Hung Currently resource requests in SLS only support memory and vcores. Since GPU is natively supported by YARN, it will be useful to support requesting GPU resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
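For context, a short sketch of what requesting GPUs alongside memory and vcores looks like with YARN resource types, which is the shape of request an SLS trace would ultimately need to drive; it assumes a Hadoop 3.x client with the {{yarn.io/gpu}} resource type enabled, and the values are illustrative.

{code}
// Sketch: a resource capability that includes GPUs via YARN resource types.
// Assumes a Hadoop 3.x client with resource types (yarn.io/gpu) configured.
import org.apache.hadoop.yarn.api.records.Resource;

public class GpuRequestSketch {
  public static void main(String[] args) {
    Resource capability = Resource.newInstance(4096, 2);    // 4 GB, 2 vcores
    capability.setResourceValue("yarn.io/gpu", 1);           // ask for one GPU as well
    System.out.println(capability);
  }
}
{code}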
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553500#comment-16553500 ] genericqa commented on YARN-8545: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 26s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8545 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932779/YARN-8545.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 56cd137fb41c 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 17e2616 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21348/testReport/ | | Max. process+thread count | 755 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21348/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > YARN native service
[jira] [Comment Edited] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553426#comment-16553426 ] Chandni Singh edited comment on YARN-8545 at 7/23/18 9:36 PM: -- In Patch 1 : - releasing containers that failed - removing failed containers from live instances was (Author: csingh): In Patch 1 : - releasing containers that failed - removing containers from live instances > YARN native service should return container if launch failed > > > Key: YARN-8545 > URL: https://issues.apache.org/jira/browse/YARN-8545 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8545.001.patch > > > In some cases, container launch may fail but container will not be properly > returned to RM. > This could happen when AM trying to prepare container launch context but > failed w/o sending container launch context to NM (Once container launch > context is sent to NM, NM will report failed container to RM). > Exception like: > {code:java} > java.io.FileNotFoundException: File does not exist: > hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591) > at > org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388) > at > org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253) > at > org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152) > at > org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > And even after container launch context prepare failed, AM still trying to > monitor container's readiness: > {code:java} > 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 > 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: primary-worker-0: IP is not > available yet" > ...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
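A minimal sketch of what returning a failed launch to the RM can look like from the AM side, assuming an AMRMClientAsync-based AM and a local map of live instances; the class and map names below are illustrative, not the actual service AM code from the patch.

{code:java}
// Illustrative sketch only: not the actual YARN-8545 patch.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class FailedLaunchHandler {
  private final AMRMClientAsync<?> amRMClient;
  private final Map<ContainerId, String> liveInstances = new ConcurrentHashMap<>();

  public FailedLaunchHandler(AMRMClientAsync<?> amRMClient) {
    this.amRMClient = amRMClient;
  }

  // Called when building or sending the container launch context fails.
  public void onLaunchFailed(Container container, Exception cause) {
    ContainerId id = container.getId();
    // The NM never received the launch context, so it will not report the
    // failure; give the container back to the RM explicitly.
    amRMClient.releaseAssignedContainer(id);
    // Stop readiness monitoring for an instance that never started.
    liveInstances.remove(id);
  }
}
{code}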
[jira] [Updated] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8545: Attachment: YARN-8545.001.patch > YARN native service should return container if launch failed > > > Key: YARN-8545 > URL: https://issues.apache.org/jira/browse/YARN-8545 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8545.001.patch > > > In some cases, container launch may fail but container will not be properly > returned to RM. > This could happen when AM trying to prepare container launch context but > failed w/o sending container launch context to NM (Once container launch > context is sent to NM, NM will report failed container to RM). > Exception like: > {code:java} > java.io.FileNotFoundException: File does not exist: > hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591) > at > org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388) > at > org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253) > at > org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152) > at > org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > And even after container launch context prepare failed, AM still trying to > monitor container's readiness: > {code:java} > 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 > 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: primary-worker-0: IP is not > available yet" > ...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553422#comment-16553422 ] Chandni Singh commented on YARN-8545: - [~gsaha] [~billie.rinaldi] could you please review the patch? > YARN native service should return container if launch failed > > > Key: YARN-8545 > URL: https://issues.apache.org/jira/browse/YARN-8545 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > > In some cases, container launch may fail but container will not be properly > returned to RM. > This could happen when AM trying to prepare container launch context but > failed w/o sending container launch context to NM (Once container launch > context is sent to NM, NM will report failed container to RM). > Exception like: > {code:java} > java.io.FileNotFoundException: File does not exist: > hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591) > at > org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388) > at > org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253) > at > org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152) > at > org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > And even after container launch context prepare failed, AM still trying to > monitor container's readiness: > {code:java} > 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 > 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: primary-worker-0: IP is not > available yet" > ...{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8569) Create an interface to provide cluster information to application
Eric Yang created YARN-8569: --- Summary: Create an interface to provide cluster information to application Key: YARN-8569 URL: https://issues.apache.org/jira/browse/YARN-8569 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Yang Some programs require container hostnames to be known for the application to run. For example, distributed TensorFlow requires a launch_command that looks like: {code} # On ps0.example.com: $ python trainer.py \ --ps_hosts=ps0.example.com:,ps1.example.com: \ --worker_hosts=worker0.example.com:,worker1.example.com: \ --job_name=ps --task_index=0 # On ps1.example.com: $ python trainer.py \ --ps_hosts=ps0.example.com:,ps1.example.com: \ --worker_hosts=worker0.example.com:,worker1.example.com: \ --job_name=ps --task_index=1 # On worker0.example.com: $ python trainer.py \ --ps_hosts=ps0.example.com:,ps1.example.com: \ --worker_hosts=worker0.example.com:,worker1.example.com: \ --job_name=worker --task_index=0 # On worker1.example.com: $ python trainer.py \ --ps_hosts=ps0.example.com:,ps1.example.com: \ --worker_hosts=worker0.example.com:,worker1.example.com: \ --job_name=worker --task_index=1 {code} This is a bit cumbersome to orchestrate via Distributed Shell or the YARN services launch_command. In addition, the dynamic parameters do not work with the YARN flex command. This is the classic pain point for application developers attempting to automate system environment settings as parameters to the end user application. It would be great if the YARN Docker integration could provide a simple option to expose the hostnames of the YARN service via a mounted file, with the file content updated whenever a flex command is performed. This would let application developers consume system environment settings via a standard interface. It is like /proc/devices for Linux, but for Hadoop. This may involve updating a file in the distributed cache and allowing the file to be mounted via container-executor. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
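To make the proposal concrete, a hedged sketch of how an application could consume such a mounted file follows; the /etc/hadoop/cluster-info path and the one-hostname-per-line format are assumptions for illustration only, since defining that interface is exactly what this JIRA proposes.

{code:java}
// Sketch only: YARN-8569 does not define this interface yet. The mount path
// and the one-hostname-per-line format are assumptions for illustration.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ClusterInfoReader {
  // Assumed mount point for the file that YARN would keep updated on flex.
  private static final String CLUSTER_INFO_FILE = "/etc/hadoop/cluster-info";

  // Reads the hostnames published by YARN and joins them into a host list
  // usable as --ps_hosts / --worker_hosts in a launch command.
  public static String hostList(int port) throws IOException {
    List<String> hosts = Files.readAllLines(
        Paths.get(CLUSTER_INFO_FILE), StandardCharsets.UTF_8);
    StringBuilder sb = new StringBuilder();
    for (String host : hosts) {
      if (host.isEmpty()) {
        continue;
      }
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(host).append(':').append(port);
    }
    return sb.toString();
  }
}
{code}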
[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553345#comment-16553345 ] genericqa commented on YARN-8330: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 8s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8330 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932745/YARN-8330.3.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 218bb78e037a 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 3a9e25e | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21346/testReport/ | | Max. process+thread count | 902 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21346/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > An extra container got launched by RM for
[jira] [Comment Edited] (YARN-8380) Support bind propagation options for mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553321#comment-16553321 ] Eric Yang edited comment on YARN-8380 at 7/23/18 7:47 PM: -- +1 for patch 003. was (Author: eyang): +1 > Support bind propagation options for mounts in docker runtime > - > > Key: YARN-8380 > URL: https://issues.apache.org/jira/browse/YARN-8380 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch > > > The docker run command supports bind propagation options such as shared, but > currently we are only supporting ro and rw mount modes in the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8380) Support bind propagation options for mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553321#comment-16553321 ] Eric Yang commented on YARN-8380: - +1 > Support bind propagation options for mounts in docker runtime > - > > Key: YARN-8380 > URL: https://issues.apache.org/jira/browse/YARN-8380 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch > > > The docker run command supports bind propagation options such as shared, but > currently we are only supporting ro and rw mount modes in the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8380) Support bind propagation options for mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553316#comment-16553316 ] genericqa commented on YARN-8380: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 0s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 46s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 15s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 92m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8380 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932751/YARN-8380.3.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553259#comment-16553259 ] genericqa commented on YARN-8566: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 21s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 44s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 34s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}140m 0s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | Value of errorMessage from previous case is overwritten here due to switch statement fall through At DefaultAMSProcessor.java:case is overwritten here due to switch statement fall through At DefaultAMSProcessor.java:[line 354] | | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8566 | | JIRA Patch URL |
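The FindBugs finding above is the classic switch fall-through pattern: without a break, the diagnostic message assigned in one case is silently overwritten by the next. A generic illustration (not the actual DefaultAMSProcessor code) is:

{code:java}
// Generic illustration of the switch fall-through FindBugs finding above;
// this is not the actual DefaultAMSProcessor code.
public class FallThroughExample {
  enum Reason { NO_RESOURCE, NO_NODE }

  static String diagnostic(Reason reason) {
    String errorMessage;
    switch (reason) {
      case NO_RESOURCE:
        errorMessage = "Requested resource exceeds cluster capacity.";
        // missing "break": execution falls into the next case and the
        // message assigned here is silently overwritten
      case NO_NODE:
        errorMessage = "No node satisfies the placement constraint.";
        break;
      default:
        errorMessage = "Container request cannot be scheduled.";
    }
    return errorMessage;
  }
}
{code}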
[jira] [Commented] (YARN-8418) App local logs could leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553244#comment-16553244 ] Suma Shivaprasad commented on YARN-8418: [~bibinchundatt] Thanks for the patch. I had a couple of questions on the patch. The following lines have been deleted in the new patch. Wouldn't this cause leaks in case of other Exception cases (not Invalid token cases)? 253 appLogAggregators.remove(appId); 254 closeFileSystems(userUgi); and I see that the exception is being thrown later, i.e. after the log aggregation wrapper has been scheduled? 285 if (appDirException != null) { 286 throw appDirException; 287 } This would also cause issues in exception cases, when the scheduled aggregator runs? > App local logs could leaked if log aggregation fails to initialize for the app > -- > > Key: YARN-8418 > URL: https://issues.apache.org/jira/browse/YARN-8418 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-8418.001.patch, YARN-8418.002.patch, > YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch > > > If log aggregation fails init createApp directory container logs could get > leaked in NM directory > For log running application restart of NM after token renewal this case is > possible/ Application submission with invalid delegation token -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8418) App local logs could leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553244#comment-16553244 ] Suma Shivaprasad edited comment on YARN-8418 at 7/23/18 6:36 PM: - [~bibinchundatt] Thanks for the patch. Had a couple of questions on the patch The following lines have been deleted in the new patch. Wouldnt this cause leaks in case of other Exception cases (not Invalid token cases)? {noformat} 253 appLogAggregators.remove(appId); 254 closeFileSystems(userUgi); {noformat} and I see that the exception is being thrown later i.e after the log aggregation wrapper has been scheduled? {noformat} 285 if (appDirException != null) { 286 throw appDirException; 287 } {noformat} This would also cause issues in exception cases, when the scheduled aggregator runs? was (Author: suma.shivaprasad): [~bibinchundatt] Thanks for the patch. Had a couple of questions on the patch The following lines have been deleted in the new patch. Wouldnt this cause leaks in case of other Exception cases (not Invalid token cases)? {noformat} 253 appLogAggregators.remove(appId); 254 closeFileSystems(userUgi); {noformat} and I see that the exception is being thrown later i.e after the log aggregation wrapper has been scheduled? {noformat} 285 if (appDirException != null) { 286 throw appDirException; 287 } This would also cause issues in exception cases, when the scheduled aggregator runs? {noformat} > App local logs could leaked if log aggregation fails to initialize for the app > -- > > Key: YARN-8418 > URL: https://issues.apache.org/jira/browse/YARN-8418 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-8418.001.patch, YARN-8418.002.patch, > YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch > > > If log aggregation fails init createApp directory container logs could get > leaked in NM directory > For log running application restart of NM after token renewal this case is > possible/ Application submission with invalid delegation token -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8418) App local logs could leaked if log aggregation fails to initialize for the app
[ https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553244#comment-16553244 ] Suma Shivaprasad edited comment on YARN-8418 at 7/23/18 6:36 PM: - [~bibinchundatt] Thanks for the patch. Had a couple of questions on the patch The following lines have been deleted in the new patch. Wouldnt this cause leaks in case of other Exception cases (not Invalid token cases)? {noformat} 253 appLogAggregators.remove(appId); 254 closeFileSystems(userUgi); {noformat} and I see that the exception is being thrown later i.e after the log aggregation wrapper has been scheduled? {noformat} 285 if (appDirException != null) { 286 throw appDirException; 287 } This would also cause issues in exception cases, when the scheduled aggregator runs? {noformat} was (Author: suma.shivaprasad): [~bibinchundatt] Thanks for the patch. Had a couple of questions on the patch The following lines have been deleted in the new patch. Wouldnt this cause leaks in case of other Exception cases (not Invalid token cases)? 253 appLogAggregators.remove(appId); 254 closeFileSystems(userUgi); and I see that the exception is being thrown later i.e after the log aggregation wrapper has been scheduled? 285 if (appDirException != null) { 286 throw appDirException; 287 } This would also cause issues in exception cases, when the scheduled aggregator runs? > App local logs could leaked if log aggregation fails to initialize for the app > -- > > Key: YARN-8418 > URL: https://issues.apache.org/jira/browse/YARN-8418 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-8418.001.patch, YARN-8418.002.patch, > YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch > > > If log aggregation fails init createApp directory container logs could get > leaked in NM directory > For log running application restart of NM after token renewal this case is > possible/ Application submission with invalid delegation token -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553227#comment-16553227 ] Hudson commented on YARN-6966: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14617 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14617/]) YARN-6966. NodeManager metrics may return wrong negative values when NM (haibochen: rev 9d3c39e9dd88b8f32223c01328581bb68507d415) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerRecovery.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/ContainerScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java > NodeManager metrics may return wrong negative values when NM restart > > > Key: YARN-6966 > URL: https://issues.apache.org/jira/browse/YARN-6966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-6966.001.patch, YARN-6966.002.patch, > YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, > YARN-6966.005.patch, YARN-6966.006.patch > > > Just as YARN-6212. However, I think it is not a duplicate of YARN-3933. > The primary cause of negative values is that metrics do not recover properly > when NM restart. > AllocatedContainers,ContainersLaunched,AllocatedGB,AvailableGB,AllocatedVCores,AvailableVCores > in metrics also need to recover when NM restart. > This should be done in ContainerManagerImpl#recoverContainer. 
> The scenario could be reproduction by the following steps: > # Make sure > YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true > in NM > # Submit an application and keep running > # Restart NM > # Stop the application > # Now you get the negative values > {code} > /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics > {code} > {code} > { > name: "Hadoop:service=NodeManager,name=NodeManagerMetrics", > modelerType: "NodeManagerMetrics", > tag.Context: "yarn", > tag.Hostname: "hadoop.com", > ContainersLaunched: 0, > ContainersCompleted: 0, > ContainersFailed: 2, > ContainersKilled: 0, > ContainersIniting: 0, > ContainersRunning: 0, > AllocatedGB: 0, > AllocatedContainers: -2, > AvailableGB: 160, > AllocatedVCores: -11, > AvailableVCores: 3611, > ContainerLaunchDurationNumOps: 2, > ContainerLaunchDurationAvgTime: 6, > BadLocalDirs: 0, > BadLogDirs: 0, > GoodLocalDirsDiskUtilizationPerc: 2, > GoodLogDirsDiskUtilizationPerc: 2 > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
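A rough sketch of the recovery-time metric update the issue description calls for (re-applying allocation metrics in ContainerManagerImpl#recoverContainer); this is not the committed patch, and it assumes the NodeManagerMetrics allocateContainer/launchedContainer methods available on trunk at the time of this discussion.

{code:java}
// Rough sketch of the recovery-time metric fix described above; not the
// committed YARN-6966 patch. Assumes NodeManagerMetrics#allocateContainer
// and #launchedContainer as they exist on trunk.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.nodemanager.metrics.NodeManagerMetrics;

public class RecoveryMetricsSketch {
  private final NodeManagerMetrics metrics;

  public RecoveryMetricsSketch(NodeManagerMetrics metrics) {
    this.metrics = metrics;
  }

  // Re-apply metric updates for a container recovered after NM restart.
  public void recoverContainerMetrics(Resource resource, boolean wasLaunched) {
    // Without this, the later "completed/released" update decrements counters
    // that were never incremented after restart, producing the negative
    // AllocatedContainers/AllocatedVCores values shown in the report.
    metrics.allocateContainer(resource);
    if (wasLaunched) {
      metrics.launchedContainer();
    }
  }
}
{code}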
[jira] [Commented] (YARN-7133) Clean up lock-try order in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553202#comment-16553202 ] Daniel Templeton commented on YARN-7133: For one, it's the format set as the standard by Sun. See https://docs.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html > Clean up lock-try order in fair scheduler > - > > Key: YARN-7133 > URL: https://issues.apache.org/jira/browse/YARN-7133 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Szilard Nemeth >Priority: Major > Labels: newbie > Attachments: YARN-7133.001.patch > > > There are many places that follow the pattern:{code}try { > lock.lock(); > ... > } finally { > lock.unlock(); > }{code} > There are a couple of reasons that's a bad idea. The correct pattern > is:{code}lock.lock(); > try { > ... > } finally { > lock.unlock(); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
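For readers following along, the difference between the two patterns can be illustrated as below: if anything fails before the lock is actually acquired, the finally block in the first form calls unlock() on a lock the thread does not hold, which throws IllegalMonitorStateException and masks the original error.

{code:java}
// Illustration of why the lock acquisition belongs outside the try block.
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderExample {
  private final Lock lock = new ReentrantLock();

  public void wrong() {
    try {
      lock.lock();       // if this (or anything before it) fails, the lock
                         // was never acquired...
      // ... critical section ...
    } finally {
      lock.unlock();     // ...yet unlock() still runs, throwing
                         // IllegalMonitorStateException and masking the error
    }
  }

  public void right() {
    lock.lock();         // acquire first; enter try only once we hold the lock
    try {
      // ... critical section ...
    } finally {
      lock.unlock();     // releases exactly what was acquired
    }
  }
}
{code}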
[jira] [Commented] (YARN-8525) RegistryDNS tcp channel stops working on interrupts
[ https://issues.apache.org/jira/browse/YARN-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553201#comment-16553201 ] genericqa commented on YARN-8525: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 29s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 55s{color} | {color:green} hadoop-yarn-registry in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 57m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8525 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932739/YARN-8525.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 854ade677451 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 84d7bf1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21345/testReport/ | | Max. process+thread count | 304 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21345/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. >
[jira] [Commented] (YARN-7133) Clean up lock-try order in fair scheduler
[ https://issues.apache.org/jira/browse/YARN-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553197#comment-16553197 ] Haibo Chen commented on YARN-7133: -- [~snemeth] I am not quite familiar with the reasons why putting lock() inside try-catch block is a bad idea. Do you mind sharing some insights in case folks like me are wondering? > Clean up lock-try order in fair scheduler > - > > Key: YARN-7133 > URL: https://issues.apache.org/jira/browse/YARN-7133 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.0-alpha4 >Reporter: Daniel Templeton >Assignee: Szilard Nemeth >Priority: Major > Labels: newbie > Attachments: YARN-7133.001.patch > > > There are many places that follow the pattern:{code}try { > lock.lock(); > ... > } finally { > lock.unlock(); > }{code} > There are a couple of reasons that's a bad idea. The correct pattern > is:{code}lock.lock(); > try { > ... > } finally { > lock.unlock(); > }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8380) Support bind propagation options for mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8380: - Description: The docker run command supports bind propagation options such as shared, but currently we are only supporting ro and rw mount modes in the docker runtime. (was: The docker run command supports the mount type shared, but currently we are only supporting ro and rw mount types in the docker runtime.) > Support bind propagation options for mounts in docker runtime > - > > Key: YARN-8380 > URL: https://issues.apache.org/jira/browse/YARN-8380 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch > > > The docker run command supports bind propagation options such as shared, but > currently we are only supporting ro and rw mount modes in the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8380) Support bind propagation options for mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8380: - Summary: Support bind propagation options for mounts in docker runtime (was: Support shared mounts in docker runtime) > Support bind propagation options for mounts in docker runtime > - > > Key: YARN-8380 > URL: https://issues.apache.org/jira/browse/YARN-8380 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch > > > The docker run command supports the mount type shared, but currently we are > only supporting ro and rw mount types in the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8380) Support shared mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553193#comment-16553193 ] Billie Rinaldi commented on YARN-8380: -- Patch 3 supports formats 1, 2, 3, and 4 with a + sign separating the propagation option in format 2. It keeps the original configs docker.allowed.rw-mounts and docker.allowed.ro-mounts. > Support shared mounts in docker runtime > --- > > Key: YARN-8380 > URL: https://issues.apache.org/jira/browse/YARN-8380 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch > > > The docker run command supports the mount type shared, but currently we are > only supporting ro and rw mount types in the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8380) Support shared mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-8380: - Attachment: YARN-8380.3.patch > Support shared mounts in docker runtime > --- > > Key: YARN-8380 > URL: https://issues.apache.org/jira/browse/YARN-8380 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch > > > The docker run command supports the mount type shared, but currently we are > only supporting ro and rw mount types in the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553183#comment-16553183 ] Haibo Chen commented on YARN-6966: -- TestContainerSchedulerRecovery.createRecoveredContainerState() is still longer than 80 characters, I'll fix it while committing the patch. +1. I'll create another Jira for the TODO comment I came across. > NodeManager metrics may return wrong negative values when NM restart > > > Key: YARN-6966 > URL: https://issues.apache.org/jira/browse/YARN-6966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-6966.001.patch, YARN-6966.002.patch, > YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, > YARN-6966.005.patch, YARN-6966.006.patch > > > Just as YARN-6212. However, I think it is not a duplicate of YARN-3933. > The primary cause of negative values is that metrics do not recover properly > when NM restart. > AllocatedContainers,ContainersLaunched,AllocatedGB,AvailableGB,AllocatedVCores,AvailableVCores > in metrics also need to recover when NM restart. > This should be done in ContainerManagerImpl#recoverContainer. > The scenario could be reproduction by the following steps: > # Make sure > YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true > in NM > # Submit an application and keep running > # Restart NM > # Stop the application > # Now you get the negative values > {code} > /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics > {code} > {code} > { > name: "Hadoop:service=NodeManager,name=NodeManagerMetrics", > modelerType: "NodeManagerMetrics", > tag.Context: "yarn", > tag.Hostname: "hadoop.com", > ContainersLaunched: 0, > ContainersCompleted: 0, > ContainersFailed: 2, > ContainersKilled: 0, > ContainersIniting: 0, > ContainersRunning: 0, > AllocatedGB: 0, > AllocatedContainers: -2, > AvailableGB: 160, > AllocatedVCores: -11, > AvailableVCores: 3611, > ContainerLaunchDurationNumOps: 2, > ContainerLaunchDurationAvgTime: 6, > BadLocalDirs: 0, > BadLogDirs: 0, > GoodLocalDirsDiskUtilizationPerc: 2, > GoodLogDirsDiskUtilizationPerc: 2 > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553180#comment-16553180 ] Eric Yang commented on YARN-8330: - If the information is collected for end user consumption, to understand their application usage, then collecting the RUNNING state might be sufficient. End users should not be penalized for a YARN framework deficiency. If the information is collected for system administrators, to understand cluster health and isolate which node is potentially causing containers to fail, then reporting ALLOCATED/ACQUIRED is preferable. The Timeline server is optimized for end user application reporting, so the extra data seems unnecessary at this time. However, the more information is collected, the easier it is to avoid writing similar code twice. The report filtering can be done at the Timeline server to fulfill both use cases. It could be a problem to handicap the data collection toward one use case only. > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553178#comment-16553178 ] Giovanni Matteo Fumarola commented on YARN-6539: Hi [~Dillon.]. I do not have detailed knowledge of what should be done in "secureLogin". The ResourceManager and the NodeManager have the same function; I added it to the Router to keep parity with the RM and NM. > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
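For context, the RM and NM both log in from a keytab through org.apache.hadoop.security.SecurityUtil.login in their doSecureLogin() methods. A minimal sketch of the equivalent Router-side call is below; the yarn.router.keytab.file and yarn.router.principal key names are assumptions made for illustration, not actual YarnConfiguration constants.
{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;

public final class RouterSecureLogin {

  // Hypothetical configuration keys; the real Router keys may differ.
  private static final String ROUTER_KEYTAB = "yarn.router.keytab.file";
  private static final String ROUTER_PRINCIPAL = "yarn.router.principal";

  private RouterSecureLogin() {
  }

  /**
   * Logs the Router in from its keytab, mirroring what the RM and NM do
   * in their doSecureLogin() methods.
   */
  public static void doSecureLogin(Configuration conf, InetSocketAddress addr)
      throws IOException {
    SecurityUtil.login(conf, ROUTER_KEYTAB, ROUTER_PRINCIPAL,
        addr.getHostName());
  }
}
{code}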
[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553169#comment-16553169 ] Suma Shivaprasad commented on YARN-8330: Thanks [~jlowe] [~rohithsharma] [~leftnoteasy] [~eyang] for reviewing the patch and feedback. Have updated the patch to publish containerCreated events only when the container reaches RUNNING state. > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. 
Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
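As the comment above notes, the approach is to publish the containerCreated event only once a container actually reaches the RUNNING state, so a container that is allocated or reserved but never launched (like container _04 above) does not show up in end-user reports. A minimal, self-contained sketch of that gating idea, with a hypothetical Publisher interface standing in for the timeline publisher:
{code:java}
import java.util.EnumSet;
import java.util.Set;

public class ContainerEventGate {

  /** Simplified stand-in for the RM-side container states. */
  public enum State { NEW, RESERVED, ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

  /** Hypothetical sink, standing in for the timeline/ATS publisher. */
  public interface Publisher {
    void publishContainerCreated(String containerId);
  }

  // Only containers that actually start running are reported to end users.
  private static final Set<State> PUBLISHABLE = EnumSet.of(State.RUNNING);

  private final Publisher publisher;

  public ContainerEventGate(Publisher publisher) {
    this.publisher = publisher;
  }

  /** Called on every state transition; publishes at most the RUNNING edge. */
  public void onTransition(String containerId, State from, State to) {
    if (!PUBLISHABLE.contains(from) && PUBLISHABLE.contains(to)) {
      publisher.publishContainerCreated(containerId);
    }
  }
}
{code}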
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Attachment: YARN-8330.3.patch > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Attachment: YARN-8360.2.patch > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch, YARN-8330.2.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Attachment: (was: YARN-8360.2.patch) > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch, YARN-8330.2.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8360) Yarn service conflict between restart policy and NM configuration
[ https://issues.apache.org/jira/browse/YARN-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553142#comment-16553142 ] Hudson commented on YARN-8360: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14615 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14615/]) YARN-8360. Improve YARN service restart policy and node manager auto (eyang: rev 84d7bf1eeff6b9418361afa4aa713e5e6f771365) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/ComponentRestartPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/AlwaysRestartPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/containerlaunch/TestAbstractLauncher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/AbstractProviderService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/OnFailureRestartPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/NeverRestartPolicy.java > Yarn service conflict between restart policy and NM configuration > -- > > Key: YARN-8360 > URL: https://issues.apache.org/jira/browse/YARN-8360 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Chandni Singh >Assignee: Suma Shivaprasad >Priority: Critical > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8360.1.patch > > > For the below spec, the service will not stop even after container failures > because of the NM auto retry properties : > * "yarn.service.container-failure.retry.max": 1, > * "yarn.service.container-failure.validity-interval-ms": 5000 > The NM will continue auto-restarting containers. > {{fail_after 20}} fails after 20 seconds. Since the validity failure > interval is 5 seconds, NM will auto restart the container. > {code:java} > { > "name": "fail-demo2", > "version": "1.0.0", > "components" : > [ > { > "name": "comp1", > "number_of_containers": 1, > "launch_command": "fail_after 20", > "restart_policy": "NEVER", > "resource": { > "cpus": 1, > "memory": "256" > }, > "configuration": { > "properties": { > "yarn.service.container-failure.retry.max": 1, > "yarn.service.container-failure.validity-interval-ms": 5000 > } > } > } > ] > } > {code} > If {{restart_policy}} is NEVER, then the service should stop after the > container fails. > Since we have introduced, the service level Restart Policies, I think we > should make the NM auto retry configurations part of the {{RetryPolicy}} and > get rid of all {{yarn.service.container-failure.**}} properties. Otherwise it > gets confusing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
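The conflict described above comes down to precedence: the component restart policy should decide whether the NM is asked to auto-retry at all. A small sketch of that precedence rule, using illustrative class names rather than the actual yarn-service code:
{code:java}
/**
 * Sketch of the idea: when a component's restart policy says a failed
 * container must not be relaunched, the AM should not request NM auto-retry,
 * regardless of the yarn.service.container-failure.* properties.
 */
public class RetryDecision {

  public enum RestartPolicy { ALWAYS, ON_FAILURE, NEVER }

  /** NM-side auto-retry settings requested for a container launch. */
  public static final class NmRetrySpec {
    final int maxRetries;
    final long validityIntervalMs;

    NmRetrySpec(int maxRetries, long validityIntervalMs) {
      this.maxRetries = maxRetries;
      this.validityIntervalMs = validityIntervalMs;
    }

    static NmRetrySpec disabled() {
      return new NmRetrySpec(0, -1L);
    }
  }

  /** The restart policy wins over per-component NM retry properties. */
  public static NmRetrySpec effectiveRetry(RestartPolicy policy,
      int configuredMaxRetries, long configuredValidityMs) {
    if (policy == RestartPolicy.NEVER) {
      return NmRetrySpec.disabled();
    }
    return new NmRetrySpec(configuredMaxRetries, configuredValidityMs);
  }
}
{code}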
[jira] [Updated] (YARN-8360) Yarn service conflict between restart policy and NM configuration
[ https://issues.apache.org/jira/browse/YARN-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8360: Fix Version/s: 3.1.2 3.2.0 > Yarn service conflict between restart policy and NM configuration > -- > > Key: YARN-8360 > URL: https://issues.apache.org/jira/browse/YARN-8360 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Chandni Singh >Assignee: Suma Shivaprasad >Priority: Critical > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8360.1.patch > > > For the below spec, the service will not stop even after container failures > because of the NM auto retry properties : > * "yarn.service.container-failure.retry.max": 1, > * "yarn.service.container-failure.validity-interval-ms": 5000 > The NM will continue auto-restarting containers. > {{fail_after 20}} fails after 20 seconds. Since the validity failure > interval is 5 seconds, NM will auto restart the container. > {code:java} > { > "name": "fail-demo2", > "version": "1.0.0", > "components" : > [ > { > "name": "comp1", > "number_of_containers": 1, > "launch_command": "fail_after 20", > "restart_policy": "NEVER", > "resource": { > "cpus": 1, > "memory": "256" > }, > "configuration": { > "properties": { > "yarn.service.container-failure.retry.max": 1, > "yarn.service.container-failure.validity-interval-ms": 5000 > } > } > } > ] > } > {code} > If {{restart_policy}} is NEVER, then the service should stop after the > container fails. > Since we have introduced, the service level Restart Policies, I think we > should make the NM auto retry configurations part of the {{RetryPolicy}} and > get rid of all {{yarn.service.container-failure.**}} properties. Otherwise it > gets confusing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553130#comment-16553130 ] genericqa commented on YARN-8566: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 19s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}153m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8566 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932719/YARN-8566.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 32e9cf8cba11 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bbe2f62 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | whitespace |
[jira] [Commented] (YARN-8525) RegistryDNS tcp channel stops working on interrupts
[ https://issues.apache.org/jira/browse/YARN-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553126#comment-16553126 ] Eric Yang commented on YARN-8525: - Patch 003 prevents the thread from exiting on InterruptedException or ClosedByInterruptException. > RegistryDNS tcp channel stops working on interrupts > --- > > Key: YARN-8525 > URL: https://issues.apache.org/jira/browse/YARN-8525 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0, 3.1.1 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8525.001.patch, YARN-8525.002.patch, > YARN-8525.003.patch > > > While waiting for request for registryDNS, Thread.sleep might send interrupt > exception. This is currently not properly handled in registryDNS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
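A minimal sketch of the kind of TCP accept loop that keeps serving across interrupts, assuming NIO channels as in RegistryDNS; the class and method names here are illustrative, not the patch itself:
{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** Sketch of a TCP accept loop that survives interrupts instead of exiting. */
public class InterruptTolerantTcpServer implements Runnable {

  private final int port;
  private ServerSocketChannel server;
  private volatile boolean running = true;

  public InterruptTolerantTcpServer(int port) throws IOException {
    this.port = port;
    open();
  }

  private void open() throws IOException {
    server = ServerSocketChannel.open();
    server.bind(new InetSocketAddress(port));
  }

  @Override
  public void run() {
    while (running) {
      try (SocketChannel client = server.accept()) {
        handle(client);
      } catch (ClosedByInterruptException e) {
        // An interrupt closed the listening channel mid-accept. Clear the
        // interrupt flag and re-open the listener so the thread keeps serving.
        Thread.interrupted();
        reopenQuietly();
      } catch (IOException e) {
        // Log and continue; one bad request must not stop the server.
      }
    }
  }

  private void reopenQuietly() {
    try {
      open();
    } catch (IOException e) {
      running = false; // cannot recover without a listening socket
    }
  }

  private void handle(SocketChannel client) throws IOException {
    // Read the request and write the response here.
  }
}
{code}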
[jira] [Updated] (YARN-8525) RegistryDNS tcp channel stops working on interrupts
[ https://issues.apache.org/jira/browse/YARN-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8525: Attachment: YARN-8525.003.patch > RegistryDNS tcp channel stops working on interrupts > --- > > Key: YARN-8525 > URL: https://issues.apache.org/jira/browse/YARN-8525 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0, 3.1.1 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8525.001.patch, YARN-8525.002.patch, > YARN-8525.003.patch > > > While waiting for request for registryDNS, Thread.sleep might send interrupt > exception. This is currently not properly handled in registryDNS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8567) Fetching yarn logs fails for long running application if it is not present in timeline store
[ https://issues.apache.org/jira/browse/YARN-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553114#comment-16553114 ] genericqa commented on YARN-8567: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 50s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 1s{color} | {color:green} hadoop-yarn-client in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 81m 54s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8567 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932723/YARN-8567.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 60c67b3d24de 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bbe2f62 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21343/testReport/ | | Max. process+thread count | 706 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21343/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Fetching
[jira] [Commented] (YARN-8360) Yarn service conflict between restart policy and NM configuration
[ https://issues.apache.org/jira/browse/YARN-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553099#comment-16553099 ] Eric Yang commented on YARN-8360: - [~suma.shivaprasad] Thanks for the information. I had an environment in which one of the node managers was failing, and this skewed the container restart counts tracked by the NM and AM. The patch looks correct. > Yarn service conflict between restart policy and NM configuration > -- > > Key: YARN-8360 > URL: https://issues.apache.org/jira/browse/YARN-8360 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Chandni Singh >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8360.1.patch > > > For the below spec, the service will not stop even after container failures > because of the NM auto retry properties : > * "yarn.service.container-failure.retry.max": 1, > * "yarn.service.container-failure.validity-interval-ms": 5000 > The NM will continue auto-restarting containers. > {{fail_after 20}} fails after 20 seconds. Since the validity failure > interval is 5 seconds, NM will auto restart the container. > {code:java} > { > "name": "fail-demo2", > "version": "1.0.0", > "components" : > [ > { > "name": "comp1", > "number_of_containers": 1, > "launch_command": "fail_after 20", > "restart_policy": "NEVER", > "resource": { > "cpus": 1, > "memory": "256" > }, > "configuration": { > "properties": { > "yarn.service.container-failure.retry.max": 1, > "yarn.service.container-failure.validity-interval-ms": 5000 > } > } > } > ] > } > {code} > If {{restart_policy}} is NEVER, then the service should stop after the > container fails. > Since we have introduced, the service level Restart Policies, I think we > should make the NM auto retry configurations part of the {{RetryPolicy}} and > get rid of all {{yarn.service.container-failure.**}} properties. Otherwise it > gets confusing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8566: - Attachment: YARN-8566.003.patch > Add diagnostic message for unschedulable containers > --- > > Key: YARN-8566 > URL: https://issues.apache.org/jira/browse/YARN-8566 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8566.001.patch, YARN-8566.002.patch, > YARN-8566.003.patch > > > If a queue is configured with maxResources set to 0 for a resource, and an > application is submitted to that queue that requests that resource, that > application will remain pending until it is removed or moved to a different > queue. This behavior can be realized without extended resources, but it’s > unlikely a user will create a queue that allows 0 memory or CPU. As the > number of resources in the system increases, this scenario will become more > common, and it will become harder to recognize these cases. Therefore, the > scheduler should indicate in the diagnostic string for an application if it > was not scheduled because of a 0 maxResources setting. > Example configuration (fair-scheduler.xml) : > {code:java} > > 10 > > 1 mb,2vcores > 9 mb,4vcores, 0gpu > 50 > -1.0f > 2.0 > fair > > > {code} > Command: > {code:java} > yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi > -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000; > {code} > The job hangs and the application diagnostic info is empty. > Given that an exception is thrown before any mapper/reducer container is > created, the diagnostic message of the AM should be updated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553063#comment-16553063 ] Szilard Nemeth commented on YARN-8566: -- Hi [~bsteinbach]! Thanks for your comments. The code is indeed more readable with your suggestions. Please see my updated patch. > Add diagnostic message for unschedulable containers > --- > > Key: YARN-8566 > URL: https://issues.apache.org/jira/browse/YARN-8566 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8566.001.patch, YARN-8566.002.patch > > > If a queue is configured with maxResources set to 0 for a resource, and an > application is submitted to that queue that requests that resource, that > application will remain pending until it is removed or moved to a different > queue. This behavior can be realized without extended resources, but it’s > unlikely a user will create a queue that allows 0 memory or CPU. As the > number of resources in the system increases, this scenario will become more > common, and it will become harder to recognize these cases. Therefore, the > scheduler should indicate in the diagnostic string for an application if it > was not scheduled because of a 0 maxResources setting. > Example configuration (fair-scheduler.xml) : > {code:java} > > 10 > > 1 mb,2vcores > 9 mb,4vcores, 0gpu > 50 > -1.0f > 2.0 > fair > > > {code} > Command: > {code:java} > yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi > -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000; > {code} > The job hangs and the application diagnostic info is empty. > Given that an exception is thrown before any mapper/reducer container is > created, the diagnostic message of the AM should be updated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553038#comment-16553038 ] Eric Payne commented on YARN-4606: -- +1 Thanks for all of your work on this JIRA, [~maniraj...@gmail.com]. I will be committing this later today or tomorrow. > CapacityScheduler: applications could get starved because computation of > #activeUsers considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Manikandan R >Priority: Critical > Attachments: YARN-4606.001.patch, YARN-4606.002.patch, > YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.005.patch, > YARN-4606.006.patch, YARN-4606.007.patch, YARN-4606.1.poc.patch, > YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch > > > Currently, if all applications belong to same user in LeafQueue are pending > (caused by max-am-percent, etc.), ActiveUsersManager still considers the user > is an active user. This could lead to starvation of active applications, for > example: > - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to > user3)/app4(belongs to user4) are pending > - ActiveUsersManager returns #active-users=4 > - However, there're only two users (user1/user2) are able to allocate new > resources. So computed user-limit-resource could be lower than expected. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
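The starvation effect is easiest to see with the arithmetic; below is a toy illustration only, not the actual CapacityScheduler user-limit formula, which also factors in minimum-user-limit-percent and the user-limit-factor:
{code:java}
public class UserLimitExample {
  public static void main(String[] args) {
    long queueMemoryMb = 100_000;

    // Four users have submitted apps, but only two have active (running) apps.
    int usersCounted = 4;   // old behaviour: pending-only users are counted
    int usersRunnable = 2;  // desired behaviour: only users with active apps

    System.out.println("limit counting pending users: "
        + queueMemoryMb / usersCounted + " MB per user");
    System.out.println("limit counting active users only: "
        + queueMemoryMb / usersRunnable + " MB per user");
  }
}
{code}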
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553032#comment-16553032 ] Antal Bálint Steinbach commented on YARN-8566: -- Hi [~snemeth]! Thanks for the patch. I only have some minor comments:
* Maybe it would be good to add diagnostic text for the 3rd case (UNKNOWN).
* Using a switch over the enum can be less verbose.
* You can extract the app.getRMAppAttempt(appAttemptId).updateAMLaunchDiagnostics(...) call, for example:
{code:java}
String errorMsg = "";
switch (e.getInvalidResourceType()) {
case GREATER_THEN_MAX_ALLOCATION:
  errorMsg = "Cannot allocate containers as resource request is "
      + "greater than the maximum allowed allocation!";
  break;
case LESS_THAN_ZERO:
  errorMsg = "Cannot allocate containers as resource request is "
      + "less than zero!";
  break;
case UNKNOWN:
default:
  errorMsg = "Cannot allocate containers for some unknown reasons!";
}
app.getRMAppAttempt(appAttemptId).updateAMLaunchDiagnostics(errorMsg);
{code}
> Add diagnostic message for unschedulable containers > --- > > Key: YARN-8566 > URL: https://issues.apache.org/jira/browse/YARN-8566 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8566.001.patch, YARN-8566.002.patch > > > If a queue is configured with maxResources set to 0 for a resource, and an > application is submitted to that queue that requests that resource, that > application will remain pending until it is removed or moved to a different > queue. This behavior can be realized without extended resources, but it’s > unlikely a user will create a queue that allows 0 memory or CPU. As the > number of resources in the system increases, this scenario will become more > common, and it will become harder to recognize these cases. Therefore, the > scheduler should indicate in the diagnostic string for an application if it > was not scheduled because of a 0 maxResources setting. > Example configuration (fair-scheduler.xml) : > {code:java} > > 10 > > 1 mb,2vcores > 9 mb,4vcores, 0gpu > 50 > -1.0f > 2.0 > fair > > > {code} > Command: > {code:java} > yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi > -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000; > {code} > The job hangs and the application diagnostic info is empty. > Given that an exception is thrown before any mapper/reducer container is > created, the diagnostic message of the AM should be updated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8567) Fetching yarn logs fails for long running application if it is not present in timeline store
[ https://issues.apache.org/jira/browse/YARN-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552958#comment-16552958 ] Tarun Parimi commented on YARN-8567: {{AHSClientImpl#getContainers}} failed because the application entity got deleted as it exceeded {{yarn.timeline-service.ttl-ms .}} I checked in the debug logs that ClientRMService#getContainers is successful since the application is still running and is present in the ResourceManager. We seem to be only catching IOException here. Ideally we should catch YarnException also in this case so that the response from RM is at least returned if the application is found in RM. Attaching a patch for the same. > Fetching yarn logs fails for long running application if it is not present in > timeline store > > > Key: YARN-8567 > URL: https://issues.apache.org/jira/browse/YARN-8567 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > Attachments: YARN-8567.001.patch > > > Using yarn logs command for a long running application which has been running > longer than the configured timeline service ttl > {{yarn.timeline-service.ttl-ms }} fails with the following exception. > {code:java} > Exception in thread "main" > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity > for application application_152347939332_1 doesn't exist in the timeline > store > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:670) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:219) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:211) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172) > at > org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getContainers(ApplicationHistoryProtocolPBClientImpl.java:183) > at > 
org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getContainers(AHSClientImpl.java:151) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:720) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.getContainerReportsFromRunningApplication(LogsCLI.java:1089) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.getContainersLogRequestForRunningApplication(LogsCLI.java:1064) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:976) > at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:300) > at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:107) > at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:327) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
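The shape of the handling proposed above, so that a YarnException from the history service (such as ApplicationNotFoundException once the entity has aged out of the timeline store) does not discard what the RM already returned, could look roughly like this; the two fetch methods are placeholders, not the real YarnClientImpl internals:
{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.yarn.exceptions.YarnException;

public abstract class ContainerListingSketch<R> {

  protected abstract List<R> fetchFromResourceManager()
      throws IOException, YarnException;

  protected abstract List<R> fetchFromHistoryService()
      throws IOException, YarnException;

  public List<R> listContainers() throws IOException, YarnException {
    // The RM answer covers the running application's live containers.
    List<R> fromRm = fetchFromResourceManager();
    try {
      // The history/timeline view may add completed containers when present.
      return fetchFromHistoryService();
    } catch (IOException | YarnException e) {
      // Catching YarnException too (e.g. ApplicationNotFoundException after
      // the entity exceeded yarn.timeline-service.ttl-ms) keeps the RM
      // response usable instead of failing the whole log fetch.
      return fromRm;
    }
  }
}
{code}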
[jira] [Updated] (YARN-8567) Fetching yarn logs fails for long running application if it is not present in timeline store
[ https://issues.apache.org/jira/browse/YARN-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tarun Parimi updated YARN-8567: --- Attachment: YARN-8567.001.patch > Fetching yarn logs fails for long running application if it is not present in > timeline store > > > Key: YARN-8567 > URL: https://issues.apache.org/jira/browse/YARN-8567 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > Attachments: YARN-8567.001.patch > > > Using yarn logs command for a long running application which has been running > longer than the configured timeline service ttl > {{yarn.timeline-service.ttl-ms }} fails with the following exception. > {code:java} > Exception in thread "main" > org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity > for application application_152347939332_1 doesn't exist in the timeline > store > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:670) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:219) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:211) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172) > at > org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getContainers(ApplicationHistoryProtocolPBClientImpl.java:183) > at > org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getContainers(AHSClientImpl.java:151) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:720) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.getContainerReportsFromRunningApplication(LogsCLI.java:1089) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.getContainersLogRequestForRunningApplication(LogsCLI.java:1064) > at > org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:976) > at 
org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:300) > at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:107) > at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:327) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8568) Replace the deprecated zk-address property in the HA config example in ResourceManagerHA.md
Antal Bálint Steinbach created YARN-8568: Summary: Replace the deprecated zk-address property in the HA config example in ResourceManagerHA.md Key: YARN-8568 URL: https://issues.apache.org/jira/browse/YARN-8568 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 3.0.x Reporter: Antal Bálint Steinbach Assignee: Antal Bálint Steinbach yarn.resourcemanager.zk-address is deprecated; hadoop.zk.address should be used instead. In the ResourceManagerHA.md example, the deprecated "yarn.resourcemanager.zk-address" property is used, while the surrounding description correctly names "hadoop.zk.address". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
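For reference, the non-deprecated form of the setting in yarn-site.xml looks like the following; the ZooKeeper quorum hosts are placeholders:
{code:xml}
<property>
  <!-- Replaces the deprecated yarn.resourcemanager.zk-address -->
  <name>hadoop.zk.address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
{code}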
[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8566: - Attachment: YARN-8566.002.patch > Add diagnostic message for unschedulable containers > --- > > Key: YARN-8566 > URL: https://issues.apache.org/jira/browse/YARN-8566 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8566.001.patch, YARN-8566.002.patch > > > If a queue is configured with maxResources set to 0 for a resource, and an > application is submitted to that queue that requests that resource, that > application will remain pending until it is removed or moved to a different > queue. This behavior can be realized without extended resources, but it’s > unlikely a user will create a queue that allows 0 memory or CPU. As the > number of resources in the system increases, this scenario will become more > common, and it will become harder to recognize these cases. Therefore, the > scheduler should indicate in the diagnostic string for an application if it > was not scheduled because of a 0 maxResources setting. > Example configuration (fair-scheduler.xml) : > {code:java} > > 10 > > 1 mb,2vcores > 9 mb,4vcores, 0gpu > 50 > -1.0f > 2.0 > fair > > > {code} > Command: > {code:java} > yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi > -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000; > {code} > The job hangs and the application diagnostic info is empty. > Given that an exception is thrown before any mapper/reducer container is > created, the diagnostic message of the AM should be updated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8566: - Attachment: YARN-8566.001.patch > Add diagnostic message for unschedulable containers > --- > > Key: YARN-8566 > URL: https://issues.apache.org/jira/browse/YARN-8566 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8566.001.patch > > > If a queue is configured with maxResources set to 0 for a resource, and an > application is submitted to that queue that requests that resource, that > application will remain pending until it is removed or moved to a different > queue. This behavior can be realized without extended resources, but it’s > unlikely a user will create a queue that allows 0 memory or CPU. As the > number of resources in the system increases, this scenario will become more > common, and it will become harder to recognize these cases. Therefore, the > scheduler should indicate in the diagnostic string for an application if it > was not scheduled because of a 0 maxResources setting. > Example configuration (fair-scheduler.xml) : > {code:java} > > 10 > > 1 mb,2vcores > 9 mb,4vcores, 0gpu > 50 > -1.0f > 2.0 > fair > > > {code} > Command: > {code:java} > yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi > -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000; > {code} > The job hangs and the application diagnostic info is empty. > Given that an exception is thrown before any mapper/reducer container is > created, the diagnostic message of the AM should be updated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers
[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552928#comment-16552928 ] Szilard Nemeth commented on YARN-8566: -- The first patch will add this to the App attempt's diagnostic info: Cannot allocate containers as resource request is greater than the maximum allowed allocation! > Add diagnostic message for unschedulable containers > --- > > Key: YARN-8566 > URL: https://issues.apache.org/jira/browse/YARN-8566 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > > If a queue is configured with maxResources set to 0 for a resource, and an > application is submitted to that queue that requests that resource, that > application will remain pending until it is removed or moved to a different > queue. This behavior can be realized without extended resources, but it’s > unlikely a user will create a queue that allows 0 memory or CPU. As the > number of resources in the system increases, this scenario will become more > common, and it will become harder to recognize these cases. Therefore, the > scheduler should indicate in the diagnostic string for an application if it > was not scheduled because of a 0 maxResources setting. > Example configuration (fair-scheduler.xml) : > {code:java} > > 10 > > 1 mb,2vcores > 9 mb,4vcores, 0gpu > 50 > -1.0f > 2.0 > fair > > > {code} > Command: > {code:java} > yarn jar > "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi > -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000; > {code} > The job hangs and the application diagnostic info is empty. > Given that an exception is thrown before any mapper/reducer container is > created, the diagnostic message of the AM should be updated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: branch-2.7.2.gpu-port-20180723.patch > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Attachments: GPU locality support for Job scheduling.pdf, > branch-2.7.2.gpu-port-20180723.patch, hadoop-2.7.2.gpu-port-20180711.patch, > hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a > countable resource. > However, GPU placement is also very important to deep learning jobs for better > efficiency. > For example, a 2-GPU job running on GPUs {0,1} can be faster than one running on GPUs > {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add this support to Hadoop 2.7.2 to enable GPU locality scheduling, which > supports fine-grained GPU placement. > A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' unavailable in the corresponding bit position. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
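As a rough illustration of the bitmap idea described above (this is not the patch's actual API; the class and method names here are hypothetical), a 64-bit mask can encode per-GPU availability and a locality-aware placement check reduces to bitwise operations:
{code:java}
// Hypothetical sketch: bit i set to '1' means GPU i on the node is available.
public final class GpuBitmapSketch {

  /** True if every GPU requested in 'wanted' is currently free in 'free'. */
  static boolean canAllocate(long free, long wanted) {
    return (free & wanted) == wanted;
  }

  public static void main(String[] args) {
    long free       = 0b0000_0011L;            // GPUs 0 and 1 are free on this node
    long sameSwitch = 0b0000_0011L;            // request GPUs 0 and 1 (same PCI-E switch)
    long farApart   = (1L << 0) | (1L << 7);   // request GPUs 0 and 7

    System.out.println(canAllocate(free, sameSwitch)); // true
    System.out.println(canAllocate(free, farApart));   // false, GPU 7 is not free
  }
}
{code}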
[jira] [Created] (YARN-8567) Fetching yarn logs fails for long running application if it is not present in timeline store
Tarun Parimi created YARN-8567: -- Summary: Fetching yarn logs fails for long running application if it is not present in timeline store Key: YARN-8567 URL: https://issues.apache.org/jira/browse/YARN-8567 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.7.0 Reporter: Tarun Parimi Assignee: Tarun Parimi Using yarn logs command for a long running application which has been running longer than the configured timeline service ttl {{yarn.timeline-service.ttl-ms }} fails with the following exception. {code:java} Exception in thread "main" org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity for application application_152347939332_1 doesn't exist in the timeline store at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:670) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:219) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:211) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172) at org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getContainers(ApplicationHistoryProtocolPBClientImpl.java:183) at org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getContainers(AHSClientImpl.java:151) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:720) at org.apache.hadoop.yarn.client.cli.LogsCLI.getContainerReportsFromRunningApplication(LogsCLI.java:1089) at org.apache.hadoop.yarn.client.cli.LogsCLI.getContainersLogRequestForRunningApplication(LogsCLI.java:1064) at org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:976) at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:300) at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:107) at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:327) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional 
commands, e-mail: yarn-issues-h...@hadoop.apache.org
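For reference, the TTL in question is an ordinary yarn-site.xml property; the value shown below is the stock default of seven days, and the failure is then hit by a normal log fetch such as {{yarn logs -applicationId <appId>}} once the application has outlived it:
{code:xml}
<property>
  <!-- How long the timeline store keeps entity data before purging it. -->
  <name>yarn.timeline-service.ttl-ms</name>
  <value>604800000</value> <!-- 7 days, in milliseconds -->
</property>
{code}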
[jira] [Comment Edited] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552885#comment-16552885 ] Antal Bálint Steinbach edited comment on YARN-8468 at 7/23/18 1:47 PM: --- Hi [~haibochen] ! I only commented "Thanks for the feedback [~wilfreds]", but I also fixed his suggestions. I am sorry for that, please find my responses inline. - a {{FSLeafQueue}} and {{FSParentQueue}} always have a parent doing a null check on the parent is unneeded. The only queue that does not have a parent is the root queue which you already have special cased. _(In some tests sub-queue does not have a parent)_ - {{getMaximumResourceCapability}} must support resource types and not just memory and vcores, same as YARN-7556 for this setting (_It returns Resource I assume than it is ok with resource types_) - {{getMaxAllowedAllocation}} from the NodeTracker support more than just memory and vcores, needs to flow through (_It returns Resource I assume than it is ok with resource types_) - {{FairScheduler}}: Why change the static imports only for a part of the config values, either change all or none (none is preferred) (_Fixed_) - {{FairSchedulerPage}}: missing toString on the ResourceInfo (_added but I can't see why is it necessary_) - Testing must also use resource types not only the old configuration type like: "memory-mb=5120, test1=4, vcores=2" _(Test added)_ - {{TestFairScheduler}} Testing must also include failure cases for sub queues not just the root queue: setting value on root queue should throw and should not be applied (_Fixed_) - If this TestQueueMaxContainerAllocationValidator is a new file make sure that you add the license etc (_license text added for the new files_) Balint was (Author: bsteinbach): Hi [~haibochen] ! I only commented "Thanks for the feedback [~wilfreds]", but I also fixed his suggestions. I am sorry for that, please find my responses inline. - a {{FSLeafQueue}} and {{FSParentQueue}} always have a parent doing a null check on the parent is unneeded. The only queue that does not have a parent is the root queue which you already have special cased. 
_(In some tests sub-queue does not have a parent)_ - {{getMaximumResourceCapability}} must support resource types and not just memory and vcores, same as YARN-7556 for this setting (_It supports Resource I assume than it is ok with resource types_) - {{getMaxAllowedAllocation}} from the NodeTracker support more than just memory and vcores, needs to flow through (_It supports Resource I assume than it is ok with resource types_) - {{FairScheduler}}: Why change the static imports only for a part of the config values, either change all or none (none is preferred) (_Fixed_) - {{FairSchedulerPage}}: missing toString on the ResourceInfo (_added but I can't see why is it necessary_) - Testing must also use resource types not only the old configuration type like: "memory-mb=5120, test1=4, vcores=2" _(Test added)_ - {{TestFairScheduler}} Testing must also include failure cases for sub queues not just the root queue: setting value on root queue should throw and should not be applied (_Fixed_) - If this TestQueueMaxContainerAllocationValidator is a new file make sure that you add the license etc (_license text added for the new files_) Balint > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Labels: patch > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers and cannot be limited by queue or and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per queue basis. > > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default value for maximum > container size for all queues and setting maximum resources per queue with > “maxContainerResources” queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552885#comment-16552885 ] Antal Bálint Steinbach commented on YARN-8468: -- Hi [~haibochen] ! I only commented "Thanks for the feedback [~wilfreds]", but I also fixed his suggestions. I am sorry for that, please find my responses inline. - a {{FSLeafQueue}} and {{FSParentQueue}} always have a parent doing a null check on the parent is unneeded. The only queue that does not have a parent is the root queue which you already have special cased. _(In some tests sub-queue does not have a parent)_ - {{getMaximumResourceCapability}} must support resource types and not just memory and vcores, same as YARN-7556 for this setting (_It supports Resource I assume than it is ok with resource types_) - {{getMaxAllowedAllocation}} from the NodeTracker support more than just memory and vcores, needs to flow through (_It supports Resource I assume than it is ok with resource types_) - {{FairScheduler}}: Why change the static imports only for a part of the config values, either change all or none (none is preferred) (_Fixed_) - {{FairSchedulerPage}}: missing toString on the ResourceInfo (_added but I can't see why is it necessary_) - Testing must also use resource types not only the old configuration type like: "memory-mb=5120, test1=4, vcores=2" _(Test added)_ - {{TestFairScheduler}} Testing must also include failure cases for sub queues not just the root queue: setting value on root queue should throw and should not be applied (_Fixed_) - If this TestQueueMaxContainerAllocationValidator is a new file make sure that you add the license etc (_license text added for the new files_) Balint > Limit container sizes per queue in FairScheduler > > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Labels: patch > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers and cannot be limited by queue or and is not scheduler dependent. > > The goal of this ticket is to allow this value to be set on a per queue basis. > > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. Setting > yarn.scheduler.maximum-allocation-mb sets a default value for maximum > container size for all queues and setting maximum resources per queue with > “maxContainerResources” queue config value. > > Suggested solution: > > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that queue resource cap can not be larger than scheduler max > resource cap in the config. 
> * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability() in both FSParentQueue and > FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * write JUnit tests. > * update the scheduler documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
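A minimal sketch of how the proposed per-queue cap could look in fair-scheduler.xml, reusing the {{maxContainerResources}} name from the description above; the element name and value syntax are assumptions until a patch is committed:
{code:xml}
<allocations>
  <queue name="ad_hoc">
    <!-- Assumed element: cap any single container in this queue at 2 GB / 1 vcore. -->
    <maxContainerResources>2048 mb, 1 vcores</maxContainerResources>
  </queue>
  <queue name="enterprise">
    <!-- No per-queue cap: falls back to yarn.scheduler.maximum-allocation-mb. -->
  </queue>
</allocations>
{code}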
[jira] [Created] (YARN-8566) Add diagnostic message for unschedulable containers
Szilard Nemeth created YARN-8566: Summary: Add diagnostic message for unschedulable containers Key: YARN-8566 URL: https://issues.apache.org/jira/browse/YARN-8566 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Szilard Nemeth Assignee: Szilard Nemeth If a queue is configured with maxResources set to 0 for a resource, and an application is submitted to that queue that requests that resource, that application will remain pending until it is removed or moved to a different queue. This behavior can be realized without extended resources, but it’s unlikely a user will create a queue that allows 0 memory or CPU. As the number of resources in the system increases, this scenario will become more common, and it will become harder to recognize these cases. Therefore, the scheduler should indicate in the diagnostic string for an application if it was not scheduled because of a 0 maxResources setting. Example configuration (fair-scheduler.xml) : {code:java} 10 1 mb,2vcores 9 mb,4vcores, 0gpu 50 -1.0f 2.0 fair {code} Command: {code:java} yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000; {code} The job hangs and the application diagnostic info is empty. Given that an exception is thrown before any mapper/reducer container is created, the diagnostic message of the AM should be updated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552812#comment-16552812 ] genericqa commented on YARN-8559: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 32s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}134m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932684/YARN-8559.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8fc74729784c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bbe2f62 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21341/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21341/testReport/ | | Max. process+thread count | 929 (vs. ulimit of 1) | | modules | C:
[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552698#comment-16552698 ] Manikandan R commented on YARN-4606: Unit test failure is not related to this patch > CapacityScheduler: applications could get starved because computation of > #activeUsers considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Manikandan R >Priority: Critical > Attachments: YARN-4606.001.patch, YARN-4606.002.patch, > YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.005.patch, > YARN-4606.006.patch, YARN-4606.007.patch, YARN-4606.1.poc.patch, > YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch > > > Currently, if all applications belong to same user in LeafQueue are pending > (caused by max-am-percent, etc.), ActiveUsersManager still considers the user > is an active user. This could lead to starvation of active applications, for > example: > - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to > user3)/app4(belongs to user4) are pending > - ActiveUsersManager returns #active-users=4 > - However, there're only two users (user1/user2) are able to allocate new > resources. So computed user-limit-resource could be lower than expected. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
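To make the starvation in the example above concrete, a rough back-of-the-envelope calculation (this deliberately ignores minimum-user-limit-percent and user-limit-factor, so it is only an approximation of the real CapacityScheduler computation):
{code}
user-limit-resource ~= queue-capacity / #active-users

queue-capacity = 100 GB, counted #active-users = 4  ->  ~25 GB per user
only user1/user2 can actually run containers        ->  ~50 GB of the queue can sit idle

If pending-only users were excluded (#active-users = 2), the limit would be ~50 GB.
{code}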
[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552697#comment-16552697 ] Anna Savarin commented on YARN-8559: Sweet! > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Minor > Attachments: YARN-8559.001.patch, YARN-8559.002.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552646#comment-16552646 ] Weiwei Yang commented on YARN-8559: --- Sure [~banditka], I updated that in the v2 patch. It now does an extra check before and after making changes in one case; hope that helps. > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Minor > Attachments: YARN-8559.001.patch, YARN-8559.002.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8559: -- Attachment: YARN-8559.002.patch > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Minor > Attachments: YARN-8559.001.patch, YARN-8559.002.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552616#comment-16552616 ] Anna Savarin commented on YARN-8559: Thank you for the patch, [~cheersyang]. Do you think it makes sense to first make a change to the config in your test and then make sure that the updated value is retrieved, in addition to the standard retrieval? > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Minor > Attachments: YARN-8559.001.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
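One way to eyeball the behavior under discussion from the command line, assuming a default RM webapp address; the host, port, and the probed key are placeholders and not part of the patch (the key assumes the CapacityScheduler is in use):
{code}
# Pull the RM's /conf endpoint and look for a key that only lives in capacity-scheduler.xml.
curl -s "http://<rm-host>:8088/conf" -o rm-conf.xml
grep -o "yarn.scheduler.capacity.root.queues" rm-conf.xml
{code}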
[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552584#comment-16552584 ] genericqa commented on YARN-8559: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 3m 5s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 10s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8559 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932663/YARN-8559.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f9cabc63c9d8 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bbe2f62 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21338/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21338/testReport/ | | Max. process+thread count | 941 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552575#comment-16552575 ] genericqa commented on YARN-8548: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 40s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 77m 35s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}176m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8548 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932657/YARN-8548-005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2edbc766dbe6 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bbe2f62 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | whitespace |
[jira] [Comment Edited] (YARN-8553) Reduce complexity of AHSWebService getApps method
[ https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552541#comment-16552541 ] Szilard Nemeth edited comment on YARN-8553 at 7/23/18 9:52 AM: --- Hi [~rohithsharma]! First patch is ready for review! I performed a similar refactor like in RMWebServices, i.e. building the GetAppRequest with the builder. The loop iterating on appReports with plenty of continue statements still looks pretty bad but I think that's not in the scope of this jira. was (Author: snemeth): Hi [~rohithsharma]! First patch is ready for review! I performed a similar refactor like in RMWebServices. The loop iterating on appReports with plenty of continue statements still looks pretty bad but I think that's not in the scope of this jira. > Reduce complexity of AHSWebService getApps method > - > > Key: YARN-8553 > URL: https://issues.apache.org/jira/browse/YARN-8553 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8553.001.patch > > > YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in > AHSWebservice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8553) Reduce complexity of AHSWebService getApps method
[ https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552541#comment-16552541 ] Szilard Nemeth edited comment on YARN-8553 at 7/23/18 9:51 AM: --- Hi [~rohithsharma]! First patch is ready for review! I performed a similar refactor like in RMWebServices. The loop iterating on appReports with plenty of continue statements still looks pretty bad but I think that's not in the scope of this jira. was (Author: snemeth): Hi [~rohithsharma]! First patch is ready for review! > Reduce complexity of AHSWebService getApps method > - > > Key: YARN-8553 > URL: https://issues.apache.org/jira/browse/YARN-8553 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8553.001.patch > > > YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in > AHSWebservice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552559#comment-16552559 ] Dillon Zhang commented on YARN-6539: Hi [~giovanni.fumarola], can you please describe what "secureLogin" means? If you haven't been working on this issue, I'd like to take it. Thank you. > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8553) Reduce complexity of AHSWebService getApps method
[ https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552541#comment-16552541 ] Szilard Nemeth commented on YARN-8553: -- Hi [~rohithsharma]! First patch is ready for review! > Reduce complexity of AHSWebService getApps method > - > > Key: YARN-8553 > URL: https://issues.apache.org/jira/browse/YARN-8553 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8553.001.patch > > > YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in > AHSWebservice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8553) Reduce complexity of AHSWebService getApps method
[ https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8553: - Attachment: YARN-8553.001.patch > Reduce complexity of AHSWebService getApps method > - > > Key: YARN-8553 > URL: https://issues.apache.org/jira/browse/YARN-8553 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8553.001.patch > > > YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in > AHSWebservice. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552536#comment-16552536 ] niu edited comment on YARN-8513 at 7/23/18 9:21 AM: We also met this problem in 2.9.1. It is caused by deadlock. was (Author: hustnn): We also met this problem in 2.9.1. > CapacityScheduler infinite loop when queue is near fully utilized > - > > Key: YARN-8513 > URL: https://issues.apache.org/jira/browse/YARN-8513 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.9.1 > Environment: Ubuntu 14.04.5 > YARN is configured with one label and 5 queues. >Reporter: Chen Yufei >Priority: Major > Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, > jstack-5.log, top-during-lock.log, top-when-normal.log > > > ResourceManager does not respond to any request when queue is near fully > utilized sometimes. Sending SIGTERM won't stop RM, only SIGKILL can. After RM > restart, it can recover running jobs and start accepting new ones. > > Seems like CapacityScheduler is in an infinite loop printing out the > following log messages (more than 25,000 lines in a second): > > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=0.99816763 > absoluteUsedCapacity=0.99816763 used= > cluster=}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application attempt=appattempt_1530619767030_1652_01 > container=null > queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 > clusterResource= type=NODE_LOCAL > requestedPartition=}} > > I encounter this problem several times after upgrading to YARN 2.9.1, while > the same configuration works fine under version 2.7.3. > > YARN-4477 is an infinite loop bug in FairScheduler, not sure if this is a > similar problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized
[ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552536#comment-16552536 ] niu commented on YARN-8513: --- We also met this problem in 2.9.1. > CapacityScheduler infinite loop when queue is near fully utilized > - > > Key: YARN-8513 > URL: https://issues.apache.org/jira/browse/YARN-8513 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.9.1 > Environment: Ubuntu 14.04.5 > YARN is configured with one label and 5 queues. >Reporter: Chen Yufei >Priority: Major > Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, > jstack-5.log, top-during-lock.log, top-when-normal.log > > > ResourceManager does not respond to any request when queue is near fully > utilized sometimes. Sending SIGTERM won't stop RM, only SIGKILL can. After RM > restart, it can recover running jobs and start accepting new ones. > > Seems like CapacityScheduler is in an infinite loop printing out the > following log messages (more than 25,000 lines in a second): > > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > assignedContainer queue=root usedCapacity=0.99816763 > absoluteUsedCapacity=0.99816763 used= > cluster=}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal}} > {{2018-07-10 17:16:29,227 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application attempt=appattempt_1530619767030_1652_01 > container=null > queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 > clusterResource= type=NODE_LOCAL > requestedPartition=}} > > I encounter this problem several times after upgrading to YARN 2.9.1, while > the same configuration works fine under version 2.7.3. > > YARN-4477 is an infinite loop bug in FairScheduler, not sure if this is a > similar problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552507#comment-16552507 ] Weiwei Yang commented on YARN-8559: --- A simple patch is now available for this issue ;) > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Minor > Attachments: YARN-8559.001.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed
[ https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-7748: -- Attachment: YARN-7748.002.patch > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > failed > > > Key: YARN-7748 > URL: https://issues.apache.org/jira/browse/YARN-7748 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0 >Reporter: Haibo Chen >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-7748.001.patch, YARN-7748.002.patch > > > TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted > Failing for the past 1 build (Since Failed#19244 ) > Took 0.4 sec. > *Error Message* > expected null, but > was: > *Stacktrace* > {code} > java.lang.AssertionError: expected null, but > was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotNull(Assert.java:664) > at org.junit.Assert.assertNull(Assert.java:646) > at org.junit.Assert.assertNull(Assert.java:656) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8036) Memory Available shows a negative value after running updateNodeResource
[ https://issues.apache.org/jira/browse/YARN-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552493#comment-16552493 ] Charan Hebri commented on YARN-8036: [~Zian Chen] This may have been fixed by YARN-8464. I haven't seen this issue recently. > Memory Available shows a negative value after running updateNodeResource > > > Key: YARN-8036 > URL: https://issues.apache.org/jira/browse/YARN-8036 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Charan Hebri >Assignee: Zian Chen >Priority: Major > > Running updateNodeResource for a node that already has applications running > on it doesn't update Memory Available with the right values. It may end up > showing negative values based on the requirements of the application. > Attached a screenshot for reference. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
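For reference, the admin command that drives this scenario; the node id and new sizes below are placeholders:
{code}
# Shrink a live NodeManager to 4 GB / 4 vcores, then check "Memory Available"
# for that node in the RM web UI or the node REST report.
yarn rmadmin -updateNodeResource <nm-host>:<nm-port> 4096 4
{code}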
[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-8564: Description: I wish to have a queue-level application lifetime monitor in FairScheduler. In our large YARN cluster, sometimes there are too many small jobs in one minor queue that may run too long, which can leave the long-running jobs in our high-priority and very important queue pending too many times. If we can have an application lifetime monitor at the queue level and set a small lifetime on the minor queue, I think it may solve our problems before global scheduling can be used in FairScheduler. (was: I wish to have a queue-level application lifetime monitor. In our large YARN cluster, sometimes there are too many small jobs in one minor queue that may run too long, which can leave the long-running jobs in our high-priority and very important queue pending too many times. If we can have an application lifetime monitor at the queue level and set a small lifetime on the minor queue, I think it may solve our problems before global scheduling can be used in FairScheduler.) > Add queue level application lifetime monitor in FairScheduler > -- > > Key: YARN-8564 > URL: https://issues.apache.org/jira/browse/YARN-8564 > Project: Hadoop YARN > Issue Type: Wish > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: zhuqi >Priority: Major > > I wish to have a queue-level application lifetime monitor in FairScheduler. In our large > YARN cluster, sometimes there are too many small jobs in one minor queue that may run > too long, which can leave the long-running jobs in our high-priority and very important > queue pending too many times. If we can have an application lifetime monitor at the queue > level and set a small lifetime on the minor queue, I think it may solve our problems > before global scheduling can be used in FairScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-8564: Description: I wish to have application lifetime monitor for queue level. In our large yarn cluster, sometimes there are too many small jobs in one minor queue but may run too long, it may cause our long time job in our high priority and very important queue pending too many times. If we can have queue level application lifetime monitor in the queue level, and set small lifetime in the minor queue, i think it may can solve our problems before the global schedulering can be used in FairScheduler. (was: I wish to have application lifetime monitor for queue level. In our large yarn cluster, sometimes there are too many small jobs in one minor queue but may run too long, it may cause our long time job in our high priority queue pending too many times. If we can have queue level application lifetime monitor in the queue level, and set small lifetime in the minor queue, i think it can solve our problems.) > Add queue level application lifetime monitor in FairScheduler > -- > > Key: YARN-8564 > URL: https://issues.apache.org/jira/browse/YARN-8564 > Project: Hadoop YARN > Issue Type: Wish > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: zhuqi >Priority: Major > > I wish to have application lifetime monitor for queue level. In our large > yarn cluster, sometimes there are too many small jobs in one minor queue but > may run too long, it may cause our long time job in our high priority and > very important queue pending too many times. If we can have queue level > application lifetime monitor in the queue level, and set small lifetime in > the minor queue, i think it may can solve our problems before the global > schedulering can be used in FairScheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-8564: Description: I wish to have application lifetime monitor for queue level. In our large yarn cluster, sometimes there are too many small jobs in one minor queue but may run too long, it may cause our long time job in our high priority queue pending too many times. If we can have queue level application lifetime monitor in the queue level, and set small lifetime in the minor queue, i think it can solve our problems. (was: Wish to have application lifetime monitor for queue level. In our large yarn cluster, sometimes there are too many small jobs in one minor queue but may run too long, it may cause our long time job in our high priority queue pending too many times. If we can have queue level application lifetime monitor in the queue level, and set small lifetime in the minor queue, i think it can solve our problems.) > Add queue level application lifetime monitor in FairScheduler > -- > > Key: YARN-8564 > URL: https://issues.apache.org/jira/browse/YARN-8564 > Project: Hadoop YARN > Issue Type: Wish > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: zhuqi >Priority: Major > > I wish to have application lifetime monitor for queue level. In our large > yarn cluster, sometimes there are too many small jobs in one minor queue but > may run too long, it may cause our long time job in our high priority queue > pending too many times. If we can have queue level application lifetime > monitor in the queue level, and set small lifetime in the minor queue, i > think it can solve our problems. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8564) Add queue level application lifetime monitor in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552453#comment-16552453 ] zhuqi commented on YARN-8564: - Hi [~rohithsharma] : I have changed my description, thanks. > Add queue level application lifetime monitor in FairScheduler > -- > > Key: YARN-8564 > URL: https://issues.apache.org/jira/browse/YARN-8564 > Project: Hadoop YARN > Issue Type: Wish > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: zhuqi >Priority: Major > > Wish to have application lifetime monitor for queue level. In our large yarn > cluster, sometimes there are too many small jobs in one minor queue but may > run too long, it may cause our long time job in our high priority queue > pending too many times. If we can have queue level application lifetime > monitor in the queue level, and set small lifetime in the minor queue, i > think it can solve our problems. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-8564: Description: Wish to have application lifetime monitor for queue level. In our large yarn cluster, sometimes there are too many small jobs in one minor queue but may run too long, it may cause our long time job in our high priority queue pending too many times. If we can have queue level application lifetime monitor in the queue level, and set small lifetime in the minor queue, i think it can solve our problems. (was: Now the application life time monitor feature is only worked with the Capacity Scheduler, but it is not supported with Fair Scheduler. Our company wish to use this feature in our product environment in order to make our big growing clusters more stable and efficient .) > Add queue level application lifetime monitor in FairScheduler > -- > > Key: YARN-8564 > URL: https://issues.apache.org/jira/browse/YARN-8564 > Project: Hadoop YARN > Issue Type: Wish > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: zhuqi >Priority: Major > > Wish to have application lifetime monitor for queue level. In our large yarn > cluster, sometimes there are too many small jobs in one minor queue but may > run too long, it may cause our long time job in our high priority queue > pending too many times. If we can have queue level application lifetime > monitor in the queue level, and set small lifetime in the minor queue, i > think it can solve our problems. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned YARN-8559: - Assignee: Weiwei Yang > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Assignee: Weiwei Yang >Priority: Minor > Attachments: YARN-8559.001.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8559: -- Attachment: YARN-8559.001.patch > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Priority: Minor > Attachments: YARN-8559.001.patch > > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8559: -- Target Version/s: 3.2.0, 3.0.3, 3.1.2 (was: 3.0.0, 3.1.0) > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Priority: Minor > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552245#comment-16552245 ] Weiwei Yang edited comment on YARN-8559 at 7/23/18 7:42 AM: Thanks [~leftnoteasy], that's correct. "scheduler-conf" right now is a PUT for refreshing CS configs. There is "http://RM_ADDR/scheduler" to retrieve scheduler info, see more at [apache doc|http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API]. It dumps scheduler info but not directly from conf. I agree to add a GET for this endpoint too, so user could first query existing CS configuration, then update via mutable API and at last query again to make sure it is all good. Let me reopen the ticket to get this done. Thanks was (Author: cheersyang): Thanks [~leftnoteasy], that's correct. "scheduler-conf" right now is a PUT for refreshing CS configs. There is "http://RM_ADDR/scheduler" to retrieve scheduler info, see more at [apache doc|http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API]. It dumps scheduler info but not directly from conf. I agree to add a GET for this endpoint too. Let me reopen this JIRA for more discussion then. Thanks > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Priority: Minor > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
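A minimal sketch of how the read side discussed above can be exercised today, assuming a ResourceManager on localhost with the default web port 8088; the paths are the documented /ws/v1/cluster endpoints, while a GET on scheduler-conf is, at this point in the discussion, still only a proposal.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Fetches GET /ws/v1/cluster/scheduler, the existing read-only endpoint that
// dumps the active scheduler's runtime view (but not the raw configuration).
// The mutation API referenced in the comment is PUT /ws/v1/cluster/scheduler-conf;
// the proposed GET counterpart would reuse that same path.
public class SchedulerInfoFetch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8088/ws/v1/cluster/scheduler"); // assumed RM address
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
{code}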
[jira] [Comment Edited] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint
[ https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552245#comment-16552245 ] Weiwei Yang edited comment on YARN-8559 at 7/23/18 7:40 AM: Thanks [~leftnoteasy], that's correct. "scheduler-conf" right now is a PUT for refreshing CS configs. There is "http://RM_ADDR/scheduler" to retrieve scheduler info, see more at [apache doc|http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API]. It dumps scheduler info but not directly from conf. I agree to add a GET for this endpoint too. Let me reopen this JIRA for more discussion then. Thanks was (Author: cheersyang): Thanks [~leftnoteasy], that's correct. "scheduler-conf" right now is a PUT for refreshing CS configs. There is "http://RM_ADDR/scheduler" to retrieve scheduler info, see more at [apache doc|http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API]. It dumps scheduler info but not directly from conf. I am thinking since we support the mutable configuration via REST (scheduler-conf), we should support GET too. Thoughts [~leftnoteasy], [~banditka]? Let me reopen this JIRA for more discussion then. Thanks > Expose scheduling configuration info in Resource Manager's /conf endpoint > - > > Key: YARN-8559 > URL: https://issues.apache.org/jira/browse/YARN-8559 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Anna Savarin >Priority: Minor > > All Hadoop services provide a set of common endpoints (/stacks, /logLevel, > /metrics, /jmx, /conf). In the case of the Resource Manager, part of the > configuration comes from the scheduler being used. Currently, these > configuration key/values are not exposed through the /conf endpoint, thereby > revealing an incomplete configuration picture. > Make an improvement and expose the scheduling configuration info through the > RM's /conf endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated YARN-8564: Summary: Add queue level application lifetime monitor in FairScheduler (was: Add application lifetime monitor in FairScheduler) > Add queue level application lifetime monitor in FairScheduler > -- > > Key: YARN-8564 > URL: https://issues.apache.org/jira/browse/YARN-8564 > Project: Hadoop YARN > Issue Type: Wish > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: zhuqi >Priority: Major > > Now the application life time monitor feature is only worked with the > Capacity Scheduler, but it is not supported with Fair Scheduler. Our company > wish to use this feature in our product environment in order to make our big > growing clusters more stable and efficient . -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552403#comment-16552403 ] Bilwa S T commented on YARN-8548: - [~sunilg] thanks for reviewing...comment is handled > AllocationRespose proto setNMToken initBuilder not done > --- > > Key: YARN-8548 > URL: https://issues.apache.org/jira/browse/YARN-8548 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-8548-001.patch, YARN-8548-002.patch, > YARN-8548-003.patch, YARN-8548-004.patch, YARN-8548-005.patch > > > Distributed Scheduling allocate failing > {code} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1355) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy85.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
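The NPE above is the classic symptom of a YARN PBImpl record mutating its protobuf builder before the builder has been initialized. The fragment below is a schematic sketch of the usual maybeInitBuilder() convention these record classes follow, not the content of the attached patches.
{code}
// Schematic sketch of the standard PBImpl setter pattern (not the actual patch).
// Every setter must call maybeInitBuilder() before touching "builder", because
// a record created from a received proto still has builder == null.
private synchronized void maybeInitBuilder() {
  if (viaProto || builder == null) {
    builder = AllocateResponseProto.newBuilder(proto);
  }
  viaProto = false;
}

@Override
public synchronized void setNMTokens(List<NMToken> nmTokens) {
  maybeInitBuilder();        // without this call, builder.clearNmTokens() below NPEs
  builder.clearNmTokens();
  if (nmTokens == null || nmTokens.isEmpty()) {
    this.nmTokens = null;
    return;
  }
  this.nmTokens = new ArrayList<>(nmTokens);  // proto is rebuilt lazily from this list
}
{code}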
[jira] [Updated] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-8548: Attachment: YARN-8548-005.patch > AllocationRespose proto setNMToken initBuilder not done > --- > > Key: YARN-8548 > URL: https://issues.apache.org/jira/browse/YARN-8548 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-8548-001.patch, YARN-8548-002.patch, > YARN-8548-003.patch, YARN-8548-004.patch, YARN-8548-005.patch > > > Distributed Scheduling allocate failing > {code} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1355) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy85.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org