[jira] [Comment Edited] (YARN-8541) RM startup failure on recovery after user deletion

2018-07-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553819#comment-16553819
 ] 

Bibin A Chundatt edited comment on YARN-8541 at 7/24/18 5:19 AM:
-

On recovery, the application context will have the previously set queueName. At 
the scheduler side, if the queue doesn't exist, the application will get killed. 

Hope I was able to clarify your doubt.
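A minimal sketch of that check (illustrative only, with hypothetical class and 
method names; this is not the actual CapacityScheduler recovery code):

{code}
import java.util.Set;

// Sketch of the behaviour described above: on recovery the scheduler resolves
// the queue name stored in the application's submission context, and if that
// queue no longer exists the recovered application is rejected/killed.
final class QueueRecoveryCheck {

  /** Returns a diagnostic message if the app must be killed, or null otherwise. */
  static String checkQueueOnRecovery(String recoveredQueueName,
      Set<String> queuesKnownToScheduler) {
    if (!queuesKnownToScheduler.contains(recoveredQueueName)) {
      return "Application killed on recovery: queue '" + recoveredQueueName
          + "' no longer exists";
    }
    return null; // queue still present, the application is re-attached to it
  }
}
{code}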



was (Author: bibinchundatt):
On recovery the application context  will have the previously set queueName. At 
scheduler side if queue doesn't exists the application will get killed. 

Hope i hope was  able to clarify your doubt.


> RM startup failure on recovery after user deletion
> --
>
> Key: YARN-8541
> URL: https://issues.apache.org/jira/browse/YARN-8541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: yimeng
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8541.001.patch, YARN-8541.002.patch, 
> YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found a problem where RM startup fails on 
> recovery, with the following test steps:
> 1. Create a user "user1" with permission to submit apps.
> 2. Use user1 to submit a job and wait for the job to finish.
> 3. Delete user "user1".
> 4. Restart YARN.
> 5. The RM restart fails.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue 
> root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.0, numApps=0, 
> numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue 
> mappings, override: false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized 
> CapacityScheduler with calculator=class 
> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, 
> minimumAllocation=<>, maximumAllocation=< vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | 
> CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not 
> found | Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS 
> Processing chain. Root 
> Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor].
>  | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement 
> handler will be used, all scheduling requests will be rejected. | 
> ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding 
> [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor]
>  tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the 
> winning of election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application 
> application_1531624956005_0001 submitted by user super reason: No groups 
> found for user super
>  at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> 

[jira] [Commented] (YARN-8541) RM startup failure on recovery after user deletion

2018-07-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553819#comment-16553819
 ] 

Bibin A Chundatt commented on YARN-8541:


On recovery, the application context will have the previously set queueName. At 
the scheduler side, if the queue doesn't exist, the application will get killed. 

Hope I was able to clarify your doubt.


> RM startup failure on recovery after user deletion
> --
>
> Key: YARN-8541
> URL: https://issues.apache.org/jira/browse/YARN-8541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: yimeng
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8541.001.patch, YARN-8541.002.patch, 
> YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found a problem where RM startup fails on 
> recovery, with the following test steps:
> 1. Create a user "user1" with permission to submit apps.
> 2. Use user1 to submit a job and wait for the job to finish.
> 3. Delete user "user1".
> 4. Restart YARN.
> 5. The RM restart fails.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue 
> root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.0, numApps=0, 
> numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue 
> mappings, override: false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized 
> CapacityScheduler with calculator=class 
> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, 
> minimumAllocation=<>, maximumAllocation=< vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | 
> CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not 
> found | Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS 
> Processing chain. Root 
> Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor].
>  | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement 
> handler will be used, all scheduling requests will be rejected. | 
> ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding 
> [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor]
>  tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the 
> winning of election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application 
> application_1531624956005_0001 submitted by user super reason: No groups 
> found for user super
>  at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1241)
>  at 
> 

[jira] [Commented] (YARN-8418) App local logs could be leaked if log aggregation fails to initialize for the app

2018-07-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553817#comment-16553817
 ] 

Bibin A Chundatt commented on YARN-8418:


Thank you [~suma.shivaprasad] for looking into the patch.

{quote}
Wouldn't this cause leaks in case of other Exception cases (not Invalid token 
cases)?
{quote}
The {{aggregatorWrapper}} thread takes care of closing once the aggregator 
thread is complete:

{code}
// Schedule the aggregator.
Runnable aggregatorWrapper = new Runnable() {
  public void run() {
try {
  appLogAggregator.run();
} finally {
  appLogAggregators.remove(appId);
  closeFileSystems(userUgi);
}
  }
};
{code}

{quote}
when the scheduled aggregator runs?
{quote}

On exception, we have a disable flag per aggregator. So if the aggregator is 
disabled, the upload will be skipped and only the post-cleanup is done; see 
{{AppLogAggregatorImpl#run()}}.
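Roughly, the flow is like the following sketch (simplified, hypothetical names; 
not the actual {{AppLogAggregatorImpl}} code):

{code}
// Simplified sketch of the disable-flag behaviour described above: when
// aggregation was disabled (e.g. createApp dir init failed), the upload is
// skipped and only the local post-cleanup runs.
final class AggregatorRunSketch {

  private volatile boolean aggregationDisabled; // set when initialization fails

  public void run() {
    try {
      if (!aggregationDisabled) {
        uploadLogsForContainers();          // normal aggregation path
      }
    } finally {
      doAppLogAggregationPostCleanUp();     // always remove local app log dirs
    }
  }

  private void uploadLogsForContainers() { /* upload logs to the remote FS */ }

  private void doAppLogAggregationPostCleanUp() { /* delete local log dirs */ }
}
{code}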




> App local logs could be leaked if log aggregation fails to initialize for the app
> --
>
> Key: YARN-8418
> URL: https://issues.apache.org/jira/browse/YARN-8418
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8418.001.patch, YARN-8418.002.patch, 
> YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch
>
>
> If log aggregation fails to init the createApp directory, container logs could 
> get leaked in the NM directory.
> For a long-running application, this case is possible on NM restart after token 
> renewal, or on application submission with an invalid delegation token.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553795#comment-16553795
 ] 

genericqa commented on YARN-7748:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m  
3s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}129m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-7748 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932819/YARN-7748.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d2115bfcfef0 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8688a0c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21349/testReport/ |
| Max. process+thread count | 865 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21349/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (YARN-8548) AllocationResponse proto setNMToken initBuilder not done

2018-07-23 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553745#comment-16553745
 ] 

Sunil Govindan commented on YARN-8548:
--

Thanks. Patch looks good to me.

> AllocationResponse proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch, YARN-8548-002.patch, 
> YARN-8548-003.patch, YARN-8548-004.patch, YARN-8548-005.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}
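For context, the NullPointerException above is the typical symptom of a PBImpl 
setter that mutates the protobuf builder without initializing it first. A 
simplified sketch of the usual pattern (hypothetical names, not the real 
{{AllocateResponsePBImpl}}):

{code}
import java.util.ArrayList;
import java.util.List;

// Sketch of the Hadoop PBImpl setter pattern: the setter must call
// maybeInitBuilder() before touching 'builder', otherwise 'builder' can still
// be null and the call fails with an NPE like the one in the trace above.
final class AllocateResponseSketch {

  static final class Builder {            // stands in for the generated proto builder
    final List<String> nmTokens = new ArrayList<>();
  }

  private Builder builder;                // lazily initialized

  void setNMTokens(List<String> tokens) {
    maybeInitBuilder();                   // the step the issue title says was missing
    builder.nmTokens.clear();
    builder.nmTokens.addAll(tokens);
  }

  private void maybeInitBuilder() {
    if (builder == null) {
      builder = new Builder();
    }
  }
}
{code}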



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed

2018-07-23 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553742#comment-16553742
 ] 

Sunil Govindan commented on YARN-7748:
--

Thanks. Yes, this change looks good to me. Pending jenkins.

> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted 
> failed
> 
>
> Key: YARN-7748
> URL: https://issues.apache.org/jira/browse/YARN-7748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-7748.001.patch, YARN-7748.002.patch, 
> YARN-7748.003.patch
>
>
> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted
> Failing for the past 1 build (Since Failed#19244 )
> Took 0.4 sec.
> *Error Message*
> expected null, but 
> was:
> *Stacktrace*
> {code}
> java.lang.AssertionError: expected null, but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed

2018-07-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553698#comment-16553698
 ] 

Weiwei Yang edited comment on YARN-7748 at 7/24/18 2:29 AM:


Thanks [~snemeth], I am taking this one over.

[~haibochen], [~sunilg], could you please take a look at this fix?


was (Author: cheersyang):
Thanks [~snemeth], I am taking this one over.

[~sunilg], could you please take a look at this fix?

> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted 
> failed
> 
>
> Key: YARN-7748
> URL: https://issues.apache.org/jira/browse/YARN-7748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-7748.001.patch, YARN-7748.002.patch, 
> YARN-7748.003.patch
>
>
> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted
> Failing for the past 1 build (Since Failed#19244 )
> Took 0.4 sec.
> *Error Message*
> expected null, but 
> was:
> *Stacktrace*
> {code}
> java.lang.AssertionError: expected null, but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned YARN-7748:
-

Assignee: Weiwei Yang  (was: Szilard Nemeth)

> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted 
> failed
> 
>
> Key: YARN-7748
> URL: https://issues.apache.org/jira/browse/YARN-7748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-7748.001.patch, YARN-7748.002.patch, 
> YARN-7748.003.patch
>
>
> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted
> Failing for the past 1 build (Since Failed#19244 )
> Took 0.4 sec.
> *Error Message*
> expected null, but 
> was:
> *Stacktrace*
> {code}
> java.lang.AssertionError: expected null, but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed

2018-07-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553698#comment-16553698
 ] 

Weiwei Yang commented on YARN-7748:
---

Thanks [~snemeth], I am taking this one over.

[~sunilg], could you please take a look at this fix?

> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted 
> failed
> 
>
> Key: YARN-7748
> URL: https://issues.apache.org/jira/browse/YARN-7748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-7748.001.patch, YARN-7748.002.patch, 
> YARN-7748.003.patch
>
>
> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted
> Failing for the past 1 build (Since Failed#19244 )
> Took 0.4 sec.
> *Error Message*
> expected null, but 
> was:
> *Stacktrace*
> {code}
> java.lang.AssertionError: expected null, but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7748:
--
Attachment: YARN-7748.003.patch

> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted 
> failed
> 
>
> Key: YARN-7748
> URL: https://issues.apache.org/jira/browse/YARN-7748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-7748.001.patch, YARN-7748.002.patch, 
> YARN-7748.003.patch
>
>
> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted
> Failing for the past 1 build (Since Failed#19244 )
> Took 0.4 sec.
> *Error Message*
> expected null, but 
> was:
> *Stacktrace*
> {code}
> java.lang.AssertionError: expected null, but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553693#comment-16553693
 ] 

Weiwei Yang commented on YARN-8559:
---

cc [~leftnoteasy], [~sunilg] for review.

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.
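As a usage sketch (the host, port, and exact path below are assumptions for 
illustration, not something taken from the patch), fetching the exposed 
scheduler configuration could look like:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: GET the RM's scheduler configuration endpoint and print the response.
public final class FetchSchedulerConf {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/scheduler-conf");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
{code}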



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8559:
--
Priority: Major  (was: Minor)

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler in Resource Manager's /scheduler-conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8559:
--
Summary: Expose mutable-conf scheduler in Resource Manager's 
/scheduler-conf endpoint  (was: Expose scheduling configuration info in 
Resource Manager's /conf endpoint)

> Expose mutable-conf scheduler in Resource Manager's /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8559:
--
Summary: Expose mutable-conf scheduler's configuration in RM 
/scheduler-conf endpoint  (was: Expose mutable-conf scheduler in Resource 
Manager's /scheduler-conf endpoint)

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-07-23 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553691#comment-16553691
 ] 

Naganarasimha G R commented on YARN-7863:
-

I have done just a first-pass review; a few higher-level comments:
 # I think the Distributed Shell related modifications can be pulled into a 
separate Jira (YARN-8289), as discussed earlier.
 # I could not fully follow this in the meeting w.r.t. Attributes and 
Partitions/labels. IIUC, in ResourceRequest we specify the label expression 
directly, but in SchedulingRequest we can give it as part of the 
PlacementConstraint? And if so, do we club it with the attribute expression? 
Maybe we need to capture that explicitly here. Can you share a sample request 
with both of them present?
 # Attach the doc which captured our conclusions on the expression, or capture 
the essence of it in the Jira's description.

Other minor aspects:
 * PlacementConstraints.java ln 52-53: better to use 
NodeAttribute.PREFIX_DISTRIBUTED & PREFIX_CENTRALIZED.
 * PlacementConstraintParser ln 363-436: NodeConstraintsTokenizer seems not to 
be used anywhere?
 * ResourceManager ln 652-654: createNodeAttributesManager should either return 
NodeAttributesManagerImpl or take rmContext as a parameter instead of type 
casting.
 * .bowerrc ln 3: might not be related to this patch?
 * TestPlacementConstraintParser: covers only a single test case; we need more 
examples covering the positive and negative expression validation cases.
 * TestPlacementConstraintParser ln 457: is this the approach we finalised? I 
thought it was much more expressive, like the documentation in 
PlacementConstraintParser.parsePlacementSpecForAttributes.

Also, please change the status of the Jira to Patch Submitted so that the 
Jenkins build will be triggered.

 

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler

2018-07-23 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8564:

Description: I wish to have a queue-level application lifetime monitor in 
FairScheduler. In our large YARN cluster, there are sometimes too many small 
jobs in one minor queue that may run too long, which can affect our 
high-priority and very important queues. It would help if we could have a 
queue-level application lifetime monitor and set a small lifetime on the minor 
queue.  (was: I wish to have a queue-level application lifetime monitor in 
FairScheduler. In our large YARN cluster, there are sometimes too many small 
jobs in one minor queue that may run too long, which can cause long-running 
jobs in our high-priority and very important queue to stay pending for a long 
time. If we could have a queue-level application lifetime monitor and set a 
small lifetime on the minor queue, I think it could solve our problem before 
global scheduling can be used in FairScheduler.)

> Add queue level application lifetime monitor in FairScheduler 
> --
>
> Key: YARN-8564
> URL: https://issues.apache.org/jira/browse/YARN-8564
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Major
>
> I wish to have a queue-level application lifetime monitor in FairScheduler. 
> In our large YARN cluster, there are sometimes too many small jobs in one 
> minor queue that may run too long, which can affect our high-priority and 
> very important queues. It would help if we could have a queue-level 
> application lifetime monitor and set a small lifetime on the minor queue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8380) Support bind propagation options for mounts in docker runtime

2018-07-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553624#comment-16553624
 ] 

Hudson commented on YARN-8380:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14619 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14619/])
YARN-8380.  Support bind propagation options for mounts in docker (eyang: rev 
8688a0c7f88f2adf1a7fce695e06f3dd1f745080)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRunCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/TestNvidiaDockerV1CommandPlugin.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md


> Support bind propagation options for mounts in docker runtime
> -
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch
>
>
> The docker run command supports bind propagation options such as shared, but 
> currently we are only supporting ro and rw mount modes in the docker runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-23 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553540#comment-16553540
 ] 

Jason Lowe commented on YARN-8330:
--

One of the envisioned use-cases of ATSv2 was to record every container 
allocation for an application in order to do analysis like what is done in 
YARN-415.  That requires recording every container that enters the ALLOCATED 
state.

bq. If information is collected for end user consumption to understand their 
application usage, then by collecting RUNNING state might be sufficient.

That does not cover the case where an application receives containers but for 
some reason holds onto the allocations for a while before launching them.  Tez, 
for example, has some corner cases in its scheduler where it can hold onto 
container allocations for prolonged periods before launching them while it 
reuses other, already active containers.  Holding onto unlaunched container 
allocations is part of its footprint on the cluster.  Only showing containers 
that made it to the RUNNING state tells only part of the application's usage 
story.

Fixing the yarn container -list command problem does not mean the fix has to 
solely be in ATSv2.  If the data for a container in ATS is sufficient to 
distinguish allocated containers from running containers then this can, and 
arguably should be, filtered for the yarn container -list use-case.  If the 
data recorded for a container can't distinguish this then we should look into 
fixing that so it can be.  But I don't think we should pre-filter ALLOCATED 
containers on the RM publishing side and preclude the proper app footprint 
analysis use-case entirely.
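A minimal sketch of that reader-side filtering (hypothetical types, not actual 
ATSv2 or CLI code):

{code}
import java.util.ArrayList;
import java.util.List;

// Sketch: keep every container state in the timeline store, and let a
// "yarn container -list"-style view filter ALLOCATED entries at read time.
final class ContainerListFilter {

  static final class ContainerInfo {         // stand-in for an ATS container entity
    final String id;
    final String state;                      // e.g. "ALLOCATED", "RUNNING"
    ContainerInfo(String id, String state) { this.id = id; this.state = state; }
  }

  static List<ContainerInfo> runningOnly(List<ContainerInfo> all) {
    List<ContainerInfo> running = new ArrayList<>();
    for (ContainerInfo c : all) {
      if ("RUNNING".equals(c.state)) {       // filter on the reader side,
        running.add(c);                      // not on the RM publishing side
      }
    }
    return running;
  }
}
{code}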


> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-Id    Start Time    Finish Time    State    Host    Node Http Address    LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 

[jira] [Created] (YARN-8570) GPU support in SLS

2018-07-23 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-8570:
---

 Summary: GPU support in SLS
 Key: YARN-8570
 URL: https://issues.apache.org/jira/browse/YARN-8570
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung
Assignee: Jonathan Hung


Currently, resource requests in SLS support only memory and vcores. Since GPU is 
natively supported by YARN, it would be useful to support requesting GPU 
resources.
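
For reference, a minimal sketch of building a resource that includes GPUs with 
the Hadoop 3.x resource-types API; the helper class is hypothetical, and it 
assumes the cluster has the standard "yarn.io/gpu" resource type configured:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;

public class GpuResourceExample {
  /**
   * Build a Resource with memory, vcores and GPUs.  "yarn.io/gpu" is the
   * resource name YARN uses when GPU scheduling is enabled; if the resource
   * type is not configured on the cluster, setResourceInformation may fail.
   */
  public static Resource gpuResource(long memoryMb, int vcores, long gpus) {
    Resource resource = Resource.newInstance(memoryMb, vcores);
    resource.setResourceInformation("yarn.io/gpu",
        ResourceInformation.newInstance("yarn.io/gpu", "", gpus));
    return resource;
  }
}
{code}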



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553500#comment-16553500
 ] 

genericqa commented on YARN-8545:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
26s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8545 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932779/YARN-8545.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 56cd137fb41c 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 17e2616 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21348/testReport/ |
| Max. process+thread count | 755 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21348/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> YARN native service 

[jira] [Comment Edited] (YARN-8545) YARN native service should return container if launch failed

2018-07-23 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553426#comment-16553426
 ] 

Chandni Singh edited comment on YARN-8545 at 7/23/18 9:36 PM:
--

In Patch 1 : 
- releasing containers that failed
- removing failed containers from live instances



was (Author: csingh):
In Patch 1 : 
- releasing containers that failed
- removing containers from live instances


> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but the container will not be properly 
> returned to the RM. 
> This could happen when the AM fails while preparing the container launch context, 
> before the launch context is ever sent to the NM (once the launch context is sent 
> to the NM, the NM will report the failed container to the RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> Even after preparation of the container launch context has failed, the AM keeps 
> trying to monitor the container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}
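
For context, a minimal sketch of what returning such a container could look like 
from an AM, assuming an AMRMClientAsync handle (the class and method below are 
illustrative, not the yarn-service AM's actual code path):

{code:java}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class LaunchFailureHandler {
  /**
   * If building the ContainerLaunchContext throws before anything is sent to
   * the NM, release the container so the RM knows it is no longer in use.
   */
  public static void onLaunchPrepareFailure(AMRMClientAsync<?> amRmClient,
      Container container, Throwable cause) {
    // The cause would typically be recorded as diagnostics for the component
    // instance; the key step is handing the allocation back to the RM, since
    // without this the container stays "live" from the RM's point of view
    // even though it never launched.
    amRmClient.releaseAssignedContainer(container.getId());
  }
}
{code}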



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8545) YARN native service should return container if launch failed

2018-07-23 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8545:

Attachment: YARN-8545.001.patch

> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but the container will not be properly 
> returned to the RM. 
> This could happen when the AM fails while preparing the container launch context, 
> before the launch context is ever sent to the NM (once the launch context is sent 
> to the NM, the NM will report the failed container to the RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> Even after preparation of the container launch context has failed, the AM keeps 
> trying to monitor the container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed

2018-07-23 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553422#comment-16553422
 ] 

Chandni Singh commented on YARN-8545:
-

[~gsaha] [~billie.rinaldi] could you please review the patch?


> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>
> In some cases, container launch may fail but the container will not be properly 
> returned to the RM. 
> This could happen when the AM fails while preparing the container launch context, 
> before the launch context is ever sent to the NM (once the launch context is sent 
> to the NM, the NM will report the failed container to the RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> Even after preparation of the container launch context has failed, the AM keeps 
> trying to monitor the container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8569) Create an interface to provide cluster information to application

2018-07-23 Thread Eric Yang (JIRA)
Eric Yang created YARN-8569:
---

 Summary: Create an interface to provide cluster information to 
application
 Key: YARN-8569
 URL: https://issues.apache.org/jira/browse/YARN-8569
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Eric Yang


Some programs require container hostnames to be known for the application to run.  
For example, distributed TensorFlow requires a launch_command that looks like:

{code}
# On ps0.example.com:
$ python trainer.py \
 --ps_hosts=ps0.example.com:,ps1.example.com: \
 --worker_hosts=worker0.example.com:,worker1.example.com: \
 --job_name=ps --task_index=0
# On ps1.example.com:
$ python trainer.py \
 --ps_hosts=ps0.example.com:,ps1.example.com: \
 --worker_hosts=worker0.example.com:,worker1.example.com: \
 --job_name=ps --task_index=1
# On worker0.example.com:
$ python trainer.py \
 --ps_hosts=ps0.example.com:,ps1.example.com: \
 --worker_hosts=worker0.example.com:,worker1.example.com: \
 --job_name=worker --task_index=0
# On worker1.example.com:
$ python trainer.py \
 --ps_hosts=ps0.example.com:,ps1.example.com: \
 --worker_hosts=worker0.example.com:,worker1.example.com: \
 --job_name=worker --task_index=1
{code}

This is a bit cumbersome to orchestrate via Distributed Shell or the YARN services 
launch_command.  In addition, the dynamic parameters do not work with the YARN flex 
command.  This is the classic pain point for application developers attempting to 
automate system environment settings as parameters to the end-user application.

It would be great if the YARN Docker integration could provide a simple option to 
expose the hostnames of the YARN service via a mounted file.  The file content would 
be updated when a flex command is performed.  This would allow application developers 
to consume system environment settings via a standard interface.  It is like 
/proc/devices on Linux, but for Hadoop.  This may involve updating a file in the 
distributed cache and allowing the file to be mounted via container-executor.
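
As a rough sketch of the consumer side, assuming a hypothetical mount path and a 
one-hostname-per-line format (neither is a defined interface today), an 
application could re-read the mounted file whenever it needs the current 
membership:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ClusterHostsReader {
  // Hypothetical mount point; the actual path and format would be defined by
  // the YARN Docker integration if this feature is implemented.
  private static final Path HOSTS_FILE = Paths.get("/hadoop/yarn/cluster-hosts");

  /** Read one hostname per line, re-reading on every call so flex updates are seen. */
  public static List<String> currentHosts() throws IOException {
    return Files.readAllLines(HOSTS_FILE);
  }
}
{code}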



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553345#comment-16553345
 ] 

genericqa commented on YARN-8330:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m  
8s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8330 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932745/YARN-8330.3.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 218bb78e037a 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3a9e25e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21346/testReport/ |
| Max. process+thread count | 902 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21346/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> An extra container got launched by RM for 

[jira] [Comment Edited] (YARN-8380) Support bind propagation options for mounts in docker runtime

2018-07-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553321#comment-16553321
 ] 

Eric Yang edited comment on YARN-8380 at 7/23/18 7:47 PM:
--

+1 for patch 003.


was (Author: eyang):
+1

> Support bind propagation options for mounts in docker runtime
> -
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch
>
>
> The docker run command supports bind propagation options such as shared, but 
> currently we are only supporting ro and rw mount modes in the docker runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8380) Support bind propagation options for mounts in docker runtime

2018-07-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553321#comment-16553321
 ] 

Eric Yang commented on YARN-8380:
-

+1

> Support bind propagation options for mounts in docker runtime
> -
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch
>
>
> The docker run command supports bind propagation options such as shared, but 
> currently we are only supporting ro and rw mount modes in the docker runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8380) Support bind propagation options for mounts in docker runtime

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553316#comment-16553316
 ] 

genericqa commented on YARN-8380:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
0s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  8m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
46s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
15s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 92m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8380 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932751/YARN-8380.3.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  

[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553259#comment-16553259
 ] 

genericqa commented on YARN-8566:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
21s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
44s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 34s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Value of errorMessage from previous case is overwritten here due to 
switch statement fall through  At DefaultAMSProcessor.java:case is overwritten 
here due to switch statement fall through  At DefaultAMSProcessor.java:[line 
354] |
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8566 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-8418) App local logs could leaked if log aggregation fails to initialize for the app

2018-07-23 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553244#comment-16553244
 ] 

Suma Shivaprasad commented on YARN-8418:


[~bibinchundatt] Thanks for the patch. I have a couple of questions on the patch.

The following lines have been deleted in the new patch. Wouldn't this cause 
leaks for other exception cases (not just invalid-token cases)?

253   appLogAggregators.remove(appId);
254   closeFileSystems(userUgi);


And I see that the exception is being thrown later, i.e. after the log 
aggregation wrapper has been scheduled?

285 if (appDirException != null) {
286   throw appDirException;
287 }

Wouldn't this also cause issues in exception cases, when the scheduled aggregator 
runs?


> App local logs could leaked if log aggregation fails to initialize for the app
> --
>
> Key: YARN-8418
> URL: https://issues.apache.org/jira/browse/YARN-8418
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8418.001.patch, YARN-8418.002.patch, 
> YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch
>
>
> If log aggregation fails to init the createApp directory, container logs could get 
> leaked in the NM directory.
> For a long-running application, a restart of the NM after token renewal makes this 
> case possible, as does application submission with an invalid delegation token.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8418) App local logs could leaked if log aggregation fails to initialize for the app

2018-07-23 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553244#comment-16553244
 ] 

Suma Shivaprasad edited comment on YARN-8418 at 7/23/18 6:36 PM:
-

[~bibinchundatt] Thanks for the patch. I have a couple of questions on the patch.

The following lines have been deleted in the new patch. Wouldn't this cause 
leaks for other exception cases (not just invalid-token cases)?

{noformat}

253   appLogAggregators.remove(appId);
254   closeFileSystems(userUgi);
{noformat}

And I see that the exception is being thrown later, i.e. after the log 
aggregation wrapper has been scheduled?

{noformat}
285 if (appDirException != null) {
286   throw appDirException;
287 }
{noformat}
Wouldn't this also cause issues in exception cases, when the scheduled aggregator 
runs?




was (Author: suma.shivaprasad):
[~bibinchundatt] Thanks for the patch. I have a couple of questions on the patch.

The following lines have been deleted in the new patch. Wouldn't this cause 
leaks for other exception cases (not just invalid-token cases)?

{noformat}

253   appLogAggregators.remove(appId);
254   closeFileSystems(userUgi);
{noformat}

And I see that the exception is being thrown later, i.e. after the log 
aggregation wrapper has been scheduled?

{noformat}
285 if (appDirException != null) {
286   throw appDirException;
287 }

Wouldn't this also cause issues in exception cases, when the scheduled aggregator 
runs?
{noformat}


> App local logs could leaked if log aggregation fails to initialize for the app
> --
>
> Key: YARN-8418
> URL: https://issues.apache.org/jira/browse/YARN-8418
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8418.001.patch, YARN-8418.002.patch, 
> YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch
>
>
> If log aggregation fails to init the createApp directory, container logs could get 
> leaked in the NM directory.
> For a long-running application, a restart of the NM after token renewal makes this 
> case possible, as does application submission with an invalid delegation token.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8418) App local logs could leaked if log aggregation fails to initialize for the app

2018-07-23 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553244#comment-16553244
 ] 

Suma Shivaprasad edited comment on YARN-8418 at 7/23/18 6:36 PM:
-

[~bibinchundatt] Thanks for the patch. I have a couple of questions on the patch.

The following lines have been deleted in the new patch. Wouldn't this cause 
leaks for other exception cases (not just invalid-token cases)?

{noformat}

253   appLogAggregators.remove(appId);
254   closeFileSystems(userUgi);
{noformat}

And I see that the exception is being thrown later, i.e. after the log 
aggregation wrapper has been scheduled?

{noformat}
285 if (appDirException != null) {
286   throw appDirException;
287 }

Wouldn't this also cause issues in exception cases, when the scheduled aggregator 
runs?
{noformat}



was (Author: suma.shivaprasad):
[~bibinchundatt] Thanks for the patch. I have a couple of questions on the patch.

The following lines have been deleted in the new patch. Wouldn't this cause 
leaks for other exception cases (not just invalid-token cases)?

253   appLogAggregators.remove(appId);
254   closeFileSystems(userUgi);


And I see that the exception is being thrown later, i.e. after the log 
aggregation wrapper has been scheduled?

285 if (appDirException != null) {
286   throw appDirException;
287 }

Wouldn't this also cause issues in exception cases, when the scheduled aggregator 
runs?


> App local logs could leaked if log aggregation fails to initialize for the app
> --
>
> Key: YARN-8418
> URL: https://issues.apache.org/jira/browse/YARN-8418
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8418.001.patch, YARN-8418.002.patch, 
> YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch
>
>
> If log aggregation fails to init the createApp directory, container logs could get 
> leaked in the NM directory.
> For a long-running application, a restart of the NM after token renewal makes this 
> case possible, as does application submission with an invalid delegation token.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553227#comment-16553227
 ] 

Hudson commented on YARN-6966:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14617 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14617/])
YARN-6966. NodeManager metrics may return wrong negative values when NM 
(haibochen: rev 9d3c39e9dd88b8f32223c01328581bb68507d415)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/TestContainerSchedulerRecovery.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/ContainerScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java


> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that metrics do not recover properly 
> when the NM restarts.
> AllocatedContainers,ContainersLaunched,AllocatedGB,AvailableGB,AllocatedVCores,AvailableVCores
>  in metrics also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced by the following steps:
> # Make sure 
> YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true
>  in NM
> # Submit an application and keep running
> # Restart NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7133) Clean up lock-try order in fair scheduler

2018-07-23 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553202#comment-16553202
 ] 

Daniel Templeton commented on YARN-7133:


For one, it's the format set as the standard by Sun.  See 
https://docs.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html
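
One commonly cited reason for that ordering, illustrated with a made-up example 
rather than code from the patch: if lock() itself fails, the finally block of the 
try-first pattern calls unlock() on a lock that was never acquired, which throws 
IllegalMonitorStateException and masks the original error.

{code:java}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderExample {
  private final Lock lock = new ReentrantLock();

  public void problematic() {
    try {
      lock.lock();        // if this throws, the lock is never held...
      // ... critical section ...
    } finally {
      lock.unlock();      // ...yet this still runs and throws
                          // IllegalMonitorStateException, hiding the real error
    }
  }

  public void recommended() {
    lock.lock();          // acquire outside the try
    try {
      // ... critical section ...
    } finally {
      lock.unlock();      // only runs once the lock is definitely held
    }
  }
}
{code}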

> Clean up lock-try order in fair scheduler
> -
>
> Key: YARN-7133
> URL: https://issues.apache.org/jira/browse/YARN-7133
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: newbie
> Attachments: YARN-7133.001.patch
>
>
> There are many places that follow the pattern:{code}try {
>   lock.lock();
>   ...
> } finally {
>   lock.unlock();
> }{code}
> There are a couple of reasons that's a bad idea.  The correct pattern 
> is:{code}lock.lock();
> try {
>   ...
> } finally {
>   lock.unlock();
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8525) RegistryDNS tcp channel stops working on interrupts

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553201#comment-16553201
 ] 

genericqa commented on YARN-8525:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
55s{color} | {color:green} hadoop-yarn-registry in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8525 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932739/YARN-8525.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 854ade677451 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 84d7bf1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21345/testReport/ |
| Max. process+thread count | 304 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21345/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> 

[jira] [Commented] (YARN-7133) Clean up lock-try order in fair scheduler

2018-07-23 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553197#comment-16553197
 ] 

Haibo Chen commented on YARN-7133:
--

[~snemeth] I am not quite familiar with the reasons why putting lock() inside 
try-catch block is a bad idea. Do you mind sharing some insights in case folks 
like me are wondering?

> Clean up lock-try order in fair scheduler
> -
>
> Key: YARN-7133
> URL: https://issues.apache.org/jira/browse/YARN-7133
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: newbie
> Attachments: YARN-7133.001.patch
>
>
> There are many places that follow the pattern:{code}try {
>   lock.lock();
>   ...
> } finally {
>   lock.unlock();
> }{code}
> There are a couple of reasons that's a bad idea.  The correct pattern 
> is:{code}lock.lock();
> try {
>   ...
> } finally {
>   lock.unlock();
> }{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8380) Support bind propagation options for mounts in docker runtime

2018-07-23 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8380:
-
Description: The docker run command supports bind propagation options such 
as shared, but currently we are only supporting ro and rw mount modes in the 
docker runtime.  (was: The docker run command supports the mount type shared, 
but currently we are only supporting ro and rw mount types in the docker 
runtime.)

> Support bind propagation options for mounts in docker runtime
> -
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch
>
>
> The docker run command supports bind propagation options such as shared, but 
> currently we are only supporting ro and rw mount modes in the docker runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8380) Support bind propagation options for mounts in docker runtime

2018-07-23 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8380:
-
Summary: Support bind propagation options for mounts in docker runtime  
(was: Support shared mounts in docker runtime)

> Support bind propagation options for mounts in docker runtime
> -
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch
>
>
> The docker run command supports the mount type shared, but currently we are 
> only supporting ro and rw mount types in the docker runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8380) Support shared mounts in docker runtime

2018-07-23 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553193#comment-16553193
 ] 

Billie Rinaldi commented on YARN-8380:
--

Patch 3 supports formats 1, 2, 3, and 4 with a + sign separating the 
propagation option in format 2. It keeps the original configs 
docker.allowed.rw-mounts and docker.allowed.ro-mounts.

> Support shared mounts in docker runtime
> ---
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch
>
>
> The docker run command supports the mount type shared, but currently we are 
> only supporting ro and rw mount types in the docker runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8380) Support shared mounts in docker runtime

2018-07-23 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8380:
-
Attachment: YARN-8380.3.patch

> Support shared mounts in docker runtime
> ---
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8380.1.patch, YARN-8380.2.patch, YARN-8380.3.patch
>
>
> The docker run command supports the mount type shared, but currently we are 
> only supporting ro and rw mount types in the docker runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-23 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553183#comment-16553183
 ] 

Haibo Chen commented on YARN-6966:
--

TestContainerSchedulerRecovery.createRecoveredContainerState() still has a line longer 
than 80 characters; I'll fix it while committing the patch.  +1.

I'll create another Jira for the TODO comment I came across.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as in YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that metrics do not recover properly 
> when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores, and AvailableVCores
> in metrics also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced by the following steps:
> # Make sure 
> YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true
>  in NM
> # Submit an application and keep running
> # Restart NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
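As an aside on the fix described in the quoted report, a purely hypothetical sketch of replaying recovered containers into the metrics could look like the following; the class and method names are invented and are not the real NodeManagerMetrics/ContainerManagerImpl API:

{code:java}
// Hypothetical sketch only: replay each recovered running container into the metrics
// on NM restart so that later releases do not drive the counters negative.
public class RecoveredMetricsSketch {
  static class NodeMetrics {
    int allocatedContainers;
    long allocatedMB;
    int allocatedVCores;

    void allocate(long mb, int vcores) {
      allocatedContainers++;
      allocatedMB += mb;
      allocatedVCores += vcores;
    }

    void release(long mb, int vcores) {
      allocatedContainers--;
      allocatedMB -= mb;
      allocatedVCores -= vcores;
    }
  }

  // Called once per recovered container during recovery (analogous to doing this
  // in ContainerManagerImpl#recoverContainer, as the report suggests).
  static void recoverContainer(NodeMetrics metrics, long containerMB, int containerVCores) {
    metrics.allocate(containerMB, containerVCores);
  }

  public static void main(String[] args) {
    NodeMetrics metrics = new NodeMetrics();
    recoverContainer(metrics, 1024, 1);               // container that survived the restart
    metrics.release(1024, 1);                         // it finishes after the restart
    System.out.println(metrics.allocatedContainers);  // 0 instead of -1
  }
}
{code}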



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553180#comment-16553180
 ] 

Eric Yang commented on YARN-8330:
-

If the information is collected for end users to understand their 
application usage, then collecting only the RUNNING state might be sufficient.  End 
users should not be penalized for a YARN framework deficiency.  If the information is 
collected for system administrators to understand cluster health and isolate 
which node is potentially causing containers to fail, then reporting 
ALLOCATED/ACQUIRED is preferable.  The Timeline server is optimized for end-user 
application reporting, so the extra data seems unnecessary at this time.  However, 
the more information is collected, the easier it is to avoid writing similar code 
twice.  The report filtering can be done at the Timeline server to fulfill both use 
cases.  It could be a problem to handicap the data collection toward one use 
case only.

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized containers 02, 03, and 05 for components. There is no log 
> available in the NM & AM related to container 04. Only one line is printed in the 
> RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional 

[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2018-07-23 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553178#comment-16553178
 ] 

Giovanni Matteo Fumarola commented on YARN-6539:


Hi [~Dillon.]. I do not have any specific knowledge about what should be done in 
"secureLogin".
The ResourceManager and NodeManager have the same function; 
I added it to keep the Router consistent with the RM and NM. 

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-23 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553169#comment-16553169
 ] 

Suma Shivaprasad commented on YARN-8330:


Thanks [~jlowe] [~rohithsharma] [~leftnoteasy] [~eyang] for reviewing the patch 
and for the feedback. I have updated the patch to publish containerCreated events only 
when the container reaches the RUNNING state.
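Schematically, the gating looks like the sketch below; this is not the actual patch code, and the names are invented for illustration:

{code:java}
// Illustrative gate: only emit a "container created" event once the container has
// actually reached RUNNING, so containers that never start do not show up as extras.
public class CreatedEventGate {
  enum ContainerState { NEW, RESERVED, ALLOCATED, ACQUIRED, RUNNING }

  static void maybePublishCreated(ContainerState state, Runnable publish) {
    if (state == ContainerState.RUNNING) {
      publish.run();
    }
  }

  public static void main(String[] args) {
    maybePublishCreated(ContainerState.RESERVED,
        () -> System.out.println("created event"));   // nothing printed
    maybePublishCreated(ContainerState.RUNNING,
        () -> System.out.println("created event"));   // printed once
  }
}
{code}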

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized containers 02, 03, and 05 for components. There is no log 
> available in the NM & AM related to container 04. Only one line is printed in the 
> RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-23 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: YARN-8330.3.patch

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized containers 02, 03, and 05 for components. There is no log 
> available in the NM & AM related to container 04. Only one line is printed in the 
> RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-23 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: YARN-8360.2.patch

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized containers 02, 03, and 05 for components. There is no log 
> available in the NM & AM related to container 04. Only one line is printed in the 
> RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-23 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: (was: YARN-8360.2.patch)

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized containers 02, 03, and 05 for components. There is no log 
> available in the NM & AM related to container 04. Only one line is printed in the 
> RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8360) Yarn service conflict between restart policy and NM configuration

2018-07-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553142#comment-16553142
 ] 

Hudson commented on YARN-8360:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14615 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14615/])
YARN-8360. Improve YARN service restart policy and node manager auto (eyang: 
rev 84d7bf1eeff6b9418361afa4aa713e5e6f771365)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/ComponentRestartPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/AlwaysRestartPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/containerlaunch/TestAbstractLauncher.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/provider/AbstractProviderService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/OnFailureRestartPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/NeverRestartPolicy.java


> Yarn service conflict between restart policy and NM configuration 
> --
>
> Key: YARN-8360
> URL: https://issues.apache.org/jira/browse/YARN-8360
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Chandni Singh
>Assignee: Suma Shivaprasad
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8360.1.patch
>
>
> For the below spec, the service will not stop even after container failures 
> because of the NM auto retry properties :
>  * "yarn.service.container-failure.retry.max": 1,
>  * "yarn.service.container-failure.validity-interval-ms": 5000
>  The NM will continue auto-restarting containers.
>  {{fail_after 20}} fails after 20 seconds. Since the validity failure 
> interval is 5 seconds, NM will auto restart the container.
> {code:java}
> {
>   "name": "fail-demo2",
>   "version": "1.0.0",
>   "components" :
>   [
> {
>   "name": "comp1",
>   "number_of_containers": 1,
>   "launch_command": "fail_after 20",
>   "restart_policy": "NEVER",
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "configuration": {
> "properties": {
>   "yarn.service.container-failure.retry.max": 1,
>   "yarn.service.container-failure.validity-interval-ms": 5000
> }
>   }
> }
>   ]
> }
> {code}
> If {{restart_policy}} is NEVER, then the service should stop after the 
> container fails.
> Since we have introduced the service-level Restart Policies, I think we 
> should make the NM auto-retry configurations part of the {{RetryPolicy}} and 
> get rid of all {{yarn.service.container-failure.**}} properties. Otherwise it 
> gets confusing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8360) Yarn service conflict between restart policy and NM configuration

2018-07-23 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8360:

Fix Version/s: 3.1.2
   3.2.0

> Yarn service conflict between restart policy and NM configuration 
> --
>
> Key: YARN-8360
> URL: https://issues.apache.org/jira/browse/YARN-8360
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Chandni Singh
>Assignee: Suma Shivaprasad
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8360.1.patch
>
>
> For the below spec, the service will not stop even after container failures 
> because of the NM auto retry properties :
>  * "yarn.service.container-failure.retry.max": 1,
>  * "yarn.service.container-failure.validity-interval-ms": 5000
>  The NM will continue auto-restarting containers.
>  {{fail_after 20}} fails after 20 seconds. Since the validity failure 
> interval is 5 seconds, NM will auto restart the container.
> {code:java}
> {
>   "name": "fail-demo2",
>   "version": "1.0.0",
>   "components" :
>   [
> {
>   "name": "comp1",
>   "number_of_containers": 1,
>   "launch_command": "fail_after 20",
>   "restart_policy": "NEVER",
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "configuration": {
> "properties": {
>   "yarn.service.container-failure.retry.max": 1,
>   "yarn.service.container-failure.validity-interval-ms": 5000
> }
>   }
> }
>   ]
> }
> {code}
> If {{restart_policy}} is NEVER, then the service should stop after the 
> container fails.
> Since we have introduced the service-level Restart Policies, I think we 
> should make the NM auto-retry configurations part of the {{RetryPolicy}} and 
> get rid of all {{yarn.service.container-failure.**}} properties. Otherwise it 
> gets confusing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553130#comment-16553130
 ] 

genericqa commented on YARN-8566:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 
19s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}153m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8566 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932719/YARN-8566.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 32e9cf8cba11 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbe2f62 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace | 

[jira] [Commented] (YARN-8525) RegistryDNS tcp channel stops working on interrupts

2018-07-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553126#comment-16553126
 ] 

Eric Yang commented on YARN-8525:
-

- Patch 003 prevents the thread from exiting on InterruptedException or 
ClosedByInterruptException.
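For illustration only (this is not the RegistryDNS code, and the names below are invented), the general shape of a serving loop that survives such interrupts is:

{code:java}
import java.nio.channels.ClosedByInterruptException;

// Illustrative sketch: keep the serving thread alive when an interrupt arrives,
// instead of letting InterruptedException/ClosedByInterruptException end the loop.
public class KeepServingOnInterrupt {
  public static void main(String[] args) throws Exception {
    Thread server = new Thread(() -> {
      for (int i = 0; i < 5; i++) {             // bounded so the demo terminates
        try {
          waitForRequest();
        } catch (InterruptedException | ClosedByInterruptException e) {
          Thread.interrupted();                 // clear the flag and keep looping
          System.out.println("interrupted, continuing: " + e);
        }
      }
    });
    server.start();
    server.interrupt();                         // simulate a stray interrupt
    server.join();
  }

  private static void waitForRequest()
      throws InterruptedException, ClosedByInterruptException {
    Thread.sleep(50);                           // stand-in for blocking on a TCP channel
  }
}
{code}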

> RegistryDNS tcp channel stops working on interrupts
> ---
>
> Key: YARN-8525
> URL: https://issues.apache.org/jira/browse/YARN-8525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8525.001.patch, YARN-8525.002.patch, 
> YARN-8525.003.patch
>
>
> While waiting for requests, RegistryDNS's Thread.sleep might throw an interrupt 
> exception.  This is currently not properly handled in RegistryDNS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8525) RegistryDNS tcp channel stops working on interrupts

2018-07-23 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8525:

Attachment: YARN-8525.003.patch

> RegistryDNS tcp channel stops working on interrupts
> ---
>
> Key: YARN-8525
> URL: https://issues.apache.org/jira/browse/YARN-8525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8525.001.patch, YARN-8525.002.patch, 
> YARN-8525.003.patch
>
>
> While waiting for requests, RegistryDNS's Thread.sleep might throw an interrupt 
> exception.  This is currently not properly handled in RegistryDNS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8567) Fetching yarn logs fails for long running application if it is not present in timeline store

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553114#comment-16553114
 ] 

genericqa commented on YARN-8567:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m  
1s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8567 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932723/YARN-8567.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 60c67b3d24de 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbe2f62 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21343/testReport/ |
| Max. process+thread count | 706 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21343/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Fetching 

[jira] [Commented] (YARN-8360) Yarn service conflict between restart policy and NM configuration

2018-07-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553099#comment-16553099
 ] 

Eric Yang commented on YARN-8360:
-

[~suma.shivaprasad] Thanks for the information.  I had an environment in which one 
of the node managers was failing, and this skewed the container restart counts 
tracked by the NM and AM.  The patch looks correct.

> Yarn service conflict between restart policy and NM configuration 
> --
>
> Key: YARN-8360
> URL: https://issues.apache.org/jira/browse/YARN-8360
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Chandni Singh
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8360.1.patch
>
>
> For the below spec, the service will not stop even after container failures 
> because of the NM auto retry properties :
>  * "yarn.service.container-failure.retry.max": 1,
>  * "yarn.service.container-failure.validity-interval-ms": 5000
>  The NM will continue auto-restarting containers.
>  {{fail_after 20}} fails after 20 seconds. Since the validity failure 
> interval is 5 seconds, NM will auto restart the container.
> {code:java}
> {
>   "name": "fail-demo2",
>   "version": "1.0.0",
>   "components" :
>   [
> {
>   "name": "comp1",
>   "number_of_containers": 1,
>   "launch_command": "fail_after 20",
>   "restart_policy": "NEVER",
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "configuration": {
> "properties": {
>   "yarn.service.container-failure.retry.max": 1,
>   "yarn.service.container-failure.validity-interval-ms": 5000
> }
>   }
> }
>   ]
> }
> {code}
> If {{restart_policy}} is NEVER, then the service should stop after the 
> container fails.
> Since we have introduced the service-level Restart Policies, I think we 
> should make the NM auto-retry configurations part of the {{RetryPolicy}} and 
> get rid of all {{yarn.service.container-failure.**}} properties. Otherwise it 
> gets confusing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8566:
-
Attachment: YARN-8566.003.patch

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553063#comment-16553063
 ] 

Szilard Nemeth commented on YARN-8566:
--

Hi [~bsteinbach]!
Thanks for your comments.
The code is indeed more readable with your suggestions.
Please see my updated patch.

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-23 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553038#comment-16553038
 ] 

Eric Payne commented on YARN-4606:
--

+1
Thanks for all of your work on this JIRA, [~maniraj...@gmail.com]. I will be 
committing this later today or tomorrow.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.005.patch, 
> YARN-4606.006.patch, YARN-4606.007.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> an active user. This could lead to starvation of active applications, for 
> example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs to 
> user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.
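As a tiny worked illustration of that last point (the numbers are made up, and this is not the CapacityScheduler's actual user-limit code):

{code:java}
public class UserLimitExample {
  public static void main(String[] args) {
    int queueResource = 100;  // pretend the queue has 100 units of a resource
    int countedUsers = 4;     // ActiveUsersManager also counts users with only pending apps
    int usableUsers = 2;      // users that can actually allocate right now
    System.out.println("limit as computed:            " + queueResource / countedUsers);  // 25
    System.out.println("limit with only real actives: " + queueResource / usableUsers);   // 50
  }
}
{code}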



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553032#comment-16553032
 ] 

Antal Bálint Steinbach commented on YARN-8566:
--

Hi [~snemeth]!

Thanks for the patch. I only have some minor comments:
 * Maybe it would be good to add diagnostic text for the 3rd case (UNKNOWN)
 * Using a switch for enums can be less verbose
 * You can extract 
app.getRMAppAttempt(appAttemptId).updateAMLaunchDiagnostics(...

 
{code:java}
String errorMsg;
switch (e.getInvalidResourceType()) {
  case GREATER_THEN_MAX_ALLOCATION:
    errorMsg = "Cannot allocate containers as resource request is " +
        "greater than the maximum allowed allocation!";
    break;
  case LESS_THAN_ZERO:
    errorMsg = "Cannot allocate containers as resource request is " +
        "less than zero!";
    break;  // needed, otherwise this message is overwritten by the default case
  case UNKNOWN:
  default:
    errorMsg = "Cannot allocate containers for some unknown reasons!";
}
app.getRMAppAttempt(appAttemptId).updateAMLaunchDiagnostics(errorMsg);

{code}
 

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> 
>   10
> 
> 1 mb,2vcores
> 9 mb,4vcores, 0gpu
> 50
> -1.0f
> 2.0
> fair
>   
> 
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8567) Fetching yarn logs fails for long running application if it is not present in timeline store

2018-07-23 Thread Tarun Parimi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552958#comment-16552958
 ] 

Tarun Parimi commented on YARN-8567:


{{AHSClientImpl#getContainers}} failed because the application entity got 
deleted once it exceeded {{yarn.timeline-service.ttl-ms}}. 

I checked in the debug logs that ClientRMService#getContainers is successful, 
since the application is still running and is present in the ResourceManager. 

We seem to be catching only IOException here. Ideally we should also catch 
YarnException in this case so that the response from the RM is at least 
returned when the application is found in the RM. Attaching a patch for the same.
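As a rough sketch of the idea (this is not the actual LogsCLI/YarnClientImpl code; the interfaces and names below are invented for illustration):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: keep the RM's answer usable even when the timeline store no
// longer has the entity, by catching YarnException in addition to IOException.
public class ContainerListingSketch {
  static class YarnExceptionStub extends Exception {
    YarnExceptionStub(String msg) { super(msg); }
  }

  interface ContainerSource {
    List<String> getContainers(String appAttemptId) throws IOException, YarnExceptionStub;
  }

  private final ContainerSource rm;        // stand-in for the RM-backed client
  private final ContainerSource history;   // stand-in for the AHS/timeline-backed client

  ContainerListingSketch(ContainerSource rm, ContainerSource history) {
    this.rm = rm;
    this.history = history;
  }

  List<String> listContainers(String appAttemptId) {
    List<String> fromRm;
    try {
      fromRm = rm.getContainers(appAttemptId);   // succeeds while the app is still running
    } catch (IOException | YarnExceptionStub e) {
      fromRm = Collections.emptyList();
    }
    try {
      // Supplement with history; catching YarnException here (not just IOException)
      // keeps the RM result usable when the timeline entity has already expired.
      List<String> merged = new ArrayList<>(fromRm);
      merged.addAll(history.getContainers(appAttemptId));
      return merged;
    } catch (IOException | YarnExceptionStub e) {
      return fromRm;
    }
  }
}
{code}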

> Fetching yarn logs fails for long running application if it is not present in 
> timeline store
> 
>
> Key: YARN-8567
> URL: https://issues.apache.org/jira/browse/YARN-8567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8567.001.patch
>
>
> Using the yarn logs command for a long-running application that has been running 
> longer than the configured timeline service TTL 
> ({{yarn.timeline-service.ttl-ms}}) fails with the following exception.
> {code:java}
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity 
> for application application_152347939332_1 doesn't exist in the timeline 
> store
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:670)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:219)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:211)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172)
> at 
> org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getContainers(ApplicationHistoryProtocolPBClientImpl.java:183)
> at 
> org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getContainers(AHSClientImpl.java:151)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:720)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.getContainerReportsFromRunningApplication(LogsCLI.java:1089)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.getContainersLogRequestForRunningApplication(LogsCLI.java:1064)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:976)
> at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:300)
> at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:107)
> at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:327)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8567) Fetching yarn logs fails for long running application if it is not present in timeline store

2018-07-23 Thread Tarun Parimi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi updated YARN-8567:
---
Attachment: YARN-8567.001.patch

> Fetching yarn logs fails for long running application if it is not present in 
> timeline store
> 
>
> Key: YARN-8567
> URL: https://issues.apache.org/jira/browse/YARN-8567
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8567.001.patch
>
>
> Using the yarn logs command for a long running application which has been 
> running longer than the configured timeline service ttl 
> {{yarn.timeline-service.ttl-ms}} fails with the following exception.
> {code:java}
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity 
> for application application_152347939332_1 doesn't exist in the timeline 
> store
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:670)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:219)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:211)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172)
> at 
> org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getContainers(ApplicationHistoryProtocolPBClientImpl.java:183)
> at 
> org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getContainers(AHSClientImpl.java:151)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:720)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.getContainerReportsFromRunningApplication(LogsCLI.java:1089)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.getContainersLogRequestForRunningApplication(LogsCLI.java:1064)
> at 
> org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:976)
> at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:300)
> at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:107)
> at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:327)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8568) Replace the deprecated zk-address property in the HA config example in ResourceManagerHA.md

2018-07-23 Thread JIRA
Antal Bálint Steinbach created YARN-8568:


 Summary: Replace the deprecated zk-address property in the HA 
config example in ResourceManagerHA.md
 Key: YARN-8568
 URL: https://issues.apache.org/jira/browse/YARN-8568
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.0.x
Reporter: Antal Bálint Steinbach
Assignee: Antal Bálint Steinbach


yarn.resourcemanager.zk-address is deprecated; hadoop.zk.address should be used instead.

In the HA configuration example, "yarn.resourcemanager.zk-address" is used, which is 
deprecated, while the surrounding description already uses the correct property name, 
"hadoop.zk.address". The example should be updated to match.
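
For reference, a corrected snippet of the kind the example could use (the host names 
are illustrative):

{code:xml}
<!-- Use hadoop.zk.address instead of the deprecated
     yarn.resourcemanager.zk-address. -->
<property>
  <name>hadoop.zk.address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
{code}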



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8566:
-
Attachment: YARN-8566.002.patch

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch, YARN-8566.002.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> <allocations>
>   <queueMaxAppsDefault>10</queueMaxAppsDefault>
>   <queue name="sample_queue">
>     <minResources>1 mb,2vcores</minResources>
>     <maxResources>9 mb,4vcores, 0gpu</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <maxAMShare>-1.0f</maxAMShare>
>     <weight>2.0</weight>
>     <schedulingPolicy>fair</schedulingPolicy>
>   </queue>
> </allocations>
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8566:
-
Attachment: YARN-8566.001.patch

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8566.001.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> <allocations>
>   <queueMaxAppsDefault>10</queueMaxAppsDefault>
>   <queue name="sample_queue">
>     <minResources>1 mb,2vcores</minResources>
>     <maxResources>9 mb,4vcores, 0gpu</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <maxAMShare>-1.0f</maxAMShare>
>     <weight>2.0</weight>
>     <schedulingPolicy>fair</schedulingPolicy>
>   </queue>
> </allocations>
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552928#comment-16552928
 ] 

Szilard Nemeth commented on YARN-8566:
--

The first patch adds the following to the application attempt's diagnostic info: 
"Cannot allocate containers as resource request is greater than the maximum 
allowed allocation!"
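
For illustration, a minimal sketch of where such a message could come from; this is 
not the actual patch, and the check is simplified to a plain Resources.fitsIn 
comparison between the request and the maximum allowed allocation:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Sketch only: produce the diagnostic text when a request can never fit
// within the maximum allowed allocation.
public class UnschedulableDiagnosticsSketch {
  static String diagnose(Resource requested, Resource maxAllowed) {
    if (!Resources.fitsIn(requested, maxAllowed)) {
      return "Cannot allocate containers as resource request is greater than "
          + "the maximum allowed allocation!";
    }
    return "";
  }

  public static void main(String[] args) {
    Resource max = Resource.newInstance(8192, 4);     // maximum allowed allocation
    Resource asked = Resource.newInstance(16384, 2);  // requested container size
    System.out.println(diagnose(asked, max));         // prints the diagnostic text
  }
}
{code}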

> Add diagnostic message for unschedulable containers
> ---
>
> Key: YARN-8566
> URL: https://issues.apache.org/jira/browse/YARN-8566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be realized without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml) : 
> {code:java}
> <allocations>
>   <queueMaxAppsDefault>10</queueMaxAppsDefault>
>   <queue name="sample_queue">
>     <minResources>1 mb,2vcores</minResources>
>     <maxResources>9 mb,4vcores, 0gpu</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <maxAMShare>-1.0f</maxAMShare>
>     <weight>2.0</weight>
>     <schedulingPolicy>fair</schedulingPolicy>
>   </queue>
> </allocations>
> {code}
> Command: 
> {code:java}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-23 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: branch-2.7.2.gpu-port-20180723.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> branch-2.7.2.gpu-port-20180723.patch, hadoop-2.7.2.gpu-port-20180711.patch, 
> hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPU as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for better 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than one 
> running on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 
> 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node). A '1' in a bit 
> position means the corresponding GPU is available, and '0' otherwise.
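
For illustration only, a minimal sketch of how such a 64-bit availability bitmap 
could be consulted for locality; the 8-GPUs-per-switch grouping and all names below 
are assumptions, not the layout used by the attached patch:

{code:java}
// Sketch only: bit i == 1 means GPU i is free on the node.
public class GpuBitmapSketch {
  static final int GPUS_PER_SWITCH = 8; // assumed PCI-E switch grouping

  // Try to find 'count' free GPUs that share one PCI-E switch.
  static long allocateWithLocality(long bitmap, int count) {
    for (int group = 0; group < 64 / GPUS_PER_SWITCH; group++) {
      long groupMask = ((1L << GPUS_PER_SWITCH) - 1) << (group * GPUS_PER_SWITCH);
      long free = bitmap & groupMask;
      if (Long.bitCount(free) >= count) {
        long picked = 0;
        for (int bit = 0; bit < 64 && Long.bitCount(picked) < count; bit++) {
          if ((free & (1L << bit)) != 0) {
            picked |= 1L << bit;
          }
        }
        return picked;  // bitmap of the GPUs to assign
      }
    }
    return 0L;          // no single switch has enough free GPUs
  }

  public static void main(String[] args) {
    long bitmap = 0b11L | (1L << 7);  // GPUs 0, 1 and 7 are free
    // Prefers {0,1} over {0,7} because they sit under the same switch.
    System.out.println(Long.toBinaryString(allocateWithLocality(bitmap, 2)));
  }
}
{code}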



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8567) Fetching yarn logs fails for long running application if it is not present in timeline store

2018-07-23 Thread Tarun Parimi (JIRA)
Tarun Parimi created YARN-8567:
--

 Summary: Fetching yarn logs fails for long running application if 
it is not present in timeline store
 Key: YARN-8567
 URL: https://issues.apache.org/jira/browse/YARN-8567
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.7.0
Reporter: Tarun Parimi
Assignee: Tarun Parimi


Using the yarn logs command for a long running application which has been running 
longer than the configured timeline service ttl {{yarn.timeline-service.ttl-ms}} 
fails with the following exception.
{code:java}
Exception in thread "main" 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: The entity for 
application application_152347939332_1 doesn't exist in the timeline store
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplication(ApplicationHistoryManagerOnTimelineStore.java:670)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:219)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:211)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172)
at 
org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getContainers(ApplicationHistoryProtocolPBClientImpl.java:183)
at 
org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getContainers(AHSClientImpl.java:151)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:720)
at 
org.apache.hadoop.yarn.client.cli.LogsCLI.getContainerReportsFromRunningApplication(LogsCLI.java:1089)
at 
org.apache.hadoop.yarn.client.cli.LogsCLI.getContainersLogRequestForRunningApplication(LogsCLI.java:1064)
at 
org.apache.hadoop.yarn.client.cli.LogsCLI.fetchApplicationLogs(LogsCLI.java:976)
at org.apache.hadoop.yarn.client.cli.LogsCLI.runCommand(LogsCLI.java:300)
at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:107)
at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:327)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-07-23 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552885#comment-16552885
 ] 

Antal Bálint Steinbach edited comment on YARN-8468 at 7/23/18 1:47 PM:
---

Hi [~haibochen] !

I only commented "Thanks for the feedback [~wilfreds]", but I also addressed his 
suggestions. I am sorry for that; please find my responses inline.
 - A {{FSLeafQueue}} and {{FSParentQueue}} always have a parent, so doing a null 
check on the parent is unneeded. The only queue that does not have a parent is 
the root queue, which is already special-cased. _(In some tests the sub-queue 
does not have a parent)_
 - {{getMaximumResourceCapability}} must support resource types and not just 
memory and vcores, same as YARN-7556 for this setting. (_It returns Resource, so 
I assume it is ok with resource types_)
 - {{getMaxAllowedAllocation}} from the NodeTracker must support more than just 
memory and vcores; this needs to flow through. (_It returns Resource, so I assume 
it is ok with resource types_)
 - {{FairScheduler}}: why change the static imports only for a part of the 
config values? Either change all or none (none is preferred). (_Fixed_)
 - {{FairSchedulerPage}}: missing toString on the ResourceInfo. (_Added, but I 
can't see why it is necessary_)
 - Testing must also use resource types, not only the old configuration style 
like "memory-mb=5120, test1=4, vcores=2". _(Test added)_
 - {{TestFairScheduler}}: testing must also include failure cases for sub-queues, 
not just the root queue: setting the value on the root queue should throw and 
should not be applied. (_Fixed_)
 - If TestQueueMaxContainerAllocationValidator is a new file, make sure that you 
add the license etc. (_License text added for the new files_)

Balint


was (Author: bsteinbach):
Hi [~haibochen] !

I only commented "Thanks for the feedback [~wilfreds]", but I also addressed his 
suggestions. I am sorry for that; please find my responses inline.
 - A {{FSLeafQueue}} and {{FSParentQueue}} always have a parent, so doing a null 
check on the parent is unneeded. The only queue that does not have a parent is 
the root queue, which is already special-cased. _(In some tests the sub-queue 
does not have a parent)_
 - {{getMaximumResourceCapability}} must support resource types and not just 
memory and vcores, same as YARN-7556 for this setting. (_It supports Resource, so 
I assume it is ok with resource types_)
 - {{getMaxAllowedAllocation}} from the NodeTracker must support more than just 
memory and vcores; this needs to flow through. (_It supports Resource, so I assume 
it is ok with resource types_)
 - {{FairScheduler}}: why change the static imports only for a part of the 
config values? Either change all or none (none is preferred). (_Fixed_)
 - {{FairSchedulerPage}}: missing toString on the ResourceInfo. (_Added, but I 
can't see why it is necessary_)
 - Testing must also use resource types, not only the old configuration style 
like "memory-mb=5120, test1=4, vcores=2". _(Test added)_
 - {{TestFairScheduler}}: testing must also include failure cases for sub-queues, 
not just the root queue: setting the value on the root queue should throw and 
should not be applied. (_Fixed_)
 - If TestQueueMaxContainerAllocationValidator is a new file, make sure that you 
add the license etc. (_License text added for the new files_)

Balint

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps. The user wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb would set the default maximum container 
> size for all queues, and the per-queue maximum would be set with the 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created 

[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-07-23 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552885#comment-16552885
 ] 

Antal Bálint Steinbach commented on YARN-8468:
--

Hi [~haibochen] !

I only commented "Thanks for the feedback [~wilfreds]", but I also addressed his 
suggestions. I am sorry for that; please find my responses inline.
 - A {{FSLeafQueue}} and {{FSParentQueue}} always have a parent, so doing a null 
check on the parent is unneeded. The only queue that does not have a parent is 
the root queue, which is already special-cased. _(In some tests the sub-queue 
does not have a parent)_
 - {{getMaximumResourceCapability}} must support resource types and not just 
memory and vcores, same as YARN-7556 for this setting. (_It supports Resource, so 
I assume it is ok with resource types_)
 - {{getMaxAllowedAllocation}} from the NodeTracker must support more than just 
memory and vcores; this needs to flow through. (_It supports Resource, so I assume 
it is ok with resource types_)
 - {{FairScheduler}}: why change the static imports only for a part of the 
config values? Either change all or none (none is preferred). (_Fixed_)
 - {{FairSchedulerPage}}: missing toString on the ResourceInfo. (_Added, but I 
can't see why it is necessary_)
 - Testing must also use resource types, not only the old configuration style 
like "memory-mb=5120, test1=4, vcores=2". _(Test added)_
 - {{TestFairScheduler}}: testing must also include failure cases for sub-queues, 
not just the root queue: setting the value on the root queue should throw and 
should not be applied. (_Fixed_)
 - If TestQueueMaxContainerAllocationValidator is a new file, make sure that you 
add the license etc. (_License text added for the new files_)

Balint

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps. The user wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb would set the default maximum container 
> size for all queues, and the per-queue maximum would be set with the 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.
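
For illustration, a sketch of what the proposed per-queue setting might look like in 
fair-scheduler.xml; the element name follows the description above, while the exact 
syntax and values are assumptions since the feature is still under review:

{code:xml}
<allocations>
  <queue name="adhoc">
    <!-- proposed per-queue cap on individual container size -->
    <maxContainerResources>4096 mb,2vcores</maxContainerResources>
  </queue>
  <queue name="enterprise">
    <!-- no per-queue cap: falls back to yarn.scheduler.maximum-allocation-mb -->
  </queue>
</allocations>
{code}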



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8566:


 Summary: Add diagnostic message for unschedulable containers
 Key: YARN-8566
 URL: https://issues.apache.org/jira/browse/YARN-8566
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


If a queue is configured with maxResources set to 0 for a resource, and an 
application is submitted to that queue that requests that resource, that 
application will remain pending until it is removed or moved to a different 
queue. This behavior can be realized without extended resources, but it’s 
unlikely a user will create a queue that allows 0 memory or CPU. As the number 
of resources in the system increases, this scenario will become more common, 
and it will become harder to recognize these cases. Therefore, the scheduler 
should indicate in the diagnostic string for an application if it was not 
scheduled because of a 0 maxResources setting.

Example configuration (fair-scheduler.xml) : 

{code:java}

<allocations>
  <queueMaxAppsDefault>10</queueMaxAppsDefault>
  <queue name="sample_queue">
    <minResources>1 mb,2vcores</minResources>
    <maxResources>9 mb,4vcores, 0gpu</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <maxAMShare>-1.0f</maxAMShare>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
{code}

Command: 

{code:java}
yarn jar 
"./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
-Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
{code}

The job hangs and the application diagnostic info is empty.
Given that an exception is thrown before any mapper/reducer container is 
created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552812#comment-16552812
 ] 

genericqa commented on YARN-8559:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 32s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8559 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932684/YARN-8559.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8fc74729784c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbe2f62 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21341/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21341/testReport/ |
| Max. process+thread count | 929 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-23 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552698#comment-16552698
 ] 

Manikandan R commented on YARN-4606:


Unit test failure is not related to this patch

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.005.patch, 
> YARN-4606.006.patch, YARN-4606.007.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.
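
For illustration, a toy calculation of the effect described above, assuming the 
simplified rule that user-limit-resource is roughly the queue resource divided by 
the number of active users (the numbers are made up):

{code:java}
// Sketch only: shows how counting users whose apps are all pending
// deflates the per-user limit for users who can actually run work.
public class ActiveUsersSketch {
  public static void main(String[] args) {
    int queueMemoryMb = 100_000;
    int usersIncludingPendingOnly = 4; // user1..user4, as counted today
    int usersWithRunnableApps = 2;     // only user1 and user2 can allocate

    System.out.println("limit counting pending-only users: "
        + queueMemoryMb / usersIncludingPendingOnly + " MB per user");
    System.out.println("limit counting truly active users: "
        + queueMemoryMb / usersWithRunnableApps + " MB per user");
  }
}
{code}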



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Anna Savarin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552697#comment-16552697
 ] 

Anna Savarin commented on YARN-8559:


Sweet!

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552646#comment-16552646
 ] 

Weiwei Yang commented on YARN-8559:
---

Sure [~banditka], I updated that in the v2 patch. It now does an extra check 
before and after making the change in one case; hope that helps.

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8559:
--
Attachment: YARN-8559.002.patch

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Anna Savarin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552616#comment-16552616
 ] 

Anna Savarin commented on YARN-8559:


Thank you for the patch, [~cheersyang]. Do you think it makes sense to first 
make a change to the config in your test and then make sure that the updated 
value is retrieved, in addition to the standard retrieval?

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-8559.001.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552584#comment-16552584
 ] 

genericqa commented on YARN-8559:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m  
5s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 10s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}125m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8559 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932663/YARN-8559.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f9cabc63c9d8 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbe2f62 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21338/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21338/testReport/ |
| Max. process+thread count | 941 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552575#comment-16552575
 ] 

genericqa commented on YARN-8548:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
40s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 77m 
35s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}176m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8548 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932657/YARN-8548-005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2edbc766dbe6 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbe2f62 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace | 

[jira] [Comment Edited] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-07-23 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552541#comment-16552541
 ] 

Szilard Nemeth edited comment on YARN-8553 at 7/23/18 9:52 AM:
---

Hi [~rohithsharma]!
First patch is ready for review!
I performed a refactor similar to the one in RMWebServices, i.e. building the 
GetAppRequest with the builder.
The loop iterating over appReports with plenty of continue statements still looks 
pretty bad, but I think that's not in the scope of this jira.
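
For illustration, a minimal sketch of the builder style mentioned here; this 
stand-in is not the actual ApplicationsRequestBuilder from YARN-8501, and the 
method names are illustrative:

{code:java}
import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest;

// Sketch only: collect the web-service query parameters with a small builder
// instead of threading a long parameter list through the service method.
class AppsRequestBuilderSketch {
  private final GetApplicationsRequest request = GetApplicationsRequest.newInstance();

  AppsRequestBuilderSketch withUser(String user) {
    if (user != null && !user.isEmpty()) {
      request.setUsers(Collections.singleton(user));
    }
    return this;
  }

  AppsRequestBuilderSketch withQueue(String queue) {
    if (queue != null && !queue.isEmpty()) {
      request.setQueues(Collections.singleton(queue));
    }
    return this;
  }

  AppsRequestBuilderSketch withLimit(long limit) {
    request.setLimit(limit);
    return this;
  }

  GetApplicationsRequest build() {
    return request;
  }
}
{code}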


was (Author: snemeth):
Hi [~rohithsharma]!
First patch is ready for review!
I performed a refactor similar to the one in RMWebServices.
The loop iterating over appReports with plenty of continue statements still looks 
pretty bad, but I think that's not in the scope of this jira.

> Reduce complexity of AHSWebService getApps method
> -
>
> Key: YARN-8553
> URL: https://issues.apache.org/jira/browse/YARN-8553
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8553.001.patch
>
>
> YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in 
> AHSWebservice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-07-23 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552541#comment-16552541
 ] 

Szilard Nemeth edited comment on YARN-8553 at 7/23/18 9:51 AM:
---

Hi [~rohithsharma]!
First patch is ready for review!
I performed a refactor similar to the one in RMWebServices.
The loop iterating over appReports with plenty of continue statements still looks 
pretty bad, but I think that's not in the scope of this jira.


was (Author: snemeth):
Hi [~rohithsharma]!
First patch is ready for review!

> Reduce complexity of AHSWebService getApps method
> -
>
> Key: YARN-8553
> URL: https://issues.apache.org/jira/browse/YARN-8553
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8553.001.patch
>
>
> YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in 
> AHSWebservice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2018-07-23 Thread Dillon Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552559#comment-16552559
 ] 

Dillon Zhang commented on YARN-6539:


Hi [~giovanni.fumarola], can you please describe what "secureLogin" means? If 
you aren't working on this issue, I'd like to take it. Thank you.

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-07-23 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552541#comment-16552541
 ] 

Szilard Nemeth commented on YARN-8553:
--

Hi [~rohithsharma]!
First patch is ready for review!

> Reduce complexity of AHSWebService getApps method
> -
>
> Key: YARN-8553
> URL: https://issues.apache.org/jira/browse/YARN-8553
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8553.001.patch
>
>
> YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in 
> AHSWebservice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-07-23 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8553:
-
Attachment: YARN-8553.001.patch

> Reduce complexity of AHSWebService getApps method
> -
>
> Key: YARN-8553
> URL: https://issues.apache.org/jira/browse/YARN-8553
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8553.001.patch
>
>
> YARN-8501 refactored the RMWebService#getApp. Similar refactoring is required 
> in AHSWebservice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-07-23 Thread niu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552536#comment-16552536
 ] 

niu edited comment on YARN-8513 at 7/23/18 9:21 AM:


We also hit this problem in 2.9.1. It is caused by a deadlock.


was (Author: hustnn):
We also met this problem in 2.9.1.

> CapacityScheduler infinite loop when queue is near fully utilized
> -
>
> Key: YARN-8513
> URL: https://issues.apache.org/jira/browse/YARN-8513
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.9.1
> Environment: Ubuntu 14.04.5
> YARN is configured with one label and 5 queues.
>Reporter: Chen Yufei
>Priority: Major
> Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, 
> jstack-5.log, top-during-lock.log, top-when-normal.log
>
>
> ResourceManager sometimes does not respond to any request when a queue is 
> nearly fully utilized. Sending SIGTERM won't stop the RM, only SIGKILL can. 
> After an RM restart, it can recover running jobs and start accepting new ones.
>  
> Seems like CapacityScheduler is in an infinite loop printing out the 
> following log messages (more than 25,000 lines in a second):
>  
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.99816763 
> absoluteUsedCapacity=0.99816763 used= 
> cluster=}}
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application attempt=appattempt_1530619767030_1652_01 
> container=null 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943
>  clusterResource= type=NODE_LOCAL 
> requestedPartition=}}
>  
> I have encountered this problem several times after upgrading to YARN 2.9.1, 
> while the same configuration works fine under version 2.7.3.
>  
> YARN-4477 is an infinite loop bug in FairScheduler, not sure if this is a 
> similar problem.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

2018-07-23 Thread niu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552536#comment-16552536
 ] 

niu commented on YARN-8513:
---

We also hit this problem in 2.9.1.

> CapacityScheduler infinite loop when queue is near fully utilized
> -
>
> Key: YARN-8513
> URL: https://issues.apache.org/jira/browse/YARN-8513
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.9.1
> Environment: Ubuntu 14.04.5
> YARN is configured with one label and 5 queues.
>Reporter: Chen Yufei
>Priority: Major
> Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, 
> jstack-5.log, top-during-lock.log, top-when-normal.log
>
>
> ResourceManager sometimes does not respond to any request when a queue is 
> nearly fully utilized. Sending SIGTERM won't stop the RM, only SIGKILL can. 
> After an RM restart, it can recover running jobs and start accepting new ones.
>  
> Seems like CapacityScheduler is in an infinite loop printing out the 
> following log messages (more than 25,000 lines in a second):
>  
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.99816763 
> absoluteUsedCapacity=0.99816763 used= 
> cluster=}}
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application attempt=appattempt_1530619767030_1652_01 
> container=null 
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943
>  clusterResource= type=NODE_LOCAL 
> requestedPartition=}}
>  
> I have encountered this problem several times after upgrading to YARN 2.9.1, 
> while the same configuration works fine under version 2.7.3.
>  
> YARN-4477 is an infinite loop bug in FairScheduler, not sure if this is a 
> similar problem.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552507#comment-16552507
 ] 

Weiwei Yang commented on YARN-8559:
---

A simple patch is now available for this issue ;)

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-8559.001.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-7748:
--
Attachment: YARN-7748.002.patch

> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted 
> failed
> 
>
> Key: YARN-7748
> URL: https://issues.apache.org/jira/browse/YARN-7748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-7748.001.patch, YARN-7748.002.patch
>
>
> TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted
> Failing for the past 1 build (Since Failed#19244 )
> Took 0.4 sec.
> *Error Message*
> expected null, but 
> was:
> *Stacktrace*
> {code}
> java.lang.AssertionError: expected null, but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8036) Memory Available shows a negative value after running updateNodeResource

2018-07-23 Thread Charan Hebri (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552493#comment-16552493
 ] 

Charan Hebri commented on YARN-8036:


[~Zian Chen] This may have been fixed by YARN-8464. I haven't seen this issue 
recently.

> Memory Available shows a negative value after running updateNodeResource
> 
>
> Key: YARN-8036
> URL: https://issues.apache.org/jira/browse/YARN-8036
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Major
>
> Running updateNodeResource for a node that already has applications running 
> on it doesn't update Memory Available with the right values. It may end up 
> showing negative values based on the requirements of the application. 
> Attached a screenshot for reference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler

2018-07-23 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8564:

Description: I wish to have a queue-level application lifetime monitor in 
FairScheduler. In our large YARN cluster there are sometimes too many small 
jobs in one minor queue that run for too long, which can leave the long-running 
jobs in our high-priority, very important queue pending for a long time. If we 
could have a queue-level application lifetime monitor and set a small lifetime 
on the minor queue, I think it could solve our problem until global scheduling 
can be used in FairScheduler.  (was: I wish to have 
application lifetime monitor for queue level. In our large yarn cluster, 
sometimes there are too many small jobs in one minor queue but may run too 
long, it may cause our long time job in our high priority and very important 
queue pending too many times. If we can have queue level application lifetime 
monitor in the queue level, and set small lifetime in the minor queue, i think 
it may can solve our problems before the global schedulering can be used in 
FairScheduler.)
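
For context, a per-application lifetime can already be requested at submission 
time and is enforced by the ResourceManager's application lifetime monitor; the 
wish above is to additionally let a FairScheduler queue impose a small 
default/maximum lifetime. A minimal sketch of the existing submission-time 
knob, assuming an ApplicationSubmissionContext the client has already filled in 
(value is the lifetime in seconds):

{code}
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ApplicationTimeoutType;

// Sketch only: ask the RM lifetime monitor to kill the application if it is
// still running after 10 minutes. "appContext" is assumed to be prepared
// elsewhere by the submitting client.
public final class LifetimeSketch {
  static void applyLifetime(ApplicationSubmissionContext appContext) {
    appContext.setApplicationTimeouts(
        Collections.singletonMap(ApplicationTimeoutType.LIFETIME, 600L));
  }
}
{code}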

> Add queue level application lifetime monitor in FairScheduler 
> --
>
> Key: YARN-8564
> URL: https://issues.apache.org/jira/browse/YARN-8564
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Major
>
> I wish to have a queue-level application lifetime monitor in FairScheduler. 
> In our large YARN cluster there are sometimes too many small jobs in one 
> minor queue that run for too long, which can leave the long-running jobs in 
> our high-priority, very important queue pending for a long time. If we could 
> have a queue-level application lifetime monitor and set a small lifetime on 
> the minor queue, I think it could solve our problem until global scheduling 
> can be used in FairScheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler

2018-07-23 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8564:

Description: I wish to have application lifetime monitor for queue level. 
In our large yarn cluster, sometimes there are too many small jobs in one minor 
queue but may run too long, it may cause our long time job in our high priority 
and very important queue pending too many times. If we can have queue level 
application lifetime monitor in the queue level, and set small lifetime in the 
minor queue, i think it may can solve our problems before the global 
schedulering can be used in FairScheduler.  (was: I wish to have application 
lifetime monitor for queue level. In our large yarn cluster, sometimes there 
are too many small jobs in one minor queue but may run too long, it may cause 
our long time job in our high priority queue pending too many times. If we can 
have queue level application lifetime monitor in the queue level, and set small 
lifetime in the minor queue, i think it can solve our problems.)

> Add queue level application lifetime monitor in FairScheduler 
> --
>
> Key: YARN-8564
> URL: https://issues.apache.org/jira/browse/YARN-8564
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Major
>
> I wish to have application lifetime monitor for queue level. In our large 
> yarn cluster, sometimes there are too many small jobs in one minor queue but 
> may run too long, it may cause our long time job in our high priority and 
> very important queue pending too many times. If we can have queue level 
> application lifetime monitor in the queue level, and set small lifetime in 
> the minor queue, i think it may can solve our problems before the global 
> schedulering can be used in FairScheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler

2018-07-23 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8564:

Description: I wish to have application lifetime monitor for queue level. 
In our large yarn cluster, sometimes there are too many small jobs in one minor 
queue but may run too long, it may cause our long time job in our high priority 
queue pending too many times. If we can have queue level application lifetime 
monitor in the queue level, and set small lifetime in the minor queue, i think 
it can solve our problems.  (was: Wish to have application lifetime monitor for 
queue level. In our large yarn cluster, sometimes there are too many small jobs 
in one minor queue but may run too long, it may cause our long time job in our 
high priority queue pending too many times. If we can have queue level 
application lifetime monitor in the queue level, and set small lifetime in the 
minor queue, i think it can solve our problems.)

> Add queue level application lifetime monitor in FairScheduler 
> --
>
> Key: YARN-8564
> URL: https://issues.apache.org/jira/browse/YARN-8564
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Major
>
> I wish to have application lifetime monitor for queue level. In our large 
> yarn cluster, sometimes there are too many small jobs in one minor queue but 
> may run too long, it may cause our long time job in our high priority queue 
> pending too many times. If we can have queue level application lifetime 
> monitor in the queue level, and set small lifetime in the minor queue, i 
> think it can solve our problems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8564) Add queue level application lifetime monitor in FairScheduler

2018-07-23 Thread zhuqi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552453#comment-16552453
 ] 

zhuqi commented on YARN-8564:
-

Hi [~rohithsharma] :

I have changed my description, thanks.

> Add queue level application lifetime monitor in FairScheduler 
> --
>
> Key: YARN-8564
> URL: https://issues.apache.org/jira/browse/YARN-8564
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Major
>
> Wish to have application lifetime monitor for queue level. In our large yarn 
> cluster, sometimes there are too many small jobs in one minor queue but may 
> run too long, it may cause our long time job in our high priority queue 
> pending too many times. If we can have queue level application lifetime 
> monitor in the queue level, and set small lifetime in the minor queue, i 
> think it can solve our problems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler

2018-07-23 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8564:

Description: Wish to have application lifetime monitor for queue level. In 
our large yarn cluster, sometimes there are too many small jobs in one minor 
queue but may run too long, it may cause our long time job in our high priority 
queue pending too many times. If we can have queue level application lifetime 
monitor in the queue level, and set small lifetime in the minor queue, i think 
it can solve our problems.  (was: Now the application life time monitor feature 
is only worked with the Capacity Scheduler, but it is not supported with Fair 
Scheduler. Our company wish to use this feature in our product environment in 
order to make our  big growing  clusters more stable and efficient .)

> Add queue level application lifetime monitor in FairScheduler 
> --
>
> Key: YARN-8564
> URL: https://issues.apache.org/jira/browse/YARN-8564
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Major
>
> Wish to have application lifetime monitor for queue level. In our large yarn 
> cluster, sometimes there are too many small jobs in one minor queue but may 
> run too long, it may cause our long time job in our high priority queue 
> pending too many times. If we can have queue level application lifetime 
> monitor in the queue level, and set small lifetime in the minor queue, i 
> think it can solve our problems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned YARN-8559:
-

Assignee: Weiwei Yang

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Minor
> Attachments: YARN-8559.001.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8559:
--
Attachment: YARN-8559.001.patch

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Priority: Minor
> Attachments: YARN-8559.001.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8559:
--
Target Version/s: 3.2.0, 3.0.3, 3.1.2  (was: 3.0.0, 3.1.0)

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Priority: Minor
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552245#comment-16552245
 ] 

Weiwei Yang edited comment on YARN-8559 at 7/23/18 7:42 AM:


Thanks [~leftnoteasy], that's correct. "scheduler-conf" right now is a PUT for 
refreshing CS configs. There is "http://RM_ADDR/scheduler" to retrieve 
scheduler info, see more at the [apache 
doc|http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API].
 It dumps scheduler info, but not directly from the conf. I agree to add a GET 
for this endpoint too, so the user could first query the existing CS 
configuration, then update it via the mutable API, and finally query it again 
to make sure everything is correct.

Let me reopen the ticket to get this done.

Thanks
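
As a rough illustration, the scheduler info the RM already exposes can be 
fetched with a plain GET against the /ws/v1/cluster/scheduler endpoint; the 
host and port below are placeholders, and the proposed change would add a 
similar GET for the scheduler-conf endpoint:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: dump the scheduler hierarchy the RM exposes today over REST.
// "rm-host:8088" is a placeholder for the actual RM web address.
public class SchedulerInfoFetch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/scheduler");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // JSON description of the queues
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}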


was (Author: cheersyang):
Thanks [~leftnoteasy], that's correct. "scheduler-conf" right now is a PUT for 
refreshing CS configs. There is "http://RM_ADDR/scheduler" to retrieve 
scheduler info, see more at [apache 
doc|http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API].
 It dumps scheduler info but not directly from conf. I agree to add a GET for 
this endpoint too.
 Let me reopen this JIRA for more discussion then.
 Thanks

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Priority: Minor
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8559) Expose scheduling configuration info in Resource Manager's /conf endpoint

2018-07-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552245#comment-16552245
 ] 

Weiwei Yang edited comment on YARN-8559 at 7/23/18 7:40 AM:


Thanks [~leftnoteasy], that's correct. "scheduler-conf" right now is a PUT for 
refreshing CS configs. There is "http://RM_ADDR/scheduler" to retrieve 
scheduler info, see more at [apache 
doc|http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API].
 It dumps scheduler info but not directly from conf. I agree to add a GET for 
this endpoint too.
 Let me reopen this JIRA for more discussion then.
 Thanks


was (Author: cheersyang):
Thanks [~leftnoteasy], that's correct. "scheduler-conf" right now is a PUT for 
refreshing CS configs. There is "http://RM_ADDR/scheduler" to retrieve 
scheduler info, see more at [apache 
doc|http://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API].
 It dumps scheduler info but not directly from conf. I am thinking since we 
support the mutable configuration via REST (scheduler-conf), we should support 
GET too. Thoughts [~leftnoteasy], [~banditka]?
Let me reopen this JIRA for more discussion then.
Thanks

> Expose scheduling configuration info in Resource Manager's /conf endpoint
> -
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Priority: Minor
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8564) Add queue level application lifetime monitor in FairScheduler

2018-07-23 Thread zhuqi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-8564:

Summary: Add queue level application lifetime monitor in FairScheduler   
(was: Add application lifetime monitor in FairScheduler)

> Add queue level application lifetime monitor in FairScheduler 
> --
>
> Key: YARN-8564
> URL: https://issues.apache.org/jira/browse/YARN-8564
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: zhuqi
>Priority: Major
>
> Now the application life time monitor feature is only worked with the 
> Capacity Scheduler, but it is not supported with Fair Scheduler. Our company 
> wish to use this feature in our product environment in order to make our  big 
> growing  clusters more stable and efficient .



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-23 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552403#comment-16552403
 ] 

Bilwa S T commented on YARN-8548:
-

[~sunilg] thanks for reviewing. The comment is handled.

> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch, YARN-8548-002.patch, 
> YARN-8548-003.patch, YARN-8548-004.patch, YARN-8548-005.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}
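
For background on the title: YARN's *PBImpl record classes follow a lazy-builder 
convention in which every setter calls maybeInitBuilder() before touching the 
protobuf builder; forgetting that call is the kind of bug the stack trace above 
points at. A simplified, self-contained sketch of the convention, using 
illustrative stand-in types rather than the actual AllocateResponsePBImpl code:

{code}
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the "maybeInitBuilder" convention used by YARN's
// *PBImpl classes; not the real AllocateResponsePBImpl. A setter that uses
// the builder field before it is initialized fails with an NPE like the one
// in the stack trace above.
public class LazyBuilderSketch {

  // Stand-ins for the generated protobuf message and its builder.
  static class Proto {
    final List<String> tokens;
    Proto(List<String> t) { tokens = t; }
  }
  static class Builder {
    final List<String> tokens = new ArrayList<>();
    Builder() { }
    Builder(Proto p) { tokens.addAll(p.tokens); }
    Proto build() { return new Proto(new ArrayList<>(tokens)); }
  }

  private Proto proto = new Proto(new ArrayList<>());
  private Builder builder;
  private boolean viaProto = true;

  // The convention: every mutator calls this first, so "builder" is never null.
  private void maybeInitBuilder() {
    if (viaProto || builder == null) {
      builder = new Builder(proto);
    }
    viaProto = false;
  }

  public void setNMTokens(List<String> nmTokens) {
    maybeInitBuilder();            // without this, builder may still be null
    builder.tokens.clear();
    builder.tokens.addAll(nmTokens);
  }

  public Proto getProto() {
    if (!viaProto) {
      proto = builder.build();
      viaProto = true;
    }
    return proto;
  }
}
{code}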



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-23 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-8548:

Attachment: YARN-8548-005.patch

> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch, YARN-8548-002.patch, 
> YARN-8548-003.patch, YARN-8548-004.patch, YARN-8548-005.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


