[jira] [Commented] (YARN-9693) When AMRMProxyService is enabled RMCommunicator will register with failure

2019-12-03 Thread panlijie (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986819#comment-16986819
 ] 

panlijie commented on YARN-9693:


[~cane] thank you, I will try running this patch.

> When AMRMProxyService is enabled RMCommunicator will register with failure
> --
>
> Key: YARN-9693
> URL: https://issues.apache.org/jira/browse/YARN-9693
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9693.001.patch
>
>
> When we enable the AMRMProxy service, the RMCommunicator fails to register 
> with the error below:
> {code:java}
> 2019-07-23 17:12:44,794 INFO [TaskHeartbeatHandler PingChecker] 
> org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler 
> thread interrupted
> 2019-07-23 17:12:44,794 ERROR [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid 
> AMRMToken from appattempt_1563872237585_0001_02
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:186)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:123)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:986)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1300)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1768)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1764)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1698)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: 
> Invalid AMRMToken from appattempt_1563872237585_0001_02
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy93.registerApplicationMaster(Unknown Source)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:170)
>   ... 14 more
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Invalid AMRMToken from 

[jira] [Commented] (YARN-10010) NM upload log cost too much time

2019-12-03 Thread Wilfred Spiegelenburg (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986740#comment-16986740
 ] 

Wilfred Spiegelenburg commented on YARN-10010:
--

The thread pool size is configurable via 
{{yarn.nodemanager.logaggregation.threadpool-size-max}}; if you need more 
threads, you can set it higher.
I would recommend setting it to a number almost as high as the number of 
simultaneous applications you expect in the cluster. I cannot give you an 
exact number, but 100 threads might not be enough, especially in larger 
clusters.
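
For illustration, a minimal sketch of bumping the limit via the Configuration 
API, assuming the equivalent property would normally go in yarn-site.xml (the 
value 200 is just an example):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Equivalent of a yarn-site.xml entry; 200 is an illustrative value,
// sized near the number of simultaneous apps expected on the cluster.
Configuration conf = new YarnConfiguration();
conf.setInt("yarn.nodemanager.logaggregation.threadpool-size-max", 200);
{code}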

Can we also close this as a dupe of YARN-8364?

> NM upload log cost too much time
> 
>
> Key: YARN-10010
> URL: https://issues.apache.org/jira/browse/YARN-10010
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: notfound.png
>
>
> Since the thread pool size of the log service is 100, the log uploading 
> service will sometimes be delayed for some apps, like below:
>  !notfound.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10009) DRF can treat minimum user limit percent as a max when custom resource is defined

2019-12-03 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10009:
--
Component/s: capacity scheduler

> DRF can treat minimum user limit percent as a max when custom resource is 
> defined
> -
>
> Key: YARN-10009
> URL: https://issues.apache.org/jira/browse/YARN-10009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3, 2.11.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10009.UT.patch
>
>
> | |Memory|Vcores|res_1|
> |Queue1 Totals|20GB|100|80|
> |Resources requested by App1 in Queue1|8GB (40% of total)|8 (8% of total)|80 
> (100% of total)|
> In the previous use case:
>  - Queue1 has a value of 25 for {{minimum-user-limit-percent}}
>  - User1 has requested 8 containers with {{}} each
>  - {{res_1}} will be the dominant resource in this case.
> All 8 containers should be assigned by the capacity scheduler, but with min 
> user limit pct set to 25, only 2 containers are assigned.
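
For concreteness, a hedged back-of-the-envelope reading of the numbers above 
(assuming each of the 8 containers asks for 10 of res_1, i.e. 80/8, which 
matches the table):

{code:java}
// If minimum-user-limit-percent (25) is wrongly applied as a cap on the
// dominant resource, User1 is limited to 25% of Queue1's 80 res_1 = 20,
// i.e. only 2 containers of 10 res_1 each instead of all 8.
int queueRes1Total = 80;
int minUserLimitPercent = 25;
int perContainerRes1 = queueRes1Total / 8;                    // 10
int cappedRes1 = queueRes1Total * minUserLimitPercent / 100;  // 20
int containersAssigned = cappedRes1 / perContainerRes1;       // 2
{code}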



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10011) Catch all exception during init app in LogAggregationService

2019-12-03 Thread zhoukang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated YARN-10011:

Component/s: nodemanager

> Catch all exception  during init app in LogAggregationService 
> --
>
> Key: YARN-10011
> URL: https://issues.apache.org/jira/browse/YARN-10011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> We should catch all exceptions during app init in LogAggregationService so 
> that the NM does not exit:
> {code:java}
> 2019-06-12,09:36:03,652 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.IllegalStateException
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
> at 
> org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:118)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1300)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1296)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:193)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:116)
> at java.lang.Thread.run(Thread.java:745)
> {code}
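
A minimal sketch of the idea, with simplified names and signatures mirroring 
the stack trace above (illustrative only, not an actual patch):

{code:java}
// Wrap the app-init work in a catch-all so an unexpected
// RuntimeException (like the IllegalStateException above) fails log
// aggregation for that one app instead of killing the dispatcher
// thread, which would take the whole NM down with it.
public void handle(LogHandlerEvent event) {
  if (event.getType() == LogHandlerEventType.APPLICATION_STARTED) {
    try {
      verifyAndCreateRemoteLogDir(getConfig());
      initApp(event);  // may throw from the HDFS / RPC layers
    } catch (Exception e) {
      LOG.error("Failed to init log aggregation for app", e);
      // mark log handling as failed for this app; do not rethrow
    }
  }
}
{code}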



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10011) Catch all exception during init app in LogAggregationService

2019-12-03 Thread zhoukang (Jira)
zhoukang created YARN-10011:
---

 Summary: Catch all exception  during init app in 
LogAggregationService 
 Key: YARN-10011
 URL: https://issues.apache.org/jira/browse/YARN-10011
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: zhoukang
Assignee: zhoukang


We should catch all exceptions during app init in LogAggregationService so 
that the NM does not exit:
{code:java}
2019-06-12,09:36:03,652 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.IllegalStateException
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:129)
at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:118)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1300)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1296)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1312)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:193)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:116)
at java.lang.Thread.run(Thread.java:745)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10009) In Capacity Scheduler, DRF can treat minimum user limit percent as a max when custom resource is defined

2019-12-03 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10009:
--
Attachment: YARN-10009.001.patch

> In Capacity Scheduler, DRF can treat minimum user limit percent as a max when 
> custom resource is defined
> 
>
> Key: YARN-10009
> URL: https://issues.apache.org/jira/browse/YARN-10009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3, 2.11.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10009.001.patch, YARN-10009.UT.patch
>
>
> | |Memory|Vcores|res_1|
> |Queue1 Totals|20GB|100|80|
> |Resources requested by App1 in Queue1|8GB (40% of total)|8 (8% of total)|80 
> (100% of total)|
> In the previous use case:
>  - Queue1 has a value of 25 for {{minimum-user-limit-percent}}
>  - User1 has requested 8 containers with {{}} each
>  - {{res_1}} will be the dominant resource in this case.
> All 8 containers should be assigned by the capacity scheduler, but with min 
> user limit pct set to 25, only 2 containers are assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10009) In Capacity Scheduler, DRF can treat minimum user limit percent as a max when custom resource is defined

2019-12-03 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10009:
--
Summary: In Capacity Scheduler, DRF can treat minimum user limit percent as 
a max when custom resource is defined  (was: DRF can treat minimum user limit 
percent as a max when custom resource is defined)

> In Capacity Scheduler, DRF can treat minimum user limit percent as a max when 
> custom resource is defined
> 
>
> Key: YARN-10009
> URL: https://issues.apache.org/jira/browse/YARN-10009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3, 2.11.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10009.UT.patch
>
>
> | |Memory|Vcores|res_1|
> |Queue1 Totals|20GB|100|80|
> |Resources requested by App1 in Queue1|8GB (40% of total)|8 (8% of total)|80 
> (100% of total)|
> In the previous use case:
>  - Queue1 has a value of 25 for {{minimum-user-limit-percent}}
>  - User1 has requested 8 containers with {{}} each
>  - {{res_1}} will be the dominant resource in this case.
> All 8 containers should be assigned by the capacity scheduler, but with min 
> user limit pct set to 25, only 2 containers are assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10009) In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined

2019-12-03 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10009:
--
Summary: In Capacity Scheduler, DRC can treat minimum user limit percent as 
a max when custom resource is defined  (was: In Capacity Scheduler, DRF can 
treat minimum user limit percent as a max when custom resource is defined)

> In Capacity Scheduler, DRC can treat minimum user limit percent as a max when 
> custom resource is defined
> 
>
> Key: YARN-10009
> URL: https://issues.apache.org/jira/browse/YARN-10009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3, 2.11.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10009.001.patch, YARN-10009.UT.patch
>
>
> | |Memory|Vcores|res_1|
> |Queue1 Totals|20GB|100|80|
> |Resources requested by App1 in Queue1|8GB (40% of total)|8 (8% of total)|80 
> (100% of total)|
> In the previous use case:
>  - Queue1 has a value of 25 for {{minimum-user-limit-percent}}
>  - User1 has requested 8 containers with {{}} each
>  - {{res_1}} will be the dominant resource in this case.
> All 8 containers should be assigned by the capacity scheduler, but with min 
> user limit pct set to 25, only 2 containers are assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup

2019-12-03 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987105#comment-16987105
 ] 

Eric Payne commented on YARN-9992:
--

The code changes look fine, but I'm still trying to understand what is 
different between trunk and branch-2. These code changes are not in trunk, but 
something is picking up the resource-types.xml in the CS init path.
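
For reference, a hedged sketch of what explicitly loading resource-types.xml 
in the init path could look like ({{RESOURCE_TYPES_CONFIGURATION_FILE}} and 
{{ResourceUtils#resetResourceTypes}} are existing APIs, but this is 
illustrative, not the attached patch):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.resource.ResourceUtils;

// Make sure resource-types.xml is on the Configuration before the
// scheduler computes per-queue maximum allocations during init.
Configuration conf = new YarnConfiguration();
conf.addResource(YarnConfiguration.RESOURCE_TYPES_CONFIGURATION_FILE);
ResourceUtils.resetResourceTypes(conf);  // reload resource types from conf
{code}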

> Max allocation per queue is zero for custom resource types on RM startup
> 
>
> Key: YARN-9992
> URL: https://issues.apache.org/jira/browse/YARN-9992
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9992.001.patch
>
>
> Found an issue where requests for GPUs on a newly booted RM cannot be 
> scheduled. The RM throws the exception in 
> SchedulerUtils#throwInvalidResourceException:
> {noformat}
> throw new InvalidResourceRequestException(
> "Invalid resource request, requested resource type=[" + reqResourceName
> + "] < 0 or greater than maximum allowed allocation. Requested "
> + "resource=" + reqResource + ", maximum allowed allocation="
> + availableResource
> + ", please note that maximum allowed allocation is calculated "
> + "by scheduler based on maximum resource of registered "
> + "NodeManagers, which might be less than configured "
> + "maximum allocation="
> + ResourceUtils.getResourceTypesMaximumAllocation());{noformat}
> Upon refreshing scheduler (e.g. via refreshQueues), GPU scheduling works 
> again.
> I think the root cause is that upon scheduler refresh, resource-types.xml is loaded 
> in CapacitySchedulerConfiguration (as part of YARN-7738), so when we call 
> ResourceUtils#fetchMaximumAllocationFromConfig in 
> CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it's able to 
> fetch the {{yarn.resource-types}} config. But resource-types.xml is not 
> loaded into the conf in CapacityScheduler#initScheduler, so it doesn't find 
> the custom resource when computing max allocations, and the custom resource 
> max allocation is 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2019-12-03 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987222#comment-16987222
 ] 

Eric Payne commented on YARN-8292:
--

I would like to get this backported to branch-2.10. [~sunilg], [~jhung], 
[~leftnoteasy], can I please request a review?

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch, 
> YARN-8292.009.patch, YARN-8292.branch-2.009.patch, 
> YARN-8292.branch-2.010.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types; the total resource of the cluster is 30:18:6.
> In each of a/b, there are 3 containers running, each of size 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.
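
A hedged walk-through of the arithmetic above (componentwise, in the order 
shown in the example; this is just the numbers, not the preemption code):

{code:java}
// Headroom for queue a (same for b): guaranteed - used, per component.
// guaranteed = 10:6:2, used = 6:6:3  ->  headroom = 4:0:-1.
// One component going negative is what trips the existing logic: the
// queue is over its guarantee in one dimension and under in the others,
// so preemption on behalf of queue c's 1:1:1 pending never fires.
int[] guaranteed = {10, 6, 2};
int[] used = {6, 6, 3};
int[] headroom = new int[guaranteed.length];
for (int i = 0; i < guaranteed.length; i++) {
  headroom[i] = guaranteed[i] - used[i];  // {4, 0, -1}
}
{code}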



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2019-12-03 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987236#comment-16987236
 ] 

Jonathan Hung commented on YARN-8292:
-

[^YARN-8292.branch-2.010.patch] looks fine to me. I've kicked precommit; +1 
pending Jenkins.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch, 
> YARN-8292.009.patch, YARN-8292.branch-2.009.patch, 
> YARN-8292.branch-2.010.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types; the total resource of the cluster is 30:18:6.
> In each of a/b, there are 3 containers running, each of size 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10009) In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined

2019-12-03 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987181#comment-16987181
 ] 

Eric Payne commented on YARN-10009:
---

[~sunilg], [~jhung], [~leftnoteasy], could I please request that you review 
this?

> In Capacity Scheduler, DRC can treat minimum user limit percent as a max when 
> custom resource is defined
> 
>
> Key: YARN-10009
> URL: https://issues.apache.org/jira/browse/YARN-10009
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3, 2.11.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10009.001.patch, YARN-10009.UT.patch
>
>
> | |Memory|Vcores|res_1|
> |Queue1 Totals|20GB|100|80|
> |Resources requested by App1 in Queue1|8GB (40% of total)|8 (8% of total)|80 
> (100% of total)|
> In the previous use case:
>  - Queue1 has a value of 25 for {{minimum-user-limit-percent}}
>  - User1 has requested 8 containers with {{}} each
>  - {{res_1}} will be the dominant resource in this case.
> All 8 containers should be assigned by the capacity scheduler, but with min 
> user limit pct set to 25, only 2 containers are assigned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10009) In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined

2019-12-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987121#comment-16987121
 ] 

Hadoop QA commented on YARN-10009:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 16s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 12 unchanged - 0 fixed = 14 total (was 12) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
9s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
21s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}175m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-10009 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12987377/YARN-10009.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a031af742580 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0c217fe |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup

2019-12-03 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987270#comment-16987270
 ] 

Jonathan Hung commented on YARN-9992:
-

Hmm, not sure how I missed this before, I think it's related to YARN-9205. Let 
me try porting that.

> Max allocation per queue is zero for custom resource types on RM startup
> 
>
> Key: YARN-9992
> URL: https://issues.apache.org/jira/browse/YARN-9992
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9992.001.patch
>
>
> Found an issue where requests for GPUs on a newly booted RM cannot be 
> scheduled. The RM throws the exception in 
> SchedulerUtils#throwInvalidResourceException:
> {noformat}
> throw new InvalidResourceRequestException(
> "Invalid resource request, requested resource type=[" + reqResourceName
> + "] < 0 or greater than maximum allowed allocation. Requested "
> + "resource=" + reqResource + ", maximum allowed allocation="
> + availableResource
> + ", please note that maximum allowed allocation is calculated "
> + "by scheduler based on maximum resource of registered "
> + "NodeManagers, which might be less than configured "
> + "maximum allocation="
> + ResourceUtils.getResourceTypesMaximumAllocation());{noformat}
> Upon refreshing scheduler (e.g. via refreshQueues), GPU scheduling works 
> again.
> I think the root cause is that upon scheduler refresh, resource-types.xml is loaded 
> in CapacitySchedulerConfiguration (as part of YARN-7738), so when we call 
> ResourceUtils#fetchMaximumAllocationFromConfig in 
> CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it's able to 
> fetch the {{yarn.resource-types}} config. But resource-types.xml is not 
> loaded into the conf in CapacityScheduler#initScheduler, so it doesn't find 
> the custom resource when computing max allocations, and the custom resource 
> max allocation is 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-12-03 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987304#comment-16987304
 ] 

Jonathan Hung commented on YARN-9205:
-

Attached [^YARN-9205-branch-2.001.patch], which contains trivial changes:
 * Set ResourceUtils#resetResourceTypes to public (originally done in YARN-7119)
 * Change TestResourceProfiles.TEST_CONF_RESET_RESOURCE_TYPES to 
TestResourceUtils.TEST_CONF_RESET_RESOURCE_TYPES in TestCSAllocateCustomResource

Committed to branch-2, branch-2.10
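
As a side note for anyone reproducing this: steps 3-4 of the description 
below boil down to declaring the custom resource, e.g. (a hedged sketch; the 
property names are the standard resource-type ones, and the value 10 is from 
the description):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Equivalent of the resource-types.xml / node-resources.xml entries.
Configuration conf = new YarnConfiguration();
conf.set("yarn.resource-types", "cmp.com/hdw");
conf.set("yarn.nodemanager.resource-type.cmp.com/hdw", "10");
{code}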

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9205-branch-2.001.patch, 
> YARN-9205-branch-3.1.001.patch, YARN-9205-branch-3.2.001.patch, 
> YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, 
> YARN-9205-trunk.005.patch, YARN-9205-trunk.006.patch, 
> YARN-9205-trunk.007.patch, YARN-9205-trunk.008.patch, 
> YARN-9205-trunk.009.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> Did some rough debugging; the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, Line 234:
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java, Line 364:
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource 

[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-12-03 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9205:

Fix Version/s: 2.11.0
   2.10.1

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1, 2.10.1, 2.11.0
>
> Attachments: YARN-9205-branch-2.001.patch, 
> YARN-9205-branch-3.1.001.patch, YARN-9205-branch-3.2.001.patch, 
> YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, 
> YARN-9205-trunk.005.patch, YARN-9205-trunk.006.patch, 
> YARN-9205-trunk.007.patch, YARN-9205-trunk.008.patch, 
> YARN-9205-trunk.009.patch
>
>
> In a non-secure cluster. Reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> Did some rough debugging; the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, Line 234:
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java, Line 364:
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance will cause the 
> code below to return null for resource names, leaving only the mandatory 
> resources. This might be the root cause.
> {code:java}
> private static Map 
> 

[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2019-12-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987319#comment-16987319
 ] 

Hadoop QA commented on YARN-8292:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
58s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
52s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
33s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
49s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
46s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
32s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
37s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
54s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 7 new + 97 unchanged - 0 fixed = 104 total (was 97) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
2s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m  3s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}150m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:f555aa740b5 |
| JIRA Issue | YARN-8292 |
| JIRA Patch URL | 

[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-12-03 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9205:

Attachment: YARN-9205-branch-2.001.patch

> When using custom resource type, application will fail to run due to the 
> CapacityScheduler throws 
> InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) 
> ---
>
> Key: YARN-9205
> URL: https://issues.apache.org/jira/browse/YARN-9205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9205-branch-2.001.patch, 
> YARN-9205-branch-3.1.001.patch, YARN-9205-branch-3.2.001.patch, 
> YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, 
> YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, 
> YARN-9205-trunk.005.patch, YARN-9205-trunk.006.patch, 
> YARN-9205-trunk.007.patch, YARN-9205-trunk.008.patch, 
> YARN-9205-trunk.009.patch
>
>
> In a non-secure cluster, reproduce it as follows:
>  # Set capacity scheduler in yarn-site.xml
>  # Use default capacity-scheduler.xml
>  # Set custom resource type "cmp.com/hdw" in resource-types.xml
>  # Set a value say 10 in node-resources.xml
>  # Start cluster
>  # Submit a distributed shell application which requests some "cmp.com/hdw"
> The AM will get an exception from CapacityScheduler and then fail. This bug 
> doesn't exist in FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested 
> container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: 
> GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[cmp.com/hdw], 
> Requested resource=, maximum allowed 
> allocation=, please note that maximum allowed 
> allocation is calculated by scheduler based on maximum resource of registered 
> NodeManagers, which might be less than configured maximum 
> allocation=
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> Did some rough debugging; the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, Line 234:
> {code:java}
> Resource maximumCapacity =
>  getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code seems like it should return "" 
> but returns "".
> This incorrect value might be caused by queue maximum allocation calculation 
> involved in YARN-8720:
> AbstractCSQueue.java, Line 364:
> {code:java}
> this.maximumAllocation =
>  configuration.getMaximumAllocationPerQueue(
>  getQueuePath());{code}
> And this invokes CapacitySchedulerConfiguration.java Line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" which is not a YarnConfiguration instance will cause the 
> code below to return null for resource names, leaving only the mandatory 
> resources. This might be the root cause.
> {code:java}
> private static Map 
> 

[jira] [Created] (YARN-10012) Guaranteed and max capacity queue metrics for custom resources

2019-12-03 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-10012:


 Summary: Guaranteed and max capacity queue metrics for custom 
resources
 Key: YARN-10012
 URL: https://issues.apache.org/jira/browse/YARN-10012
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


YARN-9085 adds support for guaranteed/maxcapacity MB/vcores. We should add the 
same for custom resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9958) Remove the invalid lock in ContainerExecutor

2019-12-03 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved YARN-9958.
-
Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Committed to trunk. Thanks [~jiwq] for the contribution and thanks [~Tao Yang] 
for the review.

> Remove the invalid lock in ContainerExecutor
> 
>
> Key: YARN-9958
> URL: https://issues.apache.org/jira/browse/YARN-9958
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
>
> ContainerExecutor has a ReadLock and a WriteLock. These are used to guard 
> get/put calls on a ConcurrentMap. Since ConcurrentMap already provides 
> thread safety and atomicity guarantees, we can remove the locks.
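
A minimal self-contained sketch of the reasoning, assuming the map in 
question is a ConcurrentHashMap (names here are illustrative, not the 
actual ContainerExecutor fields):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// ConcurrentHashMap already makes individual get/put calls atomic and
// thread-safe, so guarding them with a separate ReadWriteLock adds
// overhead without adding safety.
class PidStore {
  private final ConcurrentMap<String, String> pids =
      new ConcurrentHashMap<>();

  void record(String containerId, String pid) {
    pids.put(containerId, pid);   // atomic; no external lock needed
  }

  String lookup(String containerId) {
    return pids.get(containerId); // atomic; no external lock needed
  }
}
{code}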



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9958) Remove the invalid lock in ContainerExecutor

2019-12-03 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987565#comment-16987565
 ] 

Hudson commented on YARN-9958:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17718 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17718/])
YARN-9958. Remove the invalid lock in ContainerExecutor (#1704) (aajisaka: rev 
c48de9aa2ddf7622648c4410612ffc035861df63)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java


> Remove the invalid lock in ContainerExecutor
> 
>
> Key: YARN-9958
> URL: https://issues.apache.org/jira/browse/YARN-9958
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Fix For: 3.3.0
>
>
> ContainerExecutor has a ReadLock and a WriteLock. These are used to guard 
> get/put calls on a ConcurrentMap. Since ConcurrentMap already provides 
> thread safety and atomicity guarantees, we can remove the locks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org