[jira] [Commented] (YARN-9693) When AMRMProxyService is enabled RMCommunicator will register with failure
[ https://issues.apache.org/jira/browse/YARN-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986819#comment-16986819 ] panlijie commented on YARN-9693: [~cane] thank you, I will try running this patch > When AMRMProxyService is enabled RMCommunicator will register with failure > -- > > Key: YARN-9693 > URL: https://issues.apache.org/jira/browse/YARN-9693 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Affects Versions: 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9693.001.patch > > > When we enable the AMRM proxy service, RMCommunicator fails to register, with > the error below: > {code:java} > 2019-07-23 17:12:44,794 INFO [TaskHeartbeatHandler PingChecker] > org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler > thread interrupted > 2019-07-23 17:12:44,794 ERROR [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid > AMRMToken from appattempt_1563872237585_0001_02 > at > org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:186) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:123) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:986) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1300) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1768) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1764) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1698) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: > Invalid AMRMToken from appattempt_1563872237585_0001_02 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy93.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:170) > ... 14 more > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > Invalid AMRMToken from
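For context, the AMRMProxy interposes on the AM-RM channel and re-issues the AMRMToken locally, so a registration that reaches the RM with the proxy-issued token (or bypasses the proxy with the RM-issued one) is rejected as invalid. A minimal sketch of the node-manager settings that put an AM behind the proxy, expressed with the Hadoop Configuration API; the two property names are the standard AMRMProxy/federation keys, the values are illustrative only, and this is not part of the attached patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: the yarn-site.xml settings that reproduce this setup.
public class AmrmProxySetup {
  public static Configuration enableAmrmProxy() {
    Configuration conf = new Configuration();
    // Route AM <-> RM traffic through the NM-local AMRMProxy,
    // which swaps in a locally issued AMRMToken.
    conf.setBoolean("yarn.nodemanager.amrmproxy.enabled", true);
    // Federation mode installs the interceptor chain the proxy uses.
    conf.setBoolean("yarn.federation.enabled", true);
    return conf;
  }
}
{code}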
[jira] [Commented] (YARN-10010) NM upload log cost too much time
[ https://issues.apache.org/jira/browse/YARN-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986740#comment-16986740 ] Wilfred Spiegelenburg commented on YARN-10010: -- The thread pool size is configurable via {{yarn.nodemanager.logaggregation.threadpool-size-max}}; if you need more threads, you can set it higher. I would recommend setting it to a number almost as high as the number of simultaneous applications you expect in the cluster. I cannot give you an exact number, but 100 threads, especially in larger clusters, might not be enough. Can we also close this as a dupe of YARN-8364? > NM upload log cost too much time > > > Key: YARN-10010 > URL: https://issues.apache.org/jira/browse/YARN-10010 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: notfound.png > > > Since the thread pool size of the log service is 100, > the log uploading service will sometimes be delayed for some apps, like below: > !notfound.png!
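For reference, the property mentioned above can also be set programmatically; a minimal sketch using the Hadoop Configuration API, where the value 200 is illustrative and 100 is the default the comment refers to:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class LogAggregationThreadPool {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Raise the NM log-aggregation pool above the 100-thread default so
    // uploads are not queued behind other finishing applications.
    conf.setInt("yarn.nodemanager.logaggregation.threadpool-size-max", 200);
    System.out.println(conf.getInt(
        "yarn.nodemanager.logaggregation.threadpool-size-max", 100));
  }
}
{code}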
[jira] [Updated] (YARN-10009) DRF can treat minimum user limit percent as a max when custom resource is defined
[ https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10009: -- Component/s: capacity scheduler > DRF can treat minimum user limit percent as a max when custom resource is > defined > - > > Key: YARN-10009 > URL: https://issues.apache.org/jira/browse/YARN-10009 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3, 2.11.0 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-10009.UT.patch > > > | |Memory|Vcores|res_1| > |Queue1 Totals|20GB|100|80| > |Resources requested by App1 in Queue1|8GB (40% of total)|8 (8% of total)|80 > (100% of total)| > In the previous use case: > - Queue1 has a value of 25 for {{minimum-user-limit-percent}} > - User1 has requested 8 containers with {{}} > each > - {{res_1}} will be the dominant resource in this case. > All 8 containers should be assigned by the capacity scheduler, but with min > user limit pct set to 25, only 2 containers are assigned.
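To make the arithmetic concrete: with minimum-user-limit-percent at 25, the user limit on the dominant resource res_1 works out to 25% of the queue's 80 units, i.e. 20, and at 10 units of res_1 per container only 2 containers fit under it, matching the observed behavior. A toy sketch of that arithmetic (simplified; the real UsersManager computation has more inputs):

{code:java}
public class MinUserLimitExample {
  public static void main(String[] args) {
    int queueRes1 = 80;         // queue total of the custom resource res_1
    int minUserLimitPct = 25;   // minimum-user-limit-percent
    int perContainerRes1 = 10;  // 80 units spread over 8 requested containers

    // The buggy behavior this JIRA describes: MULP acts as a hard cap on
    // the dominant (custom) resource instead of a floor.
    int userLimit = queueRes1 * minUserLimitPct / 100;   // 20 units
    int containers = userLimit / perContainerRes1;       // 2 containers
    System.out.println("containers assigned: " + containers);
  }
}
{code}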
[jira] [Updated] (YARN-10011) Catch all exception during init app in LogAggregationService
[ https://issues.apache.org/jira/browse/YARN-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-10011: Component/s: nodemanager
[jira] [Created] (YARN-10011) Catch all exception during init app in LogAggregationService
zhoukang created YARN-10011: --- Summary: Catch all exception during init app in LogAggregationService Key: YARN-10011 URL: https://issues.apache.org/jira/browse/YARN-10011 Project: Hadoop YARN Issue Type: Bug Reporter: zhoukang Assignee: zhoukang We should catch all exceptions during app init in LogAggregationService; otherwise the NM can exit: {code:java} 2019-06-12,09:36:03,652 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:118) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2115) at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1300) at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1296) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1312) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:193) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:116) at java.lang.Thread.run(Thread.java:745) {code}
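A self-contained sketch of the catch-all pattern this issue proposes (names are illustrative, not the actual LogAggregationService code): the per-app init is wrapped so a transient HDFS/RPC failure fails that application's log aggregation rather than escaping into the AsyncDispatcher thread and killing the NM.

{code:java}
import java.util.function.Consumer;

public class InitAppCatchAll {
  interface RemoteLogDir { void verifyAndCreate() throws Exception; }

  static void initApp(String appId, RemoteLogDir dir, Consumer<String> onFailure) {
    try {
      dir.verifyAndCreate();            // may throw on HDFS/RPC problems
    } catch (Exception e) {
      // Fail aggregation for this app only; never let the exception escape
      // into the dispatcher thread, which would take the whole NM down.
      onFailure.accept(appId + ": " + e);
    }
  }

  public static void main(String[] args) {
    initApp("application_1", () -> { throw new IllegalStateException(); },
        msg -> System.out.println("log handling failed for " + msg));
  }
}
{code}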
[jira] [Updated] (YARN-10009) In Capacity Scheduler, DRF can treat minimum user limit percent as a max when custom resource is defined
[ https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10009: -- Attachment: YARN-10009.001.patch
[jira] [Updated] (YARN-10009) In Capacity Scheduler, DRF can treat minimum user limit percent as a max when custom resource is defined
[ https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10009: -- Summary: In Capacity Scheduler, DRF can treat minimum user limit percent as a max when custom resource is defined (was: DRF can treat minimum user limit percent as a max when custom resource is defined)
[jira] [Updated] (YARN-10009) In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined
[ https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-10009: -- Summary: In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined (was: In Capacity Scheduler, DRF can treat minimum user limit percent as a max when custom resource is defined)
[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987105#comment-16987105 ] Eric Payne commented on YARN-9992: -- The code changes look fine, but I'm still trying to understand what is different between trunk and branch-2. These code changes are not in trunk, but something is picking up the resource-types.xml in the CS init path. > Max allocation per queue is zero for custom resource types on RM startup > > > Key: YARN-9992 > URL: https://issues.apache.org/jira/browse/YARN-9992 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9992.001.patch > > > Found an issue where requests for GPUs on a newly booted RM cannot be > scheduled. It throws the exception in > SchedulerUtils#throwInvalidResourceException: > {noformat} > throw new InvalidResourceRequestException( > "Invalid resource request, requested resource type=[" + reqResourceName > + "] < 0 or greater than maximum allowed allocation. Requested " > + "resource=" + reqResource + ", maximum allowed allocation=" > + availableResource > + ", please note that maximum allowed allocation is calculated " > + "by scheduler based on maximum resource of registered " > + "NodeManagers, which might be less than configured " > + "maximum allocation=" > + ResourceUtils.getResourceTypesMaximumAllocation());{noformat} > Upon refreshing the scheduler (e.g. via refreshQueues), GPU scheduling works > again. > I think the root cause is that upon scheduler refresh, resource-types.xml is loaded > in CapacitySchedulerConfiguration (as part of YARN-7738), so when we call > ResourceUtils#fetchMaximumAllocationFromConfig in > CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it's able to > fetch the {{yarn.resource-types}} config. But resource-types.xml is not > loaded into the conf in CapacityScheduler#initScheduler, so it doesn't find > the custom resource when computing max allocations, and the custom resource > max allocation is 0.
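If that analysis is right, one plausible fix shape is to load the resource types during scheduler init the same way the refresh path already does. A hedged sketch, not the actual CapacityScheduler#initScheduler, assuming ResourceUtils#resetResourceTypes is callable (which the YARN-9205 branch-2 patch below makes public):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.util.resource.ResourceUtils;

public class LoadResourceTypesOnInit {
  // Sketch only: populate the resource-type registry from resource-types.xml
  // so fetchMaximumAllocationFromConfig sees yarn.resource-types on startup.
  static void initScheduler(Configuration conf) {
    ResourceUtils.resetResourceTypes(conf);
  }

  public static void main(String[] args) {
    initScheduler(new Configuration());
  }
}
{code}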
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987222#comment-16987222 ] Eric Payne commented on YARN-8292: -- I would like to get this back to branch-2.10. [~sunilg], [~jhung], [~leftnoteasy], can I please request a review? > Fix the dominant resource preemption cannot happen when some of the resource > vector becomes negative > > > Key: YARN-8292 > URL: https://issues.apache.org/jira/browse/YARN-8292 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sumana Sathish >Assignee: Wangda Tan >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8292.001.patch, YARN-8292.002.patch, > YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, > YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch, > YARN-8292.009.patch, YARN-8292.branch-2.009.patch, > YARN-8292.branch-2.010.patch > > > This is an example of the problem: > > {code} > // guaranteed, max, used, pending > "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root > "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a > "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b > "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c > {code} > There are 3 resource types. The total resource of the cluster is 30:18:6. > For both queues a and b, there are 3 containers running, each of them 2:2:1. > Queue c uses 0 resources and has 1:1:1 pending. > Under the existing logic, preemption cannot happen.
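The mechanics behind "preemption cannot happen": componentwise vector comparisons reject the whole vector as soon as any single dimension goes negative, so over-use of one resource type can mask starvation of another. A toy illustration with plain arrays, not the actual preemption-policy code, and with the "used" vector tweaked from the example so only the negative component blocks the check:

{code:java}
public class NegativeVectorExample {
  // true only when every component of a is <= the matching component of b
  static boolean lessThanOrEqual(int[] a, int[] b) {
    for (int i = 0; i < a.length; i++) {
      if (a[i] > b[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    int[] guaranteed = {10, 6, 2};   // queue a from the example above
    int[] used = {6, 5, 3};          // over guarantee only in the 3rd type
    int[] headroom = new int[3];
    for (int i = 0; i < 3; i++) {
      headroom[i] = guaranteed[i] - used[i];   // {4, 1, -1}
    }
    int[] pending = {1, 1, 1};       // queue c's pending ask
    // The -1 alone makes the check fail, so nothing is preempted even
    // though the first two resource types have room and queue c is starved.
    System.out.println(lessThanOrEqual(pending, headroom));   // false
  }
}
{code}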
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987236#comment-16987236 ] Jonathan Hung commented on YARN-8292: - [^YARN-8292.branch-2.010.patch] looks fine to me, I've kicked precommit, +1 pending Jenkins.
[jira] [Commented] (YARN-10009) In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined
[ https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987181#comment-16987181 ] Eric Payne commented on YARN-10009: --- [~sunilg], [~jhung], [~leftnoteasy], could I please request that you review this?
[jira] [Commented] (YARN-10009) In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined
[ https://issues.apache.org/jira/browse/YARN-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987121#comment-16987121 ] Hadoop QA commented on YARN-10009: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 17s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 16s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 12 unchanged - 0 fixed = 14 total (was 12) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 9s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 21s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}175m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-10009 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987377/YARN-10009.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a031af742580 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0c217fe | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987270#comment-16987270 ] Jonathan Hung commented on YARN-9992: - Hmm, not sure how I missed this before; I think it's related to YARN-9205. Let me try porting that.
[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)
[ https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987304#comment-16987304 ] Jonathan Hung commented on YARN-9205: - Attached [^YARN-9205-branch-2.001.patch], which contains trivial changes: * Set ResourceUtils#resetResourceTypes to public (originally done in YARN-7119) * Change TestResourceProfiles.TEST_CONF_RESET_RESOURCE_TYPES to TestResourceUtils.TEST_CONF_RESET_RESOURCE_TYPES in TestCSAllocateCustomResource Committed to branch-2, branch-2.10 > When using custom resource type, application will fail to run due to the > CapacityScheduler throws > InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION) > --- > > Key: YARN-9205 > URL: https://issues.apache.org/jira/browse/YARN-9205 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Critical > Fix For: 3.1.2, 3.3.0, 3.2.1 > > Attachments: YARN-9205-branch-2.001.patch, > YARN-9205-branch-3.1.001.patch, YARN-9205-branch-3.2.001.patch, > YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, > YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, > YARN-9205-trunk.005.patch, YARN-9205-trunk.006.patch, > YARN-9205-trunk.007.patch, YARN-9205-trunk.008.patch, > YARN-9205-trunk.009.patch > > > In a non-secure cluster, reproduce it as follows: > # Set capacity scheduler in yarn-site.xml > # Use default capacity-scheduler.xml > # Set custom resource type "cmp.com/hdw" in resource-types.xml > # Set a value, say 10, in node-resources.xml > # Start cluster > # Submit a distributed shell application which requests some "cmp.com/hdw" > The AM will get an exception from CapacityScheduler and then fail. This bug > doesn't exist in FairScheduler. > {code:java} > 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested > container ask: Capability[ 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: > GUARANTEED, Enforce Execution Type: false}]Resource Profile[] > 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request! Cannot allocate containers as requested resource is greater > than maximum allowed allocation. Requested resource type=[cmp.com/hdw], > Requested resource=, maximum allowed > allocation=, please note that maximum allowed > allocation is calculated by scheduler based on maximum resource of registered > NodeManagers, which might be less than configured maximum > allocation= > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301) > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > ...{code} > Did some rough debugging; the method below returns the wrong maximum capacity. > DefaultAMSProcessor.java, Line 234. > {code:java} > Resource maximumCapacity = > getScheduler().getMaximumResourceCapability(app.getQueue());{code} > The above code seems like it should return "" > but returns "". > This incorrect value might be caused by the queue maximum allocation calculation > involved in YARN-8720: > AbstractCSQueue.java Line364 > {code:java} > this.maximumAllocation = > configuration.getMaximumAllocationPerQueue( > getQueuePath());{code} > And this invokes CapacitySchedulerConfiguration.java Line 895: > {code:java} > Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this); > {code} > Passing a "this" which is not a YarnConfiguration instance will cause the code > below to return null for resource names and then only contain mandatory > resources. This might be the root cause.
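A toy model of that root cause (illustrative names, not the real ResourceUtils): until something has populated the resource-type registry from resource-types.xml, the computed maximum allocation carries only the mandatory resources, so a request for a custom type is checked against a maximum of zero and rejected.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class MaxAllocationFallback {
  // Stand-in for the resource-type registry that resource-types.xml feeds.
  static final Map<String, Long> registry = new LinkedHashMap<>();

  static Map<String, Long> fetchMaximumAllocation() {
    Map<String, Long> max = new LinkedHashMap<>();
    max.put("memory-mb", 8192L);   // mandatory resources are always present
    max.put("vcores", 4L);
    max.putAll(registry);          // custom types appear only after loading
    return max;
  }

  public static void main(String[] args) {
    System.out.println(fetchMaximumAllocation());   // no cmp.com/hdw -> max 0
    registry.put("cmp.com/hdw", 10L);               // what refresh/reset does
    System.out.println(fetchMaximumAllocation());   // custom type now visible
  }
}
{code}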
[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)
[ https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9205: Fix Version/s: 2.11.0 2.10.1
[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987319#comment-16987319 ] Hadoop QA commented on YARN-8292: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 58s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 52s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 33s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 46s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 54s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 50s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 7 new + 97 unchanged - 0 fixed = 104 total (was 97) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 2s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 3s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}150m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:f555aa740b5 | | JIRA Issue | YARN-8292 | | JIRA Patch URL |
[jira] [Updated] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)
[ https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9205: Attachment: YARN-9205-branch-2.001.patch
[jira] [Created] (YARN-10012) Guaranteed and max capacity queue metrics for custom resources
Jonathan Hung created YARN-10012: Summary: Guaranteed and max capacity queue metrics for custom resources Key: YARN-10012 URL: https://issues.apache.org/jira/browse/YARN-10012 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Hung YARN-9085 adds support for guaranteed/maxcapacity MB/vcores. We should add the same for custom resources.
[jira] [Resolved] (YARN-9958) Remove the invalid lock in ContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka resolved YARN-9958. - Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Committed to trunk. Thanks [~jiwq] for the contribution and thanks [~Tao Yang] for the review. > Remove the invalid lock in ContainerExecutor > > > Key: YARN-9958 > URL: https://issues.apache.org/jira/browse/YARN-9958 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > > ContainerExecutor has a ReadLock and a WriteLock. These are used to guard the > get/put calls on a ConcurrentMap. Since ConcurrentMap already provides thread > safety and atomicity guarantees, we can remove the lock.
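The change is the standard "drop redundant external locking around an already-concurrent map" cleanup; a before/after sketch, with field and method names simplified rather than taken from the actual ContainerExecutor:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PidStore {
  private final ConcurrentMap<String, String> pidFiles = new ConcurrentHashMap<>();

  // Before the patch, calls like these were wrapped in a read/write lock.
  // The lock adds nothing: ConcurrentHashMap already guarantees that
  // individual get/put operations are atomic and thread-safe.
  public String getPidFile(String containerId) {
    return pidFiles.get(containerId);
  }

  public void setPidFile(String containerId, String path) {
    pidFiles.put(containerId, path);
  }
}
{code}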
[jira] [Commented] (YARN-9958) Remove the invalid lock in ContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987565#comment-16987565 ] Hudson commented on YARN-9958: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17718 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17718/]) YARN-9958. Remove the invalid lock in ContainerExecutor (#1704) (aajisaka: rev c48de9aa2ddf7622648c4410612ffc035861df63) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java