[ https://issues.apache.org/jira/browse/YARN-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szilard Nemeth updated YARN-8202:
---------------------------------
    Attachment:     (was: YARN-8202-006.patch)

> DefaultAMSProcessor should properly check units of requested custom resource types against minimum/maximum allocation
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8202
>                 URL: https://issues.apache.org/jira/browse/YARN-8202
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Blocker
>         Attachments: YARN-8202-001.patch, YARN-8202-002.patch, YARN-8202-003.patch, YARN-8202-004.patch, YARN-8202-005.patch
>
>
> When I execute a pi job with arguments:
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=500M 1 1000{code}
> and I have one node with 5GB of resource1, I get the following exception every second and the job hangs:
> {code:java}
> 2018-04-24 08:42:03,694 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 8030, call Call#386 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 172.31.119.172:58138
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource type=[resource1] < 0 or greater than maximum allowed allocation. Requested resource=<memory:200, vCores:1, resource1: 500M>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 5G>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807G>
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:258)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:249)
> 	at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:230)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> 	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> {code}
> *This is because org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils#validateResourceRequest does not take resource units into account.*
>
> However, if I start a job with arguments:
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=1G 1 1000{code}
> and I still have 5GB of resource1 on one node, the job runs successfully.
>
> I also tried a third run: I requested 1GB of resource1 while no node had any amount of resource1, then restarted the node with 5GB of resource1. The job ultimately completed, but only after the node with enough resources had registered with the RM, which is the desired behaviour.
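The difference between the 500M run and the 1G run can be illustrated with a small, self-contained Java sketch. It assumes the validation compares the raw numeric values of the requested and maximum resource1 amounts without first converting them to a common unit; that assumption matches the observed behaviour, since 500 <= 5 is false (rejected) while 1 <= 5 is true (accepted). The class `UnitAwareCheck` and its methods are hypothetical illustrations, not Hadoop code, and only the decimal prefixes k, M, G, T and P are handled. Hadoop ships a `UnitsConversionUtil` helper for unit conversion; the sketch deliberately does not use it so that it runs without Hadoop on the classpath.

```java
import java.math.BigInteger;
import java.util.Map;

/**
 * Hypothetical sketch (not Hadoop's API) contrasting a unit-blind
 * "requested <= maximum" check with a unit-aware one.
 * Only decimal SI prefixes are supported in this sketch.
 */
public class UnitAwareCheck {

  // Decimal prefixes as powers of ten relative to the base unit ("").
  private static final Map<String, Integer> EXPONENTS = Map.of(
      "", 0, "k", 3, "M", 6, "G", 9, "T", 12, "P", 15);

  // Converts value+unit to base units. BigInteger avoids overflow for
  // values such as 9223372036854775807G.
  static BigInteger toBaseUnits(long value, String unit) {
    Integer exp = EXPONENTS.get(unit);
    if (exp == null) {
      throw new IllegalArgumentException("Unsupported unit: " + unit);
    }
    return BigInteger.valueOf(value).multiply(BigInteger.TEN.pow(exp));
  }

  // Unit-blind variant: compares raw values, as assumed for the bug.
  static boolean fitsIgnoringUnits(long reqValue, long maxValue) {
    return reqValue >= 0 && reqValue <= maxValue;
  }

  // Unit-aware variant: converts both sides to base units before comparing.
  static boolean fitsWithUnits(long reqValue, String reqUnit,
                               long maxValue, String maxUnit) {
    if (reqValue < 0) {
      return false;
    }
    return toBaseUnits(reqValue, reqUnit)
        .compareTo(toBaseUnits(maxValue, maxUnit)) <= 0;
  }

  public static void main(String[] args) {
    // First run: 500M requested against a 5G maximum.
    System.out.println(fitsIgnoringUnits(500, 5));       // false
    System.out.println(fitsWithUnits(500, "M", 5, "G")); // true
    // Second run: 1G requested against a 5G maximum.
    System.out.println(fitsIgnoringUnits(1, 5));         // true
    System.out.println(fitsWithUnits(1, "G", 5, "G"));   // true
  }
}
```

`BigInteger` is used because the configured maximum allocation in the log (`9223372036854775807G`) does not fit in a `long` once it is expressed in base units.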