[ https://issues.apache.org/jira/browse/YARN-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-9393:
---------------------------------
    Affects Version/s: 3.2.1

> Asking for more resources than cluster capacity errors are handled in a 
> different layer for custom resources
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9393
>                 URL: https://issues.apache.org/jira/browse/YARN-9393
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.2.1
>            Reporter: Szilard Nemeth
>            Priority: Major
>
> *1. If I start an MR pi job, asking for more memory than the cluster has:*
>  
> Command: 
> {code:java}
> MY_HADOOP_VERSION=3.3.0-SNAPSHOT
> pushd /opt/hadoop
> bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi \
>   -Dmapreduce.map.resource.memory-mb=8000 \
>   -Dmapreduce.map.resource.resource1=5000M \
>   1 1000
> popd
> {code}
>  
> Error message (coming from 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
>  
> {code:java}
> 2019-03-16 13:14:58,963 INFO mapreduce.Job: Job job_1552766296556_0003 failed with state KILLED due to: The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:8000, vCores:1, resource1: 5000M> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job received Kill while in RUNNING state.
> {code}
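> 
> Note that resource1 is requested with a unit ("M", i.e. 10^6) while the log prints the maximum in base units: 5000M converts to 5,000,000,000, which is still below the 6,000,000,000 maximum, so in this case only the 8000 MB memory request exceeds maxContainerCapability. A minimal sketch of that conversion, assuming YARN's UnitsConversionUtil (the class name Resource1UnitsDemo is made up for illustration):
> {code:java}
> import org.apache.hadoop.yarn.util.UnitsConversionUtil;
> 
> public class Resource1UnitsDemo {
>   public static void main(String[] args) {
>     // Convert 5000M (SI mega, 10^6) into base units ("").
>     long requested = UnitsConversionUtil.convert("M", "", 5000L);
>     long max = 6000000000L; // maxContainerCapability for resource1 in the log
>     // Prints: requested=5000000000 max=6000000000 exceeds=false
>     System.out.println("requested=" + requested + " max=" + max
>         + " exceeds=" + (requested > max));
>   }
> }
> {code}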
>  
>  
>  
> *2. If I start an MR pi job, asking for more vcores than the cluster has:*
>  
> Command: 
> {code:java}
> MY_HADOOP_VERSION=3.3.0-SNAPSHOT
> pushd /opt/hadoop
> bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi \
>   -Dmapreduce.map.resource.memory-mb=2000 \
>   -Dmapreduce.map.resource.vcores=9 \
>   -Dmapreduce.map.resource.resource1=5000M \
>   1 1000
> popd
> {code}
>  
> Error message (coming from org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
>  
> {code:java}
> 2019-03-16 13:17:59,546 INFO mapreduce.Job: Job job_1552766296556_0005 failed with state KILLED due to: The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:2000, vCores:9, resource1: 5000M> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job received Kill while in RUNNING state
> {code}
>  
>  
> *3. However, if I start an MR pi job, asking for more of "resource1" than the cluster has:*
>  
> Command:
> {code:java}
> MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.map.resource.memory-mb=200 
> -Dmapreduce.map.resource.resource1=18G 1 1000;popd{code}
>  
>  
> Error stacktrace (coming from ResourceManager / ApplicationMasterService.allocate):
> {code:java}
> 2019-03-16 15:05:32,893 WARN org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor: Invalid resource ask by application appattempt_1552773851229_0001_000001
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[resource1], Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807>
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:316)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:294)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:302)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:259)
>         at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:243)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>         at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
>         at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:429)
>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>         at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2827)
> 2019-03-16 15:05:32,894 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on default port 8030, call Call#37 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 172.28.196.136:40734
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[resource1], Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807>
> {code}
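> 
> Note the explanation in the warning: the enforced maximum allowed allocation is derived from the registered NodeManagers, not only from configuration. A toy illustration of that relationship, with the numbers taken from the log above (class and variable names are made up):
> {code:java}
> public class EffectiveMaxSketch {
>   public static void main(String[] args) {
>     long configuredMax = 9223372036854775807L; // configured maximum for resource1
>     long largestRegisteredNm = 6000000000L;    // largest registered NodeManager
>     // The scheduler checks requests against the smaller of the two:
>     long effectiveMax = Math.min(configuredMax, largestRegisteredNm);
>     // 18G = 18,000,000,000 > effectiveMax (6,000,000,000), hence the exception.
>     System.out.println("effectiveMax=" + effectiveMax);
>   }
> }
> {code}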
>  
>  
> For memory and vcores, exceeding the cluster capacity is handled by the MR 
> client (RMContainerAllocator): the job is killed fast with a clear diagnostic. 
> For custom resources, the request only fails later, in 
> ApplicationMasterService.allocate on the ResourceManager side, meaning the AM 
> has already been started and only then fails to allocate the mapper container 
> because the request is too big.
>  
> This behavior is inconsistent; we should aim to handle all resource types the 
> same way. My vote is to extend RMContainerAllocator so that it also validates 
> custom resources and fails fast, just as it already does for memory and vcores.
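> 
> A hypothetical sketch of that fail-fast extension in the MR AM (the helper class and method names are made up; today's check in RMContainerAllocator covers only memory and vcores):
> {code:java}
> import org.apache.hadoop.yarn.api.records.Resource;
> import org.apache.hadoop.yarn.api.records.ResourceInformation;
> import org.apache.hadoop.yarn.util.UnitsConversionUtil;
> 
> // Hypothetical helper that RMContainerAllocator could call from
> // handleMapContainerRequest before asking the RM for containers.
> final class CustomResourceFailFastSketch {
>   // Returns the name of the first resource type (memory-mb, vcores, or a
>   // custom type such as resource1) exceeding maxCapability, or null if none.
>   static String findExceededResource(Resource requested, Resource maxCapability) {
>     for (ResourceInformation req : requested.getResources()) {
>       ResourceInformation max = maxCapability.getResourceInformation(req.getName());
>       // Unit-aware comparison, e.g. a request in "G" against a base-unit max.
>       if (UnitsConversionUtil.compare(req.getUnits(), req.getValue(),
>           max.getUnits(), max.getValue()) > 0) {
>         return req.getName(); // e.g. "resource1" in case 3 above
>       }
>     }
>     return null; // the request fits into maxContainerCapability
>   }
> }
> 
> // In handleMapContainerRequest, the result could drive the same kill path
> // that is already taken for memory/vcores (illustrative, not actual code):
> //   String exceeded = CustomResourceFailFastSketch.findExceededResource(
> //       mapResourceRequest, getMaxContainerCapability());
> //   if (exceeded != null) {
> //     // emit a diagnostic naming the exceeded type and fire JOB_KILL,
> //     // exactly as the existing memory/vcores check does
> //   }
> {code}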


