[
https://issues.apache.org/jira/browse/YARN-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Szilard Nemeth updated YARN-9393:
---------------------------------
Affects Version/s: 3.2.1
> Asking for more resources than cluster capacity errors are handled in a
> different layer for custom resources
> ------------------------------------------------------------------------------------------------------------
>
> Key: YARN-9393
> URL: https://issues.apache.org/jira/browse/YARN-9393
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.1
> Reporter: Szilard Nemeth
> Priority: Major
>
> *1. If I start an MR sleep job, asking for more memory than the cluster has:*
>
> Command:
> {code:java}
> MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar"
> pi -Dmapreduce.map.resource.memory-mb=8000
> -Dmapreduce.map.resource.resource1=5000M 1 1000;popd{code}
>
> Error message (coming from
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
>
> {code:java}
> 2019-03-16 13:14:58,963 INFO mapreduce.Job: Job job_1552766296556_0003 failed
> with state KILLED due to: The required MAP capability is more than the
> supported max container capability in the cluster. Killing the Job.
> mapResourceRequest: <memory:8000, vCores:1, resource1: 5000M>
> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job
> received Kill while in RUNNING state.{code}
>
>
>
> *2. If I start an MR sleep job, asking for more vcores than the cluster has:*
>
> Command:
> {code:java}
> MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar"
> pi -Dmapreduce.map.resource.memory-mb=2000 -Dmapreduce.map.resource.vcores=9
> -Dmapreduce.map.resource.resource1=5000M 1 1000;popd{code}
>
> Error message (coming from
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
>
> {code:java}
> 2019-03-16 13:17:59,546 INFO mapreduce.Job: Job job_1552766296556_0005 failed
> with state KILLED due to: The required MAP capability is more than the
> supported max container capability in the cluster. Killing the Job.
> mapResourceRequest: <memory:2000, vCores:9, resource1: 5000M>
> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job
> received Kill while in RUNNING state{code}
>
>
> *3. However, if I start an MR sleep job, asking for more of "resource1"
> than the cluster has:*
>
> Command:
> {code:java}
> MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar"
> pi -Dmapreduce.map.resource.memory-mb=200
> -Dmapreduce.map.resource.resource1=18G 1 1000;popd{code}
>
>
> Error stacktrace (coming from ResourceManager /
> *ApplicationMasterService.allocate*):
> {code:java}
> 2019-03-16 15:05:32,893 WARN
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor: Invalid
> resource ask by application appattempt_1552773851229_0001_000001
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid
> resource request! Cannot allocate containers as requested resource is greater
> than maximum allowed allocation. Requested resource type=[resource1],
> Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum
> allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please
> note that maximum allowed allocation is calculated by scheduler based on
> maximum resource of registered NodeManagers, which might be less than
> configured maximum allocation=<memory:8192, vCores:8192, resource1:
> 9223372036854775807> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:316)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:294)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:302)
> at
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:259)
> at
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:243)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
> at
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:429)
> at
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943) at
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:422) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2827)
> 2019-03-16 15:05:32,894 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 28 on default port 8030, call Call#37 Retry#0
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from
> 172.28.196.136:40734
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid
> resource request! Cannot allocate containers as requested resource is greater
> than maximum allowed allocation. Requested resource type=[resource1],
> Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum
> allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please
> note that maximum allowed allocation is calculated by scheduler based on
> maximum resource of registered NodeManagers, which might be less than
> configured maximum allocation=<memory:8192, vCores:8192, resource1:
> 9223372036854775807>{code}
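> For reference, the suffixed values passed on the command line (5000M, 18G)
> show up in the logs above as plain base-unit numbers (5000000000,
> 18000000000). Below is a minimal sketch of that SI-style conversion,
> assuming decimal suffixes as the logs suggest; the method name
> `toBaseUnits` is hypothetical, and the real parsing in YARN (see
> UnitsConversionUtil) handles more suffixes and unit pairs.

```java
// Sketch only: converts a suffixed resource value such as "18G" to the
// plain base-unit number printed in the RM log (18000000000). Assumes
// decimal (SI) suffixes k/M/G, matching the log output above.
public class ResourceValueSketch {

    static long toBaseUnits(String value) {
        char suffix = value.charAt(value.length() - 1);
        if (Character.isDigit(suffix)) {
            // No suffix: the value is already in base units.
            return Long.parseLong(value);
        }
        long base = Long.parseLong(value.substring(0, value.length() - 1));
        switch (suffix) {
            case 'k': return base * 1_000L;
            case 'M': return base * 1_000_000L;
            case 'G': return base * 1_000_000_000L;
            default:  throw new IllegalArgumentException("unknown suffix: " + suffix);
        }
    }

    public static void main(String[] args) {
        System.out.println(toBaseUnits("5000M")); // 5000000000, under the 6000000000 max
        System.out.println(toBaseUnits("18G"));   // 18000000000, over the max -> rejected
    }
}
```

> This is why the 5000M request in case 1 fits under the resource1 maximum
> of 6000000000 while the 18G request in case 3 exceeds it.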
>
>
> For normal resources, exceeding the cluster capacity is handled by the MR
> client (RMContainerAllocator), but for custom resources it is handled in
> ApplicationMasterService.allocate. This means the AM is started first and
> only then fails to allocate the mapper container because the request is
> too large.
>
> This behavior is inconsistent, and we should aim to handle all resource
> types the same way. My vote is to add code to RMContainerAllocator that
> also checks custom resources and fails fast, just as happens with normal
> resources.
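> The proposed fail-fast check could look roughly like the sketch below. It
> models resources as plain maps instead of the real
> org.apache.hadoop.yarn.api.records.Resource API, and the method name
> `exceedsMaxCapability` is hypothetical; the point is only to compare every
> requested resource type, custom ones included, against
> maxContainerCapability before any container requests are sent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch of the proposed fail-fast check. Resources are modeled as
// name -> value maps rather than the real Hadoop Resource records, and the
// method name is made up for illustration.
public class CustomResourceCheck {

    /** Returns the name of the first resource whose requested value exceeds
     *  the max container capability, or null if the request fits. */
    static String exceedsMaxCapability(Map<String, Long> requested,
                                       Map<String, Long> maxCapability) {
        for (Map.Entry<String, Long> e : requested.entrySet()) {
            Long max = maxCapability.get(e.getKey());
            // Unknown resource types are ignored here; the real RM rejects
            // them through a separate validation path.
            if (max != null && e.getValue() > max) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Values taken from the case 3 logs above.
        Map<String, Long> max = new LinkedHashMap<>();
        max.put("memory-mb", 6144L);
        max.put("vcores", 8L);
        max.put("resource1", 6_000_000_000L);

        Map<String, Long> request = new LinkedHashMap<>();
        request.put("memory-mb", 200L);
        request.put("vcores", 1L);
        request.put("resource1", 18_000_000_000L);

        // prints "resource1": the AM could kill the job at this point, as it
        // already does for oversized memory and vcores requests.
        System.out.println(exceedsMaxCapability(request, max));
    }
}
```

> In RMContainerAllocator this comparison would sit next to the existing
> memory/vcores check, so an oversized custom-resource request would kill
> the job with the same "required MAP capability" message instead of an
> InvalidResourceRequestException from the RM.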
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)