Szilard Nemeth created YARN-9393:
------------------------------------
Summary: Asking for more resources than cluster capacity errors
are handled in a different layer for custom resources
Key: YARN-9393
URL: https://issues.apache.org/jira/browse/YARN-9393
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
*1. If I start an MR sleep job asking for more memory than the cluster has:*
Command:
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT
pushd /opt/hadoop
bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi \
  -Dmapreduce.map.resource.memory-mb=8000 \
  -Dmapreduce.map.resource.resource1=5000M \
  1 1000
popd
{code}
Error message (coming from
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
{code:java}
2019-03-16 13:14:58,963 INFO mapreduce.Job: Job job_1552766296556_0003 failed with state KILLED due to: The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:8000, vCores:1, resource1: 5000M> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job received Kill while in RUNNING state.
{code}
*2. If I start an MR sleep job asking for more vcores than the cluster has:*
Command:
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT
pushd /opt/hadoop
bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi \
  -Dmapreduce.map.resource.memory-mb=2000 \
  -Dmapreduce.map.resource.vcores=9 \
  -Dmapreduce.map.resource.resource1=5000M \
  1 1000
popd
{code}
Error message (coming from
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
{code:java}
2019-03-16 13:17:59,546 INFO mapreduce.Job: Job job_1552766296556_0005 failed with state KILLED due to: The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:2000, vCores:9, resource1: 5000M> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job received Kill while in RUNNING state.
{code}
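Both cases above are rejected early because the AM-side check in RMContainerAllocator compares the request against maxContainerCapability, but only for memory and vcores. Here is a minimal, simplified sketch of that comparison (the class and method names are mine for illustration, not the actual Hadoop source):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Simplified sketch of the AM-side capability check, using only the public
// Resource API. The real check lives in RMContainerAllocator; note that
// custom resource types such as "resource1" are never inspected here.
public class MapCapabilityCheckSketch {
  // Returns true when the request cannot fit into any container.
  static boolean exceedsMaxCapability(Resource requested, Resource maxCapability) {
    return requested.getMemorySize() > maxCapability.getMemorySize()
        || requested.getVirtualCores() > maxCapability.getVirtualCores();
  }

  public static void main(String[] args) {
    Resource max = Resource.newInstance(6144, 8); // maxContainerCapability from the logs
    System.out.println(exceedsMaxCapability(Resource.newInstance(8000, 1), max)); // case 1: true
    System.out.println(exceedsMaxCapability(Resource.newInstance(2000, 9), max)); // case 2: true
  }
}
{code}
When this check trips, the allocator kills the job with the diagnostic shown above instead of ever forwarding the request to the ResourceManager.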
*3. However, if I start an MR sleep job asking for more of the custom resource "resource1" than the cluster has:*
Command:
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT
pushd /opt/hadoop
bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi \
  -Dmapreduce.map.resource.memory-mb=200 \
  -Dmapreduce.map.resource.resource1=18G \
  1 1000
popd
{code}
Error stacktrace (coming from ResourceManager / *ApplicationMasterService.allocate*):
{code:java}
2019-03-16 15:05:32,893 WARN org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor: Invalid resource ask by application appattempt_1552773851229_0001_000001
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[resource1], Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807>
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:316)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:294)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:302)
	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:259)
	at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:243)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
	at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:429)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2827)
2019-03-16 15:05:32,894 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on default port 8030, call Call#37 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 172.28.196.136:40734
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[resource1], Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807>
{code}
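The exception above comes from the RM-side validation, which, unlike the AM-side check, iterates over every registered resource type. A rough sketch of that per-type comparison (my simplification; the class name, method name, and structure are approximate, the real logic lives in SchedulerUtils):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

// Approximate sketch of the RM-side validation: every resource type is
// compared against the maximum allocation, so an oversized "resource1"
// is only rejected here, after the AM has already been started.
public class RmSideValidationSketch {
  static void checkAgainstMaxAllocation(Resource requested, Resource maxAllocation)
      throws InvalidResourceRequestException {
    ResourceInformation[] req = requested.getResources();
    ResourceInformation[] max = maxAllocation.getResources();
    // Both arrays follow the same registered resource-type ordering.
    for (int i = 0; i < req.length; i++) {
      if (req[i].getValue() > max[i].getValue()) {
        throw new InvalidResourceRequestException("Invalid resource request!"
            + " Cannot allocate containers as requested resource is greater"
            + " than maximum allowed allocation. Requested resource type=["
            + req[i].getName() + "]");
      }
    }
  }
}
{code}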
For normal resources (memory, vcores), a request that exceeds the cluster capacity is rejected by the MR client itself (RMContainerAllocator), but for custom resources it is only rejected by ApplicationMasterService.allocate on the ResourceManager side. This means the AM has already been started, and it then fails to allocate the mapper container because the request is too large.
This behavior is inconsistent; we should aim to handle all resource types the same way. My vote is to extend RMContainerAllocator so that it also validates custom resources and fails fast, just as it already does for normal resources.
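A minimal sketch of what that fail-fast extension could look like (an illustration of the proposal, not a committed patch; it assumes both Resource objects express each resource type in the same units, as the RM log above shows):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;

// Illustration of the proposed generalized AM-side check: walk all resource
// types (memory, vcores and custom ones alike) so the job is killed in the
// AM, mirroring the existing behavior for normal resources.
public class GeneralizedCapabilityCheckSketch {
  static boolean exceedsMaxCapability(Resource requested, Resource maxCapability) {
    ResourceInformation[] req = requested.getResources();
    ResourceInformation[] max = maxCapability.getResources();
    for (int i = 0; i < req.length; i++) {
      // Index 0 is memory and index 1 is vcores; any further entries are
      // custom resource types such as "resource1".
      if (req[i].getValue() > max[i].getValue()) {
        return true;
      }
    }
    return false;
  }
}
{code}
With such a check in place, case 3 would be killed by RMContainerAllocator with the same diagnostic as cases 1 and 2, instead of surfacing as an InvalidResourceRequestException on the RM.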