Szilard Nemeth created YARN-9393:
------------------------------------
Summary: Asking for more resources than cluster capacity errors
are handled in a different layer for custom resources
Key: YARN-9393
URL: https://issues.apache.org/jira/browse/YARN-9393
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
*1. If I start an MR sleep job asking for more memory than the cluster has:*
Command:
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT
pushd /opt/hadoop
bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi \
  -Dmapreduce.map.resource.memory-mb=8000 \
  -Dmapreduce.map.resource.resource1=5000M \
  1 1000
popd
{code}
Error message (coming from
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
{code:java}
2019-03-16 13:14:58,963 INFO mapreduce.Job: Job job_1552766296556_0003 failed with state KILLED due to: The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:8000, vCores:1, resource1: 5000M> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job received Kill while in RUNNING state.
{code}
*2. If I start an MR sleep job asking for more vcores than the cluster has:*
Command:
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT
pushd /opt/hadoop
bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi \
  -Dmapreduce.map.resource.memory-mb=2000 \
  -Dmapreduce.map.resource.vcores=9 \
  -Dmapreduce.map.resource.resource1=5000M \
  1 1000
popd
{code}
Error message (coming from
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
{code:java}
2019-03-16 13:17:59,546 INFO mapreduce.Job: Job job_1552766296556_0005 failed with state KILLED due to: The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:2000, vCores:9, resource1: 5000M> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job received Kill while in RUNNING state.
{code}
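Both cases above are rejected early because the AM-side check in RMContainerAllocator compares the request against maxContainerCapability, but only for memory and vcores. Here is a minimal, simplified sketch of that comparison (the class and method names are mine for illustration, not the actual Hadoop source):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Simplified sketch of the AM-side capability check, using only the public
// Resource API. The real check lives in RMContainerAllocator; note that
// custom resource types such as "resource1" are never inspected here.
public class MapCapabilityCheckSketch {
  // Returns true when the request cannot fit into any container.
  static boolean exceedsMaxCapability(Resource requested, Resource maxCapability) {
    return requested.getMemorySize() > maxCapability.getMemorySize()
        || requested.getVirtualCores() > maxCapability.getVirtualCores();
  }

  public static void main(String[] args) {
    Resource max = Resource.newInstance(6144, 8); // maxContainerCapability from the logs
    System.out.println(exceedsMaxCapability(Resource.newInstance(8000, 1), max)); // case 1: true
    System.out.println(exceedsMaxCapability(Resource.newInstance(2000, 9), max)); // case 2: true
  }
}
{code}
When this check trips, the allocator kills the job with the diagnostic shown above instead of ever forwarding the request to the ResourceManager.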
*3. However, if I start an MR sleep job asking for more of the custom resource "resource1" than the cluster has:*
Command:
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT
pushd /opt/hadoop
bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi \
  -Dmapreduce.map.resource.memory-mb=200 \
  -Dmapreduce.map.resource.resource1=18G \
  1 1000
popd
{code}
Error stacktrace (coming from ResourceManager / *ApplicationMasterService.allocate*):
{code:java}
2019-03-16 15:05:32,893 WARN org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor: Invalid resource ask by application appattempt_1552773851229_0001_000001
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[resource1], Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807>
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:316)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:294)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:302)
	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:259)
	at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:243)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
	at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:429)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2827)
2019-03-16 15:05:32,894 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on default port 8030, call Call#37 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 172.28.196.136:40734
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[resource1], Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807>
{code}
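The exception above comes from the RM-side validation, which, unlike the AM-side check, iterates over every registered resource type. A rough sketch of that per-type comparison (my simplification; the class name, method name, and structure are approximate, the real logic lives in SchedulerUtils):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

// Approximate sketch of the RM-side validation: every resource type is
// compared against the maximum allocation, so an oversized "resource1"
// is only rejected here, after the AM has already been started.
public class RmSideValidationSketch {
  static void checkAgainstMaxAllocation(Resource requested, Resource maxAllocation)
      throws InvalidResourceRequestException {
    ResourceInformation[] req = requested.getResources();
    ResourceInformation[] max = maxAllocation.getResources();
    // Both arrays follow the same registered resource-type ordering.
    for (int i = 0; i < req.length; i++) {
      if (req[i].getValue() > max[i].getValue()) {
        throw new InvalidResourceRequestException("Invalid resource request!"
            + " Cannot allocate containers as requested resource is greater"
            + " than maximum allowed allocation. Requested resource type=["
            + req[i].getName() + "]");
      }
    }
  }
}
{code}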
For normal resources (memory, vcores), a request that exceeds the cluster capacity is rejected by the MR client itself (RMContainerAllocator), but for custom resources it is only rejected by ApplicationMasterService.allocate on the ResourceManager side. This means the AM has already been started, and it then fails to allocate the mapper container because the request is too large.
This behavior is inconsistent; we should aim to handle all resource types the same way. My vote is to extend RMContainerAllocator so that it also validates custom resources and fails fast, just as it already does for normal resources.
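A minimal sketch of what that fail-fast extension could look like (an illustration of the proposal, not a committed patch; it assumes both Resource objects express each resource type in the same units, as the RM log above shows):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;

// Illustration of the proposed generalized AM-side check: walk all resource
// types (memory, vcores and custom ones alike) so the job is killed in the
// AM, mirroring the existing behavior for normal resources.
public class GeneralizedCapabilityCheckSketch {
  static boolean exceedsMaxCapability(Resource requested, Resource maxCapability) {
    ResourceInformation[] req = requested.getResources();
    ResourceInformation[] max = maxCapability.getResources();
    for (int i = 0; i < req.length; i++) {
      // Index 0 is memory and index 1 is vcores; any further entries are
      // custom resource types such as "resource1".
      if (req[i].getValue() > max[i].getValue()) {
        return true;
      }
    }
    return false;
  }
}
{code}
With such a check in place, case 3 would be killed by RMContainerAllocator with the same diagnostic as cases 1 and 2, instead of surfacing as an InvalidResourceRequestException on the RM.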