[
https://issues.apache.org/jira/browse/YARN-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328552#comment-16328552
]
Feng Yuan commented on YARN-7685:
---------------------------------
Currently, the 2.7.x releases do not support preemption of labeled resources.
> Preemption does not happen when a node label partition is fully utilized
> ------------------------------------------------------------------------
>
> Key: YARN-7685
> URL: https://issues.apache.org/jira/browse/YARN-7685
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Priority: Major
> Attachments: Screen Shot 2017-12-27 at 3.28.13 PM.png, Screen Shot
> 2017-12-27 at 3.28.20 PM.png, Screen Shot 2017-12-27 at 3.28.32 PM.png,
> Screen Shot 2017-12-27 at 3.31.42 PM.png, capacity-scheduler.xml
>
>
> There are two queues, default and tkgrid, and two node labels, default
> (exclusivity=true) and tkgrid (exclusivity=false).
> default queue: capacity is 15%, max capacity is 100%, and the default node
> label expression is tkgrid.
> tkgrid queue: capacity is 85%, max capacity is 100%, and the default node
> label expression is default.
> When the default queue has occupied the entire tkgrid node label partition
> and a new job is then submitted to the tkgrid queue with node label
> expression tkgrid, the job waits in the ACCEPTED state forever because there
> is no space in the tkgrid partition for the Application Master. Preemption
> does not kick in for this scenario.
> The capacity-scheduler.xml and screenshots of the RM UI, Nodes, and Node
> Labels pages are attached.
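> For illustration, a minimal sketch of how the setup described above might be
> expressed in capacity-scheduler.xml. This is not the attached file: the
> per-label capacities (accessible-node-labels.tkgrid.capacity) are assumptions
> chosen to match the 15%/85% split, and only properties relevant to the
> scenario are shown.
> {code:xml}
> <configuration>
>   <property>
>     <name>yarn.scheduler.capacity.root.queues</name>
>     <value>default,tkgrid</value>
>   </property>
>   <!-- Assumption: root exposes the whole tkgrid partition to its children. -->
>   <property>
>     <name>yarn.scheduler.capacity.root.accessible-node-labels.tkgrid.capacity</name>
>     <value>100</value>
>   </property>
>   <!-- default queue: 15% guaranteed, may grow to 100%, targets tkgrid by default. -->
>   <property>
>     <name>yarn.scheduler.capacity.root.default.capacity</name>
>     <value>15</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
>     <value>100</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
>     <value>tkgrid</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
>     <value>tkgrid</value>
>   </property>
>   <!-- Assumption: the default queue's share of the tkgrid partition. -->
>   <property>
>     <name>yarn.scheduler.capacity.root.default.accessible-node-labels.tkgrid.capacity</name>
>     <value>15</value>
>   </property>
>   <!-- tkgrid queue: 85% guaranteed, may grow to 100%. Its default node label
>        expression is left unset here, assumed to mean the default partition. -->
>   <property>
>     <name>yarn.scheduler.capacity.root.tkgrid.capacity</name>
>     <value>85</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tkgrid.maximum-capacity</name>
>     <value>100</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tkgrid.accessible-node-labels</name>
>     <value>tkgrid</value>
>   </property>
>   <!-- Assumption: the tkgrid queue's share of the tkgrid partition. -->
>   <property>
>     <name>yarn.scheduler.capacity.root.tkgrid.accessible-node-labels.tkgrid.capacity</name>
>     <value>85</value>
>   </property>
> </configuration>
> {code}
> With a layout like this, the default queue can exceed its 15% share of the
> tkgrid partition whenever the tkgrid queue is idle, which is the state that
> Job 1 in the repro steps creates. The preemption monitor is enabled
> separately in yarn-site.xml through
> yarn.resourcemanager.scheduler.monitor.enable.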
> {code}
> Repro Steps:
> [yarn@bigdata3 root]$ yarn cluster --list-node-labels
> Node Labels: <tkgrid:exclusivity=false>
> Job 1, submitted to the default queue, uses the entire tkgrid node label
> partition:
> yarn jar
> /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar
> -master_memory 2048 -container_memory 2048 -shell_command sleep -shell_args
> 2h -timeout 7200000 -jar
> /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar
> -queue default -num_containers 20
> Job 2, submitted to the tkgrid queue with the AM node label expression
> tkgrid, stays in the ACCEPTED state forever:
> yarn jar
> /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar
> -master_memory 2048 -container_memory 2048 -shell_command sleep -shell_args
> 2h -timeout 7200000 -jar
> /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar
> -queue tkgrid -node_label_expression tkgrid -num_containers 20
> 17/12/27 09:31:48 INFO distributedshell.Client: Got application report from
> ASM for, appId=5, clientToAMToken=null, appDiagnostics=[Wed Dec 27 09:31:39
> +0000 2017] Application is Activated, waiting for resources to be assigned
> for AM. Details : AM Partition = tkgrid ; Partition Resource =
> <memory:35840, vCores:56> ; Queue's Absolute capacity = 85.0 % ; Queue's
> Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; ,
> appMasterHost=N/A, appQueue=tkgrid, appMasterRpcPort=-1,
> appStartTime=1514367099792, yarnAppState=ACCEPTED,
> distributedFinalState=UNDEFINED,
> appTrackingUrl=http://bigdata3.openstacklocal:8088/proxy/application_1514366265793_0005/,
> appUser=yarn
> {code}