[ https://issues.apache.org/jira/browse/YARN-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328552#comment-16328552 ]

Feng Yuan commented on YARN-7685:
---------------------------------

The 2.7.x release line does not support preemption of labeled resources.
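
For reference, a minimal sketch of how CapacityScheduler preemption is normally switched on in yarn-site.xml (standard property names; note that per the above, enabling this alone does not trigger preemption for labeled resources on 2.7.x):

{code}
<!-- yarn-site.xml: enable the scheduler monitor that drives preemption -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<!-- the policy that computes and issues preemption requests -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
{code}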

> Preemption does not happen when a node label partition is fully utilized
> ------------------------------------------------------------------------
>
>                 Key: YARN-7685
>                 URL: https://issues.apache.org/jira/browse/YARN-7685
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Priority: Major
>         Attachments: Screen Shot 2017-12-27 at 3.28.13 PM.png, Screen Shot 
> 2017-12-27 at 3.28.20 PM.png, Screen Shot 2017-12-27 at 3.28.32 PM.png, 
> Screen Shot 2017-12-27 at 3.31.42 PM.png, capacity-scheduler.xml
>
>
> Have two queues, default and tkgrid, and two node labels: default 
> (exclusivity=true) and tkgrid (exclusivity=false).
> default queue: capacity 15%, max capacity 100%, default node label 
> expression tkgrid.
> tkgrid queue: capacity 85%, max capacity 100%, default node label 
> expression default.
> When the default queue has fully occupied the tkgrid node label partition, 
> a new job submitted into the tkgrid queue with node label expression tkgrid 
> waits in ACCEPTED state forever, because there is no space left in the 
> tkgrid partition for its Application Master. Preemption does not kick in 
> for this scenario.
> Attached: capacity-scheduler.xml and screenshots of the RM UI, Nodes, and 
> Node Labels pages.
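> The queue and label setup described above corresponds roughly to the 
> following capacity-scheduler.xml fragment (a sketch using the standard 
> CapacityScheduler property names; the actual attached file may differ):
> {code}
> <property>
>   <name>yarn.scheduler.capacity.root.queues</name>
>   <value>default,tkgrid</value>
> </property>
> <!-- default queue: 15% capacity, containers land on the tkgrid partition -->
> <property>
>   <name>yarn.scheduler.capacity.root.default.capacity</name>
>   <value>15</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
>   <value>tkgrid</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
>   <value>tkgrid</value>
> </property>
> <!-- tkgrid queue: 85% capacity, containers land on the default partition -->
> <property>
>   <name>yarn.scheduler.capacity.root.tkgrid.capacity</name>
>   <value>85</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.tkgrid.maximum-capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.tkgrid.accessible-node-labels</name>
>   <value>tkgrid</value>
> </property>
> {code}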
> {code}
> Repro Steps:
> [yarn@bigdata3 root]$ yarn cluster --list-node-labels
> Node Labels: <tkgrid:exclusivity=false>
>
> Job 1 submitted into the default queue, which utilizes the complete tkgrid node label partition:
> yarn jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -master_memory 2048 -container_memory 2048 -shell_command sleep -shell_args 2h -timeout 7200000 -jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -queue default -num_containers 20
>
> Job 2 submitted into the tkgrid queue with AM node label expression tkgrid; it stays in ACCEPTED state forever:
> yarn jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -master_memory 2048 -container_memory 2048 -shell_command sleep -shell_args 2h -timeout 7200000 -jar /usr/hdp/2.6.1.0-129/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.6.1.0-129.jar -queue tkgrid -node_label_expression tkgrid -num_containers 20
> 17/12/27 09:31:48 INFO distributedshell.Client: Got application report from 
> ASM for, appId=5, clientToAMToken=null, appDiagnostics=[Wed Dec 27 09:31:39 
> +0000 2017] Application is Activated, waiting for resources to be assigned 
> for AM.  Details : AM Partition = tkgrid ; Partition Resource = 
> <memory:35840, vCores:56> ; Queue's Absolute capacity = 85.0 % ; Queue's 
> Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; , 
> appMasterHost=N/A, appQueue=tkgrid, appMasterRpcPort=-1, 
> appStartTime=1514367099792, yarnAppState=ACCEPTED, 
> distributedFinalState=UNDEFINED, 
> appTrackingUrl=http://bigdata3.openstacklocal:8088/proxy/application_1514366265793_0005/,
>  appUser=yarn
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
