[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407094#comment-16407094
 ] 

Jason Lowe commented on YARN-7872:
----------------------------------

[~leftnoteasy] would be better at answering this, as he knows node labels far 
better than I do.  As I understand it, node labels are effectively 
hard-partitioning the cluster (especially back in 2.7).  For the simple case of 
a single node label, it's like smashing two clusters together where one 
cluster's nodes have the label and the other cluster's nodes do not.  If you 
want to allocate on the unlabeled nodes, you don't specify any label with your 
request.  If you want to allocate on the labeled nodes, you specify the label 
with your request.

In that case node locality and node label are _not_ orthogonal.  For example, a 
node label can be used to reserve nodes for certain apps.  If any other app 
comes along with an ANY request and plunks down containers on those nodes, that 
totally defeats the purpose of that node label.  So I believe this is working 
as designed.
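
As a rough illustration of the "two clusters smashed together" reading above, here is a minimal sketch in plain Java. It is a simplified model, not the CapacityScheduler code: the class name, method names, and the use of an empty string for "no label" are assumptions made for the example.

```java
// Simplified model of exclusive node labels as hard partitions.
// Hypothetical names throughout; this is not YARN's scheduler code.
final class ExclusiveLabelSketch {
    // Assumption: the empty string stands for "no label" (default partition).
    static final String NO_LABEL = "";

    // Treat a null label the same as an absent one.
    static String normalize(String label) {
        return label == null ? NO_LABEL : label.trim();
    }

    // A request is only eligible for nodes in exactly its own partition.
    static boolean partitionMatches(String requestLabel, String nodeLabel) {
        return normalize(requestLabel).equals(normalize(nodeLabel));
    }

    public static void main(String[] args) {
        // Unlabeled request against a node labeled "persistent".
        System.out.println(partitionMatches(null, "persistent"));
        // Labeled request against the same node.
        System.out.println(partitionMatches("persistent", "persistent"));
        // Unlabeled request against an unlabeled node.
        System.out.println(partitionMatches(null, null));
    }
}
```

Under this model a request without a label never lands on a labeled node, which is what keeps reserved nodes reserved.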

bq. If not acceptable (i.e. the current behavior is by design), how can 
we use locality to request containers within these labeled nodes?

It's like I stated above.  If you want the resource to be placed on the labeled 
nodes, put the label in your request.  If you want the request to be placed on 
unlabeled nodes, omit the label from your request.  If you want the request to 
go anywhere, labeled or not, I don't think exclusive node labels allow for that 
functionality, but I may be missing something there.  I do know that allowing 
apps asking for ANY resource without a label to start using resources on 
labeled nodes will break some setups.



> labeled node cannot be used to satisfy locality specified request
> -----------------------------------------------------------------
>
>                 Key: YARN-7872
>                 URL: https://issues.apache.org/jira/browse/YARN-7872
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>            Priority: Blocker
>             Fix For: 2.7.2
>
>         Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> A labeled node (i.e. a node with a non-empty node label) cannot be used to 
> satisfy a locality-specified request (i.e. a container request with a 
> non-ANY resource name and relax locality set to false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] NodeLabel: [persistent] 
> HostName: {SRG} RackName: {/default-rack}]
> The container request:
> [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] NodeLabel: [null] 
> HostNames: {SRG} RackNames: {} RelaxLocality: [false]]
> The current RM capacity scheduler behavior (at least in versions 2.7 and 
> 2.8) is that the node cannot allocate a container for the request, because 
> the node label does not match when the leaf queue assigns the container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions 
> for selecting candidate nodes for a container request. Node label matching 
> should only be performed for container requests with the ANY resource name, 
> since only this kind of request is allowed to have a non-empty node label.
> So, for a container request with a non-ANY resource name (which we therefore 
> know has no node label), we should match the requested resource name against 
> the node instead of matching the requested node label against the node. This 
> resource name matching should be safe, since a node whose label is not 
> accessible to the queue will not be sent to the leaf queue.
>  
> *Discussion:*
> The attachment is a fix based on this principle; please help review it.
> Without it, we cannot use locality to request containers within these 
> labeled nodes.
> If the fix is acceptable, we should also check whether the same issue 
> occurs in trunk and other Hadoop versions.
> If it is not acceptable (i.e. the current behavior is by design), how can we 
> use locality to request containers within these labeled nodes?
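
To make the difference between the two matching rules in the description concrete, the following is a minimal sketch in plain Java. It is a toy model under stated assumptions, not the attached patch and not the scheduler's code: the class and method names are hypothetical, rack names are ignored, and only the `*` value for the ANY resource name is taken from YARN (`ResourceRequest.ANY`).

```java
// Toy comparison of the two matching rules discussed in this issue.
// Hypothetical names; rack-level requests are left out for brevity.
final class LocalityVsLabelSketch {
    // YARN uses "*" as the ANY resource name.
    static final String ANY = "*";

    // Assumption: null and empty both mean "no label".
    static String label(String s) {
        return s == null ? "" : s.trim();
    }

    // Rule as described for 2.7/2.8: the label must match,
    // whatever resource name the request carries.
    static boolean currentRule(String resourceName, String requestLabel,
                               String nodeHost, String nodeLabel) {
        return label(requestLabel).equals(label(nodeLabel));
    }

    // Rule proposed in the description: label matching applies only to
    // ANY requests; non-ANY requests are matched by resource name.
    static boolean proposedRule(String resourceName, String requestLabel,
                                String nodeHost, String nodeLabel) {
        if (ANY.equals(resourceName)) {
            return label(requestLabel).equals(label(nodeLabel));
        }
        return resourceName.equals(nodeHost);
    }

    public static void main(String[] args) {
        // The example from the description: a request for host SRG with no
        // label, against node SRG carrying the label "persistent".
        System.out.println(currentRule("SRG", null, "SRG", "persistent"));
        System.out.println(proposedRule("SRG", null, "SRG", "persistent"));
    }
}
```

In this model the proposed rule lets a label-less, host-specific request reach a labeled node, which is exactly the behavior the comment above argues would defeat exclusive labels used to reserve nodes.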



