[
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuqi Wang updated YARN-7872:
----------------------------
Target Version/s: 3.0.0, 2.7.2 (was: 2.7.2)
> labeled node cannot be used to satisfy locality specified request
> -----------------------------------------------------------------
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, capacityscheduler, resourcemanager
> Affects Versions: 2.7.2
> Reporter: Yuqi Wang
> Assignee: Yuqi Wang
> Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> A labeled node (i.e. a node with a non-empty node label) cannot be used to
> satisfy a locality-specific request (i.e. a container request whose resource
> name is not ANY and whose relaxLocality is false).
>
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] NodeLabel: [persistent]
> HostName: {SRG} RackName: {/default-rack}]
> The container request:
> [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] NodeLabel: [null]
> HostNames: {SRG} RackNames: {} RelaxLocality: [false]]
> With the current RM CapacityScheduler (at least in versions 2.7 and 2.8), the
> node cannot allocate a container for this request, because the node label
> does not match when the leaf queue assigns containers.
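> To make the failure concrete, here is a minimal Java model of the check as
> described above: the leaf queue compares the request's node label with the
> node's label and ignores the requested host name. The class and method names
> (CurrentLabelCheck, canAssign) are hypothetical and this is not the actual
> CapacityScheduler code; treating a null request label as the empty default
> partition is an assumption.

```java
import java.util.Objects;

// Hypothetical model of the 2.7/2.8 behavior described in this issue.
// Not the real CapacityScheduler implementation.
final class CurrentLabelCheck {
    // Assumption: the default (unlabeled) partition is the empty string,
    // and a null request label is normalized to it.
    static final String NO_LABEL = "";

    static String normalize(String label) {
        return label == null ? NO_LABEL : label;
    }

    // The container is assigned only if the normalized labels are equal;
    // the requested host name plays no part in this check.
    static boolean canAssign(String nodeLabel, String requestLabel) {
        return Objects.equals(normalize(nodeLabel), normalize(requestLabel));
    }

    public static void main(String[] args) {
        // Node SRG carries label "persistent"; the request names host SRG
        // with RelaxLocality=false but has no label, so it is rejected.
        System.out.println(canAssign("persistent", null)); // false
    }
}
```

> Under this model the request in the example can never be placed on SRG,
> because its implicit empty label never equals "persistent".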
>
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions for
> selecting candidate nodes for a container request. Node label matching should
> only be performed for container requests whose resource name is ANY, since
> only that kind of request is allowed to carry a non-empty node label.
> So, for a container request whose resource name is not ANY (and which
> therefore cannot carry a node label), we should match the requested resource
> name against the node instead of matching the requested node label against
> the node. This resource name matching should be safe, since a node whose node
> label is not accessible to the queue is never sent to the leaf queue.
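> The proposed rule can be sketched in Java as follows. This is a hypothetical
> illustration, not the attached patch: the names ProposedLocalityMatch, Node
> and matches are invented here. ANY is "*", the value of ResourceRequest.ANY
> in YARN. The check that the node's label is accessible to the queue is
> assumed to happen earlier, as the paragraph above states.

```java
import java.util.Objects;

// Hypothetical sketch of the matching rule proposed in this issue;
// not the code from YARN-7872-branch-2.7.2.001.patch.
final class ProposedLocalityMatch {
    // Same value as ResourceRequest.ANY in YARN.
    static final String ANY = "*";
    // Assumption: the default partition is the empty string.
    static final String NO_LABEL = "";

    record Node(String host, String rack, String label) {}

    static boolean matches(Node node, String resourceName, String requestLabel) {
        if (ANY.equals(resourceName)) {
            // Only ANY requests may carry a node label, so compare labels here.
            String label = requestLabel == null ? NO_LABEL : requestLabel;
            return Objects.equals(node.label(), label);
        }
        // Host- or rack-specific request: match by resource name, not label.
        // Label accessibility for the queue is assumed to be checked before
        // the node is offered to the leaf queue.
        return resourceName.equals(node.host()) || resourceName.equals(node.rack());
    }

    public static void main(String[] args) {
        Node srg = new Node("SRG", "/default-rack", "persistent");
        System.out.println(matches(srg, "SRG", null));        // true
        System.out.println(matches(srg, ANY, null));          // false
        System.out.println(matches(srg, ANY, "persistent"));  // true
    }
}
```

> With this rule the example request (HostNames {SRG}, RelaxLocality false)
> matches node SRG, while ANY requests keep the existing label semantics.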
>
> *Discussion:*
> The attachment is a fix based on this principle; please help review it.
> Without it, we cannot use locality to request containers on these labeled
> nodes.
> If the fix is acceptable, we should also check whether the same issue
> occurs in trunk and other Hadoop versions.
> If it is not acceptable (i.e. the current behavior is by design), how can we
> use locality to request containers on these labeled nodes?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)