[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-7872:
----------------------------
    Description: 
A labeled node (i.e. a node with a non-empty node label) cannot be used to 
satisfy a locality-specified request (i.e. a container request with a resource 
name other than ANY and relax locality set to false).

For example:

The node with available resource:

[Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
[persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
\{/default-rack}]

The container request:
 [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
{color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: \{SRG}{color} 
RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]

The current RM capacity scheduler cannot allocate a container on this node for 
the request, because the node label does not match when the leaf queue assigns 
containers.

However, node locality and node label should be two orthogonal dimensions for 
selecting candidate nodes for a container request. Node label matching should 
be performed only for container requests with the ANY resource name, since 
only this kind of container request is allowed to have a non-empty node label.

So, for a container request with a resource name other than ANY (which 
therefore should not have a node label), we should match the node by resource 
name instead of by node label. This resource name matching should be safe, 
since a node whose label is not accessible to the queue will not be sent to 
the leaf queue.
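
To make the principle concrete, here is a minimal, self-contained sketch. It is 
not the attached patch and does not use the real CapacityScheduler classes: 
LocalityLabelSketch, Node, Request, currentMatch and proposedMatch are 
hypothetical names, and a label expression is simplified to a single label 
compared by equality. The only value taken from YARN is "*" for the ANY 
resource name.

```java
// Hypothetical, simplified model of the matching rule discussed above.
// None of these types are the real YARN scheduler classes.
final class LocalityLabelSketch {
    static final String ANY = "*";      // same value as YARN's ResourceRequest.ANY
    static final String NO_LABEL = "";  // default (empty) partition

    static final class Node {
        final String host, rack, label;
        Node(String host, String rack, String label) {
            this.host = host; this.rack = rack; this.label = label;
        }
    }

    static final class Request {
        final String resourceName;     // host name, rack name, or ANY
        final String labelExpression;  // null or empty means "no label"
        final boolean relaxLocality;   // carried for fidelity; not used by the match
        Request(String resourceName, String labelExpression, boolean relaxLocality) {
            this.resourceName = resourceName;
            this.labelExpression = labelExpression;
            this.relaxLocality = relaxLocality;
        }
    }

    private static String wantedLabel(Request req) {
        return req.labelExpression == null ? NO_LABEL : req.labelExpression;
    }

    private static boolean nameMatches(Node node, Request req) {
        String name = req.resourceName;
        return ANY.equals(name) || name.equals(node.host) || name.equals(node.rack);
    }

    // Behavior as described in this report: the label is always compared,
    // so a label-less host request never lands on a labeled node.
    static boolean currentMatch(Node node, Request req) {
        return wantedLabel(req).equals(node.label) && nameMatches(node, req);
    }

    // Proposed principle: compare labels only for ANY requests; otherwise
    // match by resource name. Assumes the caller has already filtered out
    // nodes whose label is not accessible to the queue.
    static boolean proposedMatch(Node node, Request req) {
        if (ANY.equals(req.resourceName)) {
            return wantedLabel(req).equals(node.label);
        }
        return nameMatches(node, req);
    }
}
```

With the example above (node SRG in /default-rack labeled persistent, request 
for host SRG with no label), currentMatch rejects the node while proposedMatch 
accepts it. An ANY request with no label is still rejected on the labeled node 
by both.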

*The attachment is a fix based on this principle; please help review it.*

*Without it, we cannot use locality to request containers on these labeled 
nodes.*

*If the fix is acceptable, we should also check whether the same issue occurs 
in trunk and other Hadoop versions.*

*If it is not acceptable (i.e. the current behavior is by design), how can we 
use locality to request containers on these labeled nodes?*


> labeled node cannot be used to satisfy locality specified request
> -----------------------------------------------------------------
>
>                 Key: YARN-7872
>                 URL: https://issues.apache.org/jira/browse/YARN-7872
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>            Priority: Blocker
>             Fix For: 2.7.2
>
>         Attachments: YARN-7872-branch-2.7.2.001.patch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

