[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343133#comment-16343133
 ] 

Arun Suresh edited comment on YARN-7839 at 1/29/18 9:36 AM:
------------------------------------------------------------

[~cheersyang],
bq. Instead, I am thinking... can we generate a list of ordered candidate nodes 
for each allocation (based on some policy), then let scheduler work on such 
candidate set of nodes to pick up one that fulfills scheduler's requests?
Candidate nodes are not a bad idea (I remember we discussed it briefly 
earlier). I did not want to attempt it initially for the following reasons:
# In the current scheme, we do a re-place and then re-schedule ONLY if the 
initial placement was rejected by the scheduler. This means the algorithm can 
bail as soon as it finds the first viable node, which keeps both the algorithm 
and the data structures it returns simple. If the algorithm were to output a 
list of candidate nodes, there is a very good chance it would have to loop 
through more nodes (and possibly the entire node set) per request. Also, for 
cases of node affinity, candidate selection would be complicated. For example, 
if *foo* has to be placed only on nodes that have *bar*, and we have 5 
candidates for *bar*, the 5 candidates for *foo* would also have to depend on 
the *bar* candidates.
# The Scheduler's attemptOnNode takes multiple locks: a lock on the app, the 
node and the queue. Passing in multiple nodes means the locks will be held for 
a longer duration, which can cause problems. We have experienced many such 
issues, especially when an application completes before its containers 
complete, which results in releaseContainer taking a lock on multiple nodes.
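
To make the cost difference in the first point concrete, here is a minimal Python sketch. It is not YARN code: the `Node` class, both function names and the `limit` parameter are hypothetical, and the affinity check is reduced to "the node already carries the *bar* tag". It only counts how many nodes each approach has to examine.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node and its allocation tags.
    name: str
    tags: set = field(default_factory=set)


def place_first_fit(nodes, satisfies):
    """Return (first viable node or None, number of nodes examined)."""
    examined = 0
    for node in nodes:
        examined += 1
        if satisfies(node):
            return node, examined  # bail out at the first viable node
    return None, examined


def place_candidates(nodes, satisfies, limit):
    """Return (up to `limit` viable nodes, number of nodes examined)."""
    candidates = []
    examined = 0
    for node in nodes:
        examined += 1
        if satisfies(node):
            candidates.append(node)
            if len(candidates) == limit:
                break  # can stop early only once the list is full
    return candidates, examined


# Ten nodes; *bar* is assumed to be running on n2, n5 and n9.
nodes = [Node(f"n{i}") for i in range(10)]
for i in (2, 5, 9):
    nodes[i].tags.add("bar")


def has_bar(node):
    # Affinity constraint for *foo*: only nodes that already have *bar*.
    return "bar" in node.tags


first, seen_first = place_first_fit(nodes, has_bar)
cands, seen_cands = place_candidates(nodes, has_bar, limit=5)
print(first.name, seen_first)                   # n2 3
print([n.name for n in cands], seen_cands)      # ['n2', 'n5', 'n9'] 10
```

With fewer viable nodes than the requested number of candidates, the candidate version walks the whole node set. The sketch also assumes *bar* is already placed; if *bar* only had candidates of its own, the predicate for *foo* would have to be evaluated against each of them.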

In any case, even if we were to try candidate nodes, I still believe letting 
the algorithm query the nodes' capacity at the time of placing might not be 
bad. I prefer we do not hold a lock, since one of the motivations for 
separating the Placement phase from the scheduling phase is so that 
placement can operate on a loosely consistent view of the cluster.
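
As a rough illustration of that split, the Python sketch below lets placement filter on capacity using a possibly stale copy of node state, while the authoritative check stays in the scheduler. It is not YARN code: `Cluster`, `snapshot`, `try_commit` and `place` are hypothetical names, capacity is reduced to a single memory figure in MB, and the brief lock around the copy is an assumption of the sketch only.

```python
import threading
from typing import Dict, Optional


class Cluster:
    """Toy stand-in for scheduler-owned node state (free memory in MB)."""

    def __init__(self, available: Dict[str, int]):
        self._available = dict(available)
        self._lock = threading.Lock()

    def snapshot(self) -> Dict[str, int]:
        # Copy handed to the placement phase; it may be stale when used.
        with self._lock:
            return dict(self._available)

    def try_commit(self, node: str, demand: int) -> bool:
        # Authoritative capacity check and allocation, done under the lock.
        with self._lock:
            if self._available.get(node, 0) < demand:
                return False
            self._available[node] -= demand
            return True


def place(view: Dict[str, int], demand: int) -> Optional[str]:
    """Placement over the copied view: no lock is held while searching."""
    for node, free in view.items():
        if free >= demand:
            return node
    return None


cluster = Cluster({"n1": 1024, "n2": 4096})
view = cluster.snapshot()

node = place(view, 2048)                 # n1 is filtered out on capacity
committed = cluster.try_commit(node, 2048)

stale_choice = place(view, 3072)         # the old view still shows 4096 on n2
rejected = not cluster.try_commit(stale_choice, 3072)
retry = place(cluster.snapshot(), 3072)  # a fresh view finds no node that fits
```

The capacity filter removes most doomed placements up front, and the occasional stale decision is still caught by the scheduler and retried, as in the current scheme.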







> Check node capacity before placing in the Algorithm
> ---------------------------------------------------
>
>                 Key: YARN-7839
>                 URL: https://issues.apache.org/jira/browse/YARN-7839
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Priority: Major
>
> Currently, the Algorithm assigns a node to a request purely based on whether 
> the constraints are met. It is later, in the scheduling phase, that the Queue 
> capacity and Node capacity are checked. If the request cannot be placed 
> because of unavailable Queue/Node capacity, the request is retried by the 
> Algorithm.
> For clusters that are running at high utilization, we can reduce the retries 
> if we perform the Node capacity check in the Algorithm as well. The Queue 
> capacity check and the other user limit checks can still be handled by the 
> scheduler (since queues and other limits are tied to the scheduler, and not 
> scheduler agnostic).


