[ 
https://issues.apache.org/jira/browse/YARN-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589201#comment-14589201
 ] 

Wei Shao commented on YARN-3806:
--------------------------------

Hi Wangda Tan,

Regarding #4 in your comments (decouple applications / nodes from the scheduler):

This proposal suggests a new object, ResourceManager (or we could call it 
SchedulerNodeManager), to manage SchedulerNodes and handle all events from 
cluster nodes. SchedulerManager doesn't respond to these events directly.
Looking at the current implementation of FiCaSchedulerNode, it seems to me 
that the container reservation feature doesn't need to be bound to 
fair/capacity scheduling. Fair/capacity scheduling can use it, but other 
scheduling policies can choose whether to use it as well.
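
To make the split concrete, here is a rough sketch of what such a node manager 
could look like. All names and signatures below (nodeAdded, nodeRemoved, 
totalAvailableMb, acquire) are my own assumptions for illustration, not 
existing YARN APIs, and memory in MB stands in for a full resource object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of this is the real YARN code.
class SchedulerNodeManager {
    // Available memory (MB) per node id, kept in insertion order.
    private final Map<String, Integer> availableMb = new LinkedHashMap<>();

    // Node lifecycle events are delivered here rather than to the scheduler.
    synchronized void nodeAdded(String nodeId, int capacityMb) {
        availableMb.put(nodeId, capacityMb);
    }

    synchronized void nodeRemoved(String nodeId) {
        availableMb.remove(nodeId);
    }

    // Read-only status that a scheduler could poll once per scheduling cycle.
    synchronized int totalAvailableMb() {
        int sum = 0;
        for (int mb : availableMb.values()) {
            sum += mb;
        }
        return sum;
    }

    // Takes requestMb from the first node with enough room.
    // Returns the chosen node id, or null if no node can satisfy the request.
    synchronized String acquire(int requestMb) {
        for (Map.Entry<String, Integer> e : availableMb.entrySet()) {
            if (e.getValue() >= requestMb) {
                e.setValue(e.getValue() - requestMb);
                return e.getKey();
            }
        }
        return null;
    }
}
```

A scheduler built on this would only ever call totalAvailableMb(); node 
add/remove events never reach it. Reservation could be layered on top of 
acquire() as an optional feature.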

And with the single-application queue introduced in the proposal, maybe the 
scheduler-specific features of FiCaSchedulerApp, such as resource limits, can 
be moved to the implementation of the specific scheduler queue. Parent queues 
and single-application queues can then implement these features consistently. 
Also, it seems to me that the delayed scheduling feature doesn't need to be 
bound to fair/capacity scheduling; any scheduler can choose whether to use it.

Under the proposal, in each scheduling cycle the SchedulerManager reads the 
status of cluster resources from ResourceManager, updates scheduling 
parameters (fairShare, resource limits and so on) consistently for all queues 
(an application is also a queue), and sends resource preemption/allocation 
events to SchedulerApp. SchedulerApp can implement the container warning 
feature in preemptResource() and the delayed scheduling feature in 
acquireResource(), both of which are applicable to all schedulers. Also, the 
scheduler doesn't specify the resources SchedulerApp can get; 
SchedulerApp.acquireResource() asks ResourceManager directly for available 
resources.
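
One scheduling cycle as described above could be sketched roughly as follows. 
This is only an illustration under my own assumptions: ClusterResources stands 
in for the proposal's ResourceManager, the method signatures are made up, 
memory in MB is the only resource, and an equal split is used as the 
placeholder policy.

```java
import java.util.List;

// Hypothetical types for illustration; these are not the real YARN classes.
class ClusterResources {
    private int freeMb;
    ClusterResources(int freeMb) { this.freeMb = freeMb; }
    int availableMb() { return freeMb; }
    boolean tryAcquire(int mb) {
        if (mb > freeMb) return false;
        freeMb -= mb;
        return true;
    }
    void release(int mb) { freeMb += mb; }
}

class SchedulerApp {
    final int demandMb;
    int limitMb;      // scheduling parameter, rewritten by the scheduler each cycle
    int allocatedMb;
    SchedulerApp(int demandMb) { this.demandMb = demandMb; }

    // Allocation event. The app asks the cluster directly; a delayed
    // scheduling policy could decide here to skip this cycle.
    void acquireResource(ClusterResources cluster) {
        int want = Math.min(demandMb, limitMb) - allocatedMb;
        if (want > 0 && cluster.tryAcquire(want)) allocatedMb += want;
    }

    // Preemption event. A container warning could be issued here before
    // the resources are actually given back.
    void preemptResource(ClusterResources cluster) {
        int excess = allocatedMb - limitMb;
        if (excess > 0) {
            allocatedMb -= excess;
            cluster.release(excess);
        }
    }
}

class SchedulerManager {
    void runCycle(ClusterResources cluster, List<SchedulerApp> apps) {
        if (apps.isEmpty()) return;
        // 1. Read cluster status: free memory plus what apps already hold.
        int totalMb = cluster.availableMb();
        for (SchedulerApp app : apps) totalMb += app.allocatedMb;
        // 2. Update parameters for every queue (here: an equal share per app).
        int share = totalMb / apps.size();
        for (SchedulerApp app : apps) app.limitMb = share;
        // 3. Send events: preempt first so freed resources can be reallocated.
        for (SchedulerApp app : apps) app.preemptResource(cluster);
        for (SchedulerApp app : apps) app.acquireResource(cluster);
    }
}
```

Note that SchedulerManager only sets limitMb; it never hands out resources 
itself. Each app gets what it can from the cluster within that limit.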

And under the proposal, the procedures that update scheduling parameters are 
scalable (through parallelism), idempotent, and transactional. See the 
proposal for details on why these properties can be helpful.
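
As a toy illustration of what I mean by these properties (not the actual 
procedure from the proposal): if the parameters are computed as a pure 
function of a cluster snapshot, the per-queue work can run in parallel, and 
repeating the update on the same snapshot gives the same result. QueueSnapshot, 
ParameterUpdater and the weight-based share below are assumed names and an 
assumed policy.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical immutable view of one queue at the start of a cycle.
class QueueSnapshot {
    final String name;
    final int weight;
    QueueSnapshot(String name, int weight) {
        this.name = name;
        this.weight = weight;
    }
}

class ParameterUpdater {
    // Pure function of its inputs: no shared mutable state is touched, so
    // queues can be processed in parallel and the call can be repeated safely.
    static Map<String, Integer> fairShares(List<QueueSnapshot> queues, int clusterMb) {
        int totalWeight = queues.stream().mapToInt(q -> q.weight).sum();
        if (totalWeight == 0) {
            return Collections.emptyMap();
        }
        return queues.parallelStream().collect(Collectors.toMap(
                q -> q.name,
                q -> (int) ((long) clusterMb * q.weight / totalWeight)));
    }
}
```

Because the result is a fresh map, a caller could publish it in a single 
reference swap, so readers see either the old parameters or the new ones, 
never a mix.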

Since both YARN-3306 and this proposal are trying to address similar issues, if 
some ideas in this proposal are useful, maybe efforts can be combined.

Thoughts? Thanks!

> Proposal of Generic Scheduling Framework for YARN
> -------------------------------------------------
>
>                 Key: YARN-3806
>                 URL: https://issues.apache.org/jira/browse/YARN-3806
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Wei Shao
>         Attachments: ProposalOfGenericSchedulingFrameworkForYARN-V1.0.pdf, 
> ProposalOfGenericSchedulingFrameworkForYARN-V1.1.pdf
>
>
> Currently, a typical YARN cluster runs many different kinds of applications: 
> production applications, ad hoc user applications, long-running services and 
> so on. Different YARN scheduling policies may be suitable for different 
> applications. For example, capacity scheduling can manage production 
> applications well since applications can get a guaranteed resource share, and 
> fair scheduling can manage ad hoc user applications well since it can enforce 
> fairness among users. However, the current YARN scheduling framework doesn’t 
> have a mechanism for multiple scheduling policies to work hierarchically in 
> one cluster.
> YARN-3306 discussed many issues with today’s YARN scheduling framework and 
> proposed a per-queue, policy-driven framework. In detail, it supports 
> different scheduling policies for leaf queues. However, support for different 
> scheduling policies for upper-level queues has not been seriously considered 
> yet. A generic scheduling framework is proposed here to address these 
> limitations. It supports different policies for any queue consistently. The 
> proposal also tries to solve many other issues in the current YARN scheduling 
> framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
