[
https://issues.apache.org/jira/browse/YARN-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154346#comment-16154346
]
Wangda Tan commented on YARN-7138:
----------------------------------
Thanks [~djp] for the additional comments:
bq. That's not true. K8S website
(https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/)
document how to implement a customized scheduler with several pre-define APIs
and these APIs even get versioned (so far is version 1). For more details, you
can refer:
https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithm/scheduler_interface.go
I'm not familiar with golang, so please correct me if I'm wrong. My
understanding of {{scheduler_interface.go}} is that it provides suggested APIs
for a scheduler, but it is not a public API. A signature such as {{v1.Node}}
means it uses the {{Node}} API from the v1 API package; it does not claim
stability for the scheduler API itself. Could you point me to any reference
that claims scheduler API stability for K8S?
bq. Agree that it is different so far, but the way how K8S works now may hint
the direction that YARN could leverage in future - especially for long running
services with extremely scale ..
Instead of following K8S's model of letting the app choose its scheduler, I
would like a "router" module to choose the scheduler for each app; this is
also the model used to implement RM federation. Blindly letting multiple
schedulers run independently in the cluster could cause many resource
accounting and maintenance issues: each scheduler would account for the same
resources separately, which could lead to over-allocation, etc.
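A minimal sketch of the accounting concern, assuming hypothetical names
({{ClusterLedger}}, {{SketchScheduler}}, {{SchedulerRouter}}); none of these
are YARN classes. It compares two schedulers that each keep a private view of
the full cluster capacity with two schedulers reached through a router and
sharing one ledger:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single source of truth for cluster memory (not a YARN class). */
final class ClusterLedger {
    private final int totalMb;
    private int usedMb;

    ClusterLedger(int totalMb) { this.totalMb = totalMb; }

    /** Reserves memory only if it still fits; returns whether it succeeded. */
    synchronized boolean tryReserve(int mb) {
        if (mb < 0 || usedMb + mb > totalMb) {
            return false;
        }
        usedMb += mb;
        return true;
    }
}

/** Hypothetical scheduler that allocates against whatever ledger it is given. */
final class SketchScheduler {
    private final ClusterLedger ledger;

    SketchScheduler(ClusterLedger ledger) { this.ledger = ledger; }

    boolean allocate(int mb) { return ledger.tryReserve(mb); }
}

/** Hypothetical router: the cluster, not the app, picks the scheduler. */
final class SchedulerRouter {
    private final Map<String, SketchScheduler> byAppType = new HashMap<>();
    private final SketchScheduler fallback;

    SchedulerRouter(SketchScheduler fallback) { this.fallback = fallback; }

    void register(String appType, SketchScheduler scheduler) {
        byAppType.put(appType, scheduler);
    }

    SketchScheduler route(String appType) {
        return byAppType.getOrDefault(appType, fallback);
    }
}

final class RouterDemo {
    /** Two schedulers, each with its own view of the whole cluster. */
    static int independent(int clusterMb, int requestMb) {
        SketchScheduler a = new SketchScheduler(new ClusterLedger(clusterMb));
        SketchScheduler b = new SketchScheduler(new ClusterLedger(clusterMb));
        int granted = 0;
        if (a.allocate(requestMb)) granted += requestMb;
        if (b.allocate(requestMb)) granted += requestMb;
        return granted; // may exceed clusterMb
    }

    /** Two schedulers behind a router, sharing one ledger. */
    static int routed(int clusterMb, int requestMb) {
        ClusterLedger shared = new ClusterLedger(clusterMb);
        SchedulerRouter router = new SchedulerRouter(new SketchScheduler(shared));
        router.register("service", new SketchScheduler(shared));
        int granted = 0;
        if (router.route("batch").allocate(requestMb)) granted += requestMb;
        if (router.route("service").allocate(requestMb)) granted += requestMb;
        return granted; // never exceeds clusterMb
    }

    public static void main(String[] args) {
        System.out.println("independent granted MB: " + independent(100, 60));
        System.out.println("routed granted MB: " + routed(100, 60));
    }
}
```

With a 100 MB cluster and two 60 MB requests, the independent schedulers
together grant more than the cluster holds, while the routed pair rejects the
second request.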
To clarify, I agree with the idea of making a better scheduler API and
declaring it stable so that we can support easier pluggability in the future.
However, I don't think now is the time to claim that, before we clean up the
scheduler implementation. It isn't helpful, and is even discouraging to
developers, if we claim the existing scheduler API is stable.
bq. Can you share more details on these discussions?
I can try to summarize what we discussed; there are no conclusions so far:
a.
https://issues.apache.org/jira/secure/attachment/12867869/YARN-6592-Rich-Placement-Constraints-Design-V1.pdf,
"implementation design": add a ResourceRequestPreprocessor that looks at
scheduler state and the request, and forwards hard-locality requests to the
scheduler for allocation. Users can plug in their own
"ResourceRequestPreprocessor" to do request-to-node binding.
b. {{YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf}} in
YARN-5139 suggests introducing NodeScorer and committer interfaces so that
users can implement their own scorer to look up cluster state and send
allocation proposals to the scheduler for allocation.
I personally prefer b. over a.
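To illustrate the shape of option b., here is a rough sketch. The interface
and class names below ({{NodeScorer}} signature, {{ProposalCommitter}},
{{GlobalPlacer}}, and so on) are assumptions made for illustration only and
are not taken from the YARN-5139 design notes or the YARN code base. A
pluggable scorer ranks nodes from a snapshot, and the committer re-validates
each proposal against current state before applying it:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical snapshot of one node's free memory. */
final class NodeView {
    final String host;
    final int freeMb;

    NodeView(String host, int freeMb) { this.host = host; this.freeMb = freeMb; }
}

/** Hypothetical pluggable scorer: a higher score means a better node. */
interface NodeScorer {
    double score(NodeView node, int requestMb);
}

/** Hypothetical allocation proposal produced outside the scheduler. */
final class Proposal {
    final String host;
    final int mb;

    Proposal(String host, int mb) { this.host = host; this.mb = mb; }
}

/** Hypothetical committer: the only component allowed to change state. */
interface ProposalCommitter {
    boolean commit(Proposal proposal);
}

final class GlobalPlacer {
    /** Picks the best-scoring node that can fit the request, if any. */
    static Optional<Proposal> propose(List<NodeView> nodes, int requestMb,
                                      NodeScorer scorer) {
        return nodes.stream()
            .filter(n -> n.freeMb >= requestMb)
            .max(Comparator.comparingDouble(n -> scorer.score(n, requestMb)))
            .map(n -> new Proposal(n.host, requestMb));
    }
}

/** Re-checks each proposal against current state, so stale ones are rejected. */
final class InMemoryCommitter implements ProposalCommitter {
    private final Map<String, Integer> freeMb = new HashMap<>();

    InMemoryCommitter(List<NodeView> nodes) {
        for (NodeView n : nodes) {
            freeMb.put(n.host, n.freeMb);
        }
    }

    @Override
    public synchronized boolean commit(Proposal p) {
        Integer free = freeMb.get(p.host);
        if (free == null || free < p.mb) {
            return false;
        }
        freeMb.put(p.host, free - p.mb);
        return true;
    }
}

final class ScorerDemo {
    /** Best fit: prefer the node with the least memory left over. */
    static final NodeScorer BEST_FIT =
        (node, requestMb) -> (double) (requestMb - node.freeMb);

    static List<NodeView> nodes() {
        return Arrays.asList(new NodeView("n1", 4096), new NodeView("n2", 8192));
    }

    static String pickHost(int requestMb) {
        return GlobalPlacer.propose(nodes(), requestMb, BEST_FIT)
            .map(p -> p.host)
            .orElse("none");
    }

    /** The second proposal relies on a stale snapshot and must be rejected. */
    static boolean staleProposalRejected() {
        ProposalCommitter committer = new InMemoryCommitter(nodes());
        boolean first = committer.commit(new Proposal("n1", 3072));
        boolean second = committer.commit(new Proposal("n1", 3072));
        return first && !second;
    }
}
```

Keeping the commit step inside the scheduler means a user-supplied scorer can
only suggest placements; resource accounting still happens in one place.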
> Fix incompatible API change for YarnScheduler involved by YARN-5521
> -------------------------------------------------------------------
>
> Key: YARN-7138
> URL: https://issues.apache.org/jira/browse/YARN-7138
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler
> Reporter: Junping Du
> Priority: Critical
>
> From JACC report for 2.8.2 against 2.7.4, it indicates that we have
> incompatible changes happen in YarnScheduler:
> {noformat}
> hadoop-yarn-server-resourcemanager-2.7.4.jar, YarnScheduler.class
> package org.apache.hadoop.yarn.server.resourcemanager.scheduler
> YarnScheduler.allocate ( ApplicationAttemptId p1, List<ResourceRequest> p2,
> List<ContainerId> p3, List<String> p4, List<String> p5 ) [abstract] :
> Allocation
> {noformat}
> The root cause is YARN-5221. We should change it back or workaround this by
> adding back original API (mark as deprecated if not used any more).