[
https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412817#comment-15412817
]
He Tianyi commented on YARN-5479:
---------------------------------
Thanks for the comments, [~rchiang] and [~jlowe].
bq. I'd be careful with having multiple implementations or multiple APIs for
doing the same thing with Resource. Resource is used a lot of places in the
Hadoop codebase and this could add confusion, even with accurate Javadocs.
Yes, multiple implementations would be confusing. I tried replacing
{{ResourcePBImpl}} directly with the implementation I mentioned, and it looks
like no other issues are raised. Maybe we could stick to a single
implementation and simply make it faster.
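For illustration, the lazy-builder idea behind such a faster implementation might be sketched roughly as below. {{LightResource}} and {{WireBuilder}} are hypothetical names, not YARN classes; {{WireBuilder}} merely stands in for the proto builder, and the sketch is single-threaded.

```java
// Hypothetical stand-in for a protobuf builder; NOT the real YARN proto builder.
final class WireBuilder {
    static int created = 0;   // counts allocations, for illustration only
    long memory;
    int vcores;

    WireBuilder() { created++; }
}

// Hypothetical lightweight resource: plain fields, builder created only on demand.
final class LightResource {
    private long memory;
    private int vcores;
    private WireBuilder builder;   // stays null until toWire() is called

    LightResource(long memory, int vcores) {
        this.memory = memory;
        this.vcores = vcores;
    }

    long getMemory() { return memory; }
    int getVcores() { return vcores; }

    // In-place arithmetic: intermediate results need no builder at all.
    LightResource addTo(LightResource other) {
        memory += other.memory;
        vcores += other.vcores;
        builder = null;            // any cached wire form is now stale
        return this;
    }

    // Materialize the builder only when the value has to be sent over IPC.
    WireBuilder toWire() {
        if (builder == null) {
            builder = new WireBuilder();
            builder.memory = memory;
            builder.vcores = vcores;
        }
        return builder;
    }
}
```

With this shape, scheduler-internal sums never touch the builder; it is allocated once, at the point where the value actually leaves the process.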
bq. The nodeUpdate() changes will conflict with YARN-5047 unless you plan on
doing the same changes for CapacityScheduler and FifoScheduler.
Most changes can be done in {{attemptScheduling}}, which is dedicated to
FairScheduler. So perhaps we can keep it that way.
bq. Minimally I think we should approach this as two (or more) separate JIRAs
since there are two vastly different approaches to improving performance here.
Agreed. Will file separate JIRAs to address each aspect.
bq. I don't think we should start loosening the guarantees of the scheduler for
performance reasons until we've exhausted the other ways we can improve
performance
Certainly. However, the approach would be quite simple to implement, and
doing so does not seem to cause any problems in production (fairness is
slightly degraded locally, but within an acceptable range, and there appears
to be no effect globally, though this has not been carefully investigated yet).
So if one must balance resource utilization against fairness (since resources
are costly), providing such an option (e.g. through configuration) may be
viable.
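As a rough sketch of what such an option could look like, the helper below re-sorts only once every {{interval}} calls. {{DelayedSorter}} and its {{interval}} parameter are hypothetical (the value is assumed to come from some scheduler configuration key); this is not existing FairScheduler code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: keeps a sorted view that is refreshed every `interval` calls.
final class DelayedSorter<T> {
    private final List<T> items;
    private final Comparator<? super T> order;
    private final int interval;    // assumed to be read from configuration
    private int callsSinceSort;
    int sortCount = 0;             // exposed only to make the effect observable

    DelayedSorter(List<T> items, Comparator<? super T> order, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.items = new ArrayList<>(items);
        this.order = order;
        this.interval = interval;
        this.callsSinceSort = interval;   // force a sort on first use
    }

    // Returns the list, re-sorting only when the previous order has been
    // handed out `interval` times.
    List<T> sorted() {
        if (callsSinceSort >= interval) {
            items.sort(order);
            sortCount++;
            callsSinceSort = 0;
        }
        callsSinceSort++;
        return items;
    }
}
```

Setting the interval to 1 sorts on every call, i.e. the current strict behavior, so the default would not change scheduling guarantees.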
Shall we make this issue an umbrella? There are still many approaches to
deliver better performance in FairScheduler.
> FairScheduler: Scheduling performance improvement
> -------------------------------------------------
>
> Key: YARN-5479
> URL: https://issues.apache.org/jira/browse/YARN-5479
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler, resourcemanager
> Affects Versions: 2.6.0
> Reporter: He Tianyi
> Assignee: He Tianyi
>
> Currently ResourceManager uses a single thread to handle async events for
> scheduling. As the number of nodes grows, more events need to be processed in
> time in FairScheduler. Also, an increased number of applications & queues
> slows down the processing of each single event.
> There are two cases in which slow processing of nodeUpdate events is
> problematic:
> A. Global throughput is lower than the number of nodes per heartbeat round.
> This inefficiency keeps resources from being allocated.
> B. Global throughput meets the need, but in some rounds, events of
> some nodes cannot get processed before the next heartbeat. This makes
> handling of burst requests inefficient (e.g. a newly submitted MapReduce
> application cannot get all its tasks launched quickly even when enough
> resources are available).
> I am pretty sure some people will eventually encounter the problem once a
> single cluster is scaled to several thousand nodes (even with
> {{assignmultiple}} enabled).
> This issue proposes several performance optimizations in the
> FairScheduler {{nodeUpdate}} method. To be specific:
> A. Trading off fairness for efficiency, queue & app sorting can be skipped
> (or should this be called 'delayed sorting'?). We can either start another
> dedicated thread to do the sorting & updating, or perform the sorting only
> after the current result has been used several times (say, sort once in
> every 100 calls).
> B. Performing calculations on {{Resource}} instances is expensive, since at
> least 2 objects ({{ResourceImpl}} and its proto builder) are created each time
> (using the 'immutable' APIs). The overhead can be eliminated with a
> lightweight implementation of Resource, which does not instantiate a builder
> until necessary, because most instances are used as intermediate results in
> the scheduler instead of being exchanged via IPC. Also, {{createResource}}
> uses reflection, which can be replaced by a plain {{new}} (for scheduler
> usage only). Furthermore, perhaps we could 'intern' resources to avoid
> allocation.
> C. Other minor changes, such as moving the {{updateRootMetrics}} call to
> {{update}}, making root queue metrics eventually consistent (which may
> satisfy most needs), or introducing counters to {{getResourceUsage}} so
> that resource usage is updated incrementally instead of being recalculated
> each time.
> With A and B, I was seeing a 4x improvement in a cluster with 2K nodes.
> Suggestions? Opinions?
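The incremental counters from item C of the description might look roughly like the sketch below: each allocation or release pushes a delta up the queue hierarchy, so reading the usage is a field read rather than a recalculation over children. {{QueueNode}} is a hypothetical class that tracks memory only and ignores locking; it is not the FairScheduler queue implementation.

```java
// Hypothetical queue node keeping a running usage counter (memory only).
final class QueueNode {
    private final QueueNode parent;   // null for the root queue
    private long usedMemory;

    QueueNode(QueueNode parent) {
        this.parent = parent;
    }

    void allocate(long memory) { adjust(memory); }

    void release(long memory) { adjust(-memory); }

    // Apply the delta to this queue and every ancestor up to the root.
    private void adjust(long delta) {
        for (QueueNode q = this; q != null; q = q.parent) {
            q.usedMemory += delta;
        }
    }

    // O(1): no walk over child queues or applications.
    long getResourceUsage() {
        return usedMemory;
    }
}
```

The cost moves from every read to every change, proportional to queue depth, which should pay off if usage is read far more often than it changes.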