[
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316489#comment-17316489
]
Jim Brennan commented on YARN-10475:
------------------------------------
[~chaosju] thanks for your comment. The implementation we provided here
compares each node's utilization with overall cluster utilization and adjusts
the heartbeat interval so that under-utilized nodes get more scheduling
opportunities. Note that this feature was developed internally on branch-2,
before the global scheduler was added. It has worked well to help keep our
nodes more evenly utilized.
I think that other metrics for scaling the heartbeat are definitely worth
exploring, which is why we filed [YARN-10478] to make it pluggable. That would
be a good place to make suggestions for alternate approaches.
> Scale RM-NM heartbeat interval based on node utilization
> --------------------------------------------------------
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10475-branch-3.2.003.patch,
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch,
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU
> utilization compared to overall cluster CPU utilization. If a node is
> over-utilized compared to the rest of the cluster, its heartbeat interval
> slows down. If it is under-utilized compared to the rest of the cluster,
> its heartbeat interval speeds up.
> This is a feature we have been running internally in production for
> several years. It was developed by [~nroberts], based on the observation
> that larger, faster nodes on our cluster were under-utilized compared to
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide
> utilization metrics.
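
The scaling described in the issue summary above can be sketched roughly as
follows. This is a minimal illustration, not the code from the attached
patches: the class name HeartbeatScaling, the method scaledIntervalMs, the
speedup/slowdown factors, the min/max bounds, and the linear formula are all
assumptions made for the example. Utilization values are assumed to be
fractions between 0.0 and 1.0.

```java
/** Hypothetical helper; names and formula are illustrative assumptions. */
final class HeartbeatScaling {
    private HeartbeatScaling() { }

    /** Clamp a utilization value into the range [0.0, 1.0]. */
    static double clamp01(double v) {
        return Math.min(1.0, Math.max(0.0, v));
    }

    /**
     * Scale the default heartbeat interval by how far the node's CPU
     * utilization is from the cluster-wide CPU utilization.
     * A busier-than-average node gets a longer interval (slows down);
     * a less busy node gets a shorter interval (speeds up).
     */
    static long scaledIntervalMs(long defaultMs, long minMs, long maxMs,
                                 double speedupFactor, double slowdownFactor,
                                 double nodeCpuUtil, double clusterCpuUtil) {
        double node = clamp01(nodeCpuUtil);
        double cluster = clamp01(clusterCpuUtil);
        double delta = node - cluster; // > 0 means the node is over-utilized

        // Assumed linear scaling: the difference in utilization, weighted
        // by the matching factor, stretches or shrinks the interval.
        double factor = delta > 0
                ? 1.0 + delta * slowdownFactor
                : 1.0 + delta * speedupFactor;

        long scaled = Math.round(defaultMs * factor);

        // Keep the result inside the configured bounds.
        return Math.min(maxMs, Math.max(minMs, scaled));
    }
}
```

Under this sketch, with a 1000 ms default, bounds of 500 ms and 2000 ms, and
both factors set to 1.0, a node at 75% CPU in a cluster averaging 50% would
heartbeat every 1250 ms, and a node at 25% CPU every 750 ms.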