[
https://issues.apache.org/jira/browse/YARN-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391597#comment-16391597
]
Wei Yan commented on YARN-4488:
-------------------------------
Thanks for pinging, [~leftnoteasy]. I created YARN-7844 previously, which
mostly exposes related metrics at the scheduler level, including (may not be
fully included in YARN-7844.001.patch) various scheduler ops (node_add,
node_remove, allocate, update...) and event queue size. This set of metrics
would help us understand whether the RM scheduler is under pressure, what the
throughput of the scheduler is, and whether the scheduler itself has become a
system bottleneck.
For this JIRA, the scheduling delay for a container or an application can
vary for different reasons: the scheduler itself, resource availability,
queue configs... I'm not sure how we can use this info in prod to tune queue
configs. In our prod env, the top complaint from customers is that their jobs
take a long time to run, mostly because their queues are short of resources,
which is already covered by existing metrics (tracking available resources for
each queue).
> CapacityScheduler: Compute per-container allocation latency and roll up to
> get per-application and per-queue
> ------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4488
> URL: https://issues.apache.org/jira/browse/YARN-4488
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Karthik Kambatla
> Assignee: Manikandan R
> Priority: Major
> Attachments: YARN-4485.001.patch
>
>