[
https://issues.apache.org/jira/browse/YARN-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tao Yang updated YARN-9050:
---------------------------
Summary: [Umbrella] Usability improvements for scheduler activities (was:
Usability improvements for scheduler activities)
> [Umbrella] Usability improvements for scheduler activities
> ----------------------------------------------------------
>
> Key: YARN-9050
> URL: https://issues.apache.org/jira/browse/YARN-9050
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: image-2018-11-23-16-46-38-138.png
>
>
> We have made the following usability improvements to scheduler activities
> in our cluster, based on YARN 3.1:
> 1. Activities are not available for multi-threaded asynchronous scheduling.
> App and node activities may get mixed up when multiple scheduling threads
> record activities of different allocation processes into the same shared
> variables, such as appsAllocation and recordingNodesAllocation in
> ActivitiesManager. I think these variables should be thread-local so that
> each thread's activities stay separate.
> 2. Activities are incomplete for the multi-node lookup mechanism, because
> ActivitiesLogger skips recording via the check {{if (node == null ||
> activitiesManager == null)}}, and a null node is exactly what indicates a
> multi-node allocation. We need to support recording activities for the
> multi-node lookup mechanism.
> 3. Current app activities cannot meet diagnostic requirements. For example,
> we can see that a node doesn't match a request, but not why. Especially when
> placement constraints are used, it is difficult to make a detailed diagnosis
> manually. So I propose to improve the diagnostics of activities: add a
> diagnosis for the placement constraints check, enrich the insufficient
> resource diagnosis with detailed info (like 'insufficient resource
> names:[memory-mb]'), and so on.
> 4. Add more useful fields to app activities. In some scenarios we need to
> distinguish between requests but can't locate them from the app activities
> info alone; other fields, such as allocation tags, can help filter for what
> we want. We have added containerPriority, allocationRequestId and
> allocationTags fields to AppAllocation.
> 5. Filter app activities by key fields. The app activities results are
> sometimes massive, which makes it hard to find what we want. We have added
> support for filtering by allocation-tags to meet requirements from some apps;
> moreover, container-priority and allocation-request-id could be added as
> filter candidates if necessary.
> 6. Aggregate app activities by diagnosis. Even for a single allocation
> process, activities can be massive in a large cluster. We frequently want to
> know why a request can't be allocated, and checking every node manually in a
> large cluster is impractical, so aggregating app activities by diagnosis is
> necessary. We have added a groupingType parameter to the app-activities REST
> API for this; it supports grouping by diagnostics, for example:
> !image-2018-11-23-16-46-38-138.png!
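Items 5 and 6 together amount to a filter-then-group pass over recorded app activities. The Java sketch below illustrates that idea under stated assumptions: NodeActivity and AppActivityQuery are hypothetical, simplified stand-ins (node ID, the request's allocation tags, a diagnostic string), not YARN's AppAllocation classes or the groupingType implementation in the patch, and the sample diagnostics other than 'insufficient resource names:[memory-mb]' are made up for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for one node-level app activity;
// YARN's real activity classes carry more fields and are shaped differently.
record NodeActivity(String nodeId, Set<String> allocationTags, String diagnostic) {}

final class AppActivityQuery {
  // Item 5: keep activities whose request carries at least one wanted tag
  // (an empty filter keeps everything). Item 6: group the remaining node IDs
  // by diagnostic, sorted by diagnostic text for stable output.
  static Map<String, List<String>> groupByDiagnostic(
      List<NodeActivity> activities, Set<String> wantedTags) {
    return activities.stream()
        .filter(a -> wantedTags.isEmpty()
            || !Collections.disjoint(a.allocationTags(), wantedTags))
        .collect(Collectors.groupingBy(
            NodeActivity::diagnostic,
            TreeMap::new,
            Collectors.mapping(NodeActivity::nodeId, Collectors.toList())));
  }

  public static void main(String[] args) {
    // Illustrative sample data only.
    List<NodeActivity> activities = List.of(
        new NodeActivity("host1:8041", Set.of("hbase"), "insufficient resource names:[memory-mb]"),
        new NodeActivity("host2:8041", Set.of("hbase"), "insufficient resource names:[memory-mb]"),
        new NodeActivity("host3:8041", Set.of("hbase"), "placement constraint not satisfied"),
        new NodeActivity("host4:8041", Set.of("spark"), "insufficient resource names:[vcores]"));
    System.out.println(groupByDiagnostic(activities, Set.of("hbase")));
  }
}
```

Grouping first collapses thousands of per-node entries into one line per distinct reason, which is the point of item 6; the tag filter runs before grouping so that only the requests of interest are counted.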
> I think we can discuss these points; improvements that are accepted will be
> added to the patch. Thanks.
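Item 1's proposal (thread-local recording state) can be sketched with java.lang.ThreadLocal. This is a minimal illustration only: ThreadLocalActivities, recordActivity and finishAllocation are hypothetical names, plain strings stand in for YARN's activity objects, and the sketch does not reproduce ActivitiesManager's appsAllocation or recordingNodesAllocation fields. It only shows how each scheduling thread can get its own buffer instead of sharing one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each scheduling thread records into its own buffer
// instead of shared fields. Names are illustrative, not YARN's API.
final class ThreadLocalActivities {
  // One independent, lazily created buffer per thread.
  private static final ThreadLocal<List<String>> RECORDING =
      ThreadLocal.withInitial(ArrayList::new);

  // Record an activity for the allocation running on the current thread.
  static void recordActivity(String activity) {
    RECORDING.get().add(activity);
  }

  // Return this thread's activities and drop its state for the next allocation.
  static List<String> finishAllocation() {
    List<String> done = List.copyOf(RECORDING.get());
    RECORDING.remove();
    return done;
  }

  public static void main(String[] args) throws InterruptedException {
    Map<String, List<String>> results = new ConcurrentHashMap<>();
    Thread[] schedulers = new Thread[2];
    for (int i = 0; i < schedulers.length; i++) {
      String name = "scheduler-" + i;
      schedulers[i] = new Thread(() -> {
        for (int j = 0; j < 3; j++) {
          recordActivity(name + " -> node" + j);
        }
        // Each thread gets back only the three records it wrote itself.
        results.put(name, finishAllocation());
      });
      schedulers[i].start();
    }
    for (Thread t : schedulers) {
      t.join();
    }
    System.out.println(results);
  }
}
```

Calling remove() at the end of each allocation means a long-lived scheduling thread starts its next allocation with an empty buffer and does not keep the previous list alive.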
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)