[ https://issues.apache.org/jira/browse/YARN-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiwei Yang resolved YARN-9050.
-------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed
> [Umbrella] Usability improvements for scheduler activities
> ----------------------------------------------------------
>
> Key: YARN-9050
> URL: https://issues.apache.org/jira/browse/YARN-9050
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: image-2018-11-23-16-46-38-138.png
>
>
> We have made some usability improvements to scheduler activities, based on
> YARN 3.1, in our cluster, as follows:
> 1. Activities are not usable with multi-threaded asynchronous scheduling. App
> and node activities may be mixed up when multiple scheduling threads record
> activities of different allocation processes in the same shared variables,
> such as appsAllocation and recordingNodesAllocation in ActivitiesManager. I
> think these variables should be thread-local so that each thread's activities
> stay separate.
> 2. Activities are incomplete for the multi-node lookup mechanism, because
> ActivitiesLogger skips recording via
> {{if (node == null || activitiesManager == null)}} when node is null, which
> is the case when the allocation is for multiple nodes. We need to support
> recording activities for the multi-node lookup mechanism.
> 3. Current app activities cannot meet diagnostic requirements. For example,
> we can see that a node doesn't match a request but it is hard to know why;
> especially when placement constraints are used, a detailed diagnosis is
> difficult to make manually. So I propose to improve the diagnoses in
> activities: add diagnoses for placement constraint checks, update the
> insufficient-resource diagnosis with detailed info (like 'insufficient
> resource names:[memory-mb]'), and so on.
> 4. Add more useful fields to app activities. In some scenarios we need to
> distinguish different requests but can't locate them from the app activities
> info alone; other fields, such as allocation tags, can help filter what we
> want. We have added containerPriority, allocationRequestId and allocationTags
> fields to AppAllocation.
> 5. Filter app activities by key fields. Sometimes the app activities results
> are massive and it's hard to find what we want. We support filtering by
> allocation-tags to meet requirements from some apps; moreover, we can add
> container-priority and allocation-request-id as filter candidates if necessary.
> 6. Aggregate app activities by diagnoses. For a single allocation process,
> activities can still be massive in a large cluster. We frequently want to
> know why a request can't be allocated in the cluster, but checking every node
> manually in a large cluster is hard, so aggregating app activities by
> diagnosis is needed to solve this. We have added a groupingType parameter to
> the app-activities REST API for this, which supports grouping by
> diagnostics.
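Item 1 above proposes making the shared recording state in ActivitiesManager thread-local. The sketch below illustrates only that idea with a hypothetical, simplified recorder: the class name, methods and string records are assumptions for illustration, not the real ActivitiesManager API. Each scheduling thread appends to its own buffer, and a finished allocation process is published under a caller-supplied id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified recorder: NOT the real ActivitiesManager API.
class ThreadLocalActivitiesRecorder {
  // Each scheduling thread gets its own in-progress buffer, so records from
  // concurrent allocation processes cannot interleave.
  private final ThreadLocal<List<String>> inProgress =
      ThreadLocal.withInitial(ArrayList::new);

  // Completed allocation processes, keyed by a caller-supplied process id.
  private final Map<String, List<String>> completed = new ConcurrentHashMap<>();

  void record(String activity) {
    inProgress.get().add(activity);
  }

  // Publish this thread's records under processId, then reset the thread's buffer
  // (pooled threads are reused, so the buffer must not leak into the next process).
  void finish(String processId) {
    List<String> records = inProgress.get();
    completed.put(processId, Collections.unmodifiableList(new ArrayList<>(records)));
    records.clear();
  }

  List<String> get(String processId) {
    return completed.getOrDefault(processId, Collections.emptyList());
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadLocalActivitiesRecorder recorder = new ThreadLocalActivitiesRecorder();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (String id : new String[] {"alloc-A", "alloc-B"}) {
      pool.submit(() -> {
        for (int i = 0; i < 1000; i++) {
          recorder.record(id + ":node-" + i);
        }
        recorder.finish(id);
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    // Each allocation process sees only its own records.
    System.out.println(recorder.get("alloc-A").size() + " " + recorder.get("alloc-B").size());
  }
}
```

The design choice is that only the per-process buffer is thread-local; the published results live in a concurrent map so a REST handler on another thread can read them.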
> I think we can have a discussion about these points; useful improvements that
> are accepted will be added to the patch. Thanks.
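Item 3 above quotes a detailed diagnosis of the form 'insufficient resource names:[memory-mb]'. A minimal sketch of producing that string, assuming resources are modelled as plain name-to-amount maps (e.g. "memory-mb", "vcores"); the class and method names are hypothetical and this is not Hadoop's Resource or Resources API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: NOT the real Hadoop Resource/Resources API.
// Resources are modelled as name -> amount maps.
class InsufficientResourceDiagnosis {

  // Names of resources whose requested amount exceeds what the node has available.
  static List<String> insufficientResourceNames(Map<String, Long> required,
                                                Map<String, Long> available) {
    List<String> names = new ArrayList<>();
    // Iterate a TreeMap copy so the diagnostic lists names in a stable, sorted order.
    for (Map.Entry<String, Long> e : new TreeMap<>(required).entrySet()) {
      long have = available.getOrDefault(e.getKey(), 0L);
      if (e.getValue() > have) {
        names.add(e.getKey());
      }
    }
    return names;
  }

  // Formats the diagnostic in the style quoted in the issue; empty if nothing is short.
  static String diagnose(Map<String, Long> required, Map<String, Long> available) {
    List<String> names = insufficientResourceNames(required, available);
    return names.isEmpty() ? "" : "insufficient resource names:" + names;
  }

  public static void main(String[] args) {
    Map<String, Long> required = Map.of("memory-mb", 8192L, "vcores", 2L);
    Map<String, Long> available = Map.of("memory-mb", 4096L, "vcores", 8L);
    System.out.println(diagnose(required, available));
  }
}
```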
> The running design doc is attached
> [here|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.2jnaobmmfne5].
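Items 5 and 6 above describe filtering app activities by allocation tags and grouping them by diagnosis so that one line per reason replaces one line per node. The sketch below shows both steps with Java streams over a hypothetical NodeActivity record; it is not the real AppAllocation class or the app-activities REST implementation, and apart from the quoted 'insufficient resource names' form the diagnostic strings are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, simplified model: NOT the real AppAllocation or REST DAO classes.
class AppActivitiesAggregation {

  // One node-level activity for a request: node, failure diagnostic, request's tags.
  record NodeActivity(String nodeId, String diagnostic, Set<String> allocationTags) {}

  // Item 5: keep activities whose request carries at least one wanted tag.
  // An empty filter keeps everything.
  static List<NodeActivity> filterByTags(List<NodeActivity> activities, Set<String> wantedTags) {
    if (wantedTags.isEmpty()) {
      return activities;
    }
    return activities.stream()
        .filter(a -> a.allocationTags().stream().anyMatch(wantedTags::contains))
        .collect(Collectors.toList());
  }

  // Item 6: group node ids by diagnostic (sorted by diagnostic for stable output).
  static Map<String, List<String>> groupByDiagnostic(List<NodeActivity> activities) {
    return activities.stream()
        .collect(Collectors.groupingBy(
            NodeActivity::diagnostic,
            TreeMap::new,
            Collectors.mapping(NodeActivity::nodeId, Collectors.toList())));
  }

  public static void main(String[] args) {
    List<NodeActivity> activities = List.of(
        new NodeActivity("nm1", "insufficient resource names:[memory-mb]", Set.of("hbase")),
        new NodeActivity("nm2", "insufficient resource names:[memory-mb]", Set.of("hbase")),
        new NodeActivity("nm3", "placement constraint not satisfied", Set.of("hbase")),
        new NodeActivity("nm4", "insufficient resource names:[vcores]", Set.of("spark")));
    Map<String, List<String>> groups =
        groupByDiagnostic(filterByTags(activities, Set.of("hbase")));
    groups.forEach((diag, nodes) ->
        System.out.println(nodes.size() + " node(s): " + diag + " " + nodes));
  }
}
```

Filtering before grouping keeps the aggregation limited to the requests the user asked about; in a large cluster the grouped result stays as small as the number of distinct reasons.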
--
This message was sent by Atlassian Jira
(v8.3.4#803005)