[ https://issues.apache.org/jira/browse/YARN-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059046#comment-17059046 ]
Brahma Reddy Battula commented on YARN-9050:
--------------------------------------------

I am planning to cut the 3.3 branch by March 17. Please let me know if anything in this umbrella is blocking. I have also updated the [wiki|https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-CommonFeatures:].

> [Umbrella] Usability improvements for scheduler activities
> ----------------------------------------------------------
>
>                 Key: YARN-9050
>                 URL: https://issues.apache.org/jira/browse/YARN-9050
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: image-2018-11-23-16-46-38-138.png
>
>
> We have made some usability improvements for scheduler activities, based on
> YARN 3.1, in our cluster as follows:
> 1. Scheduler activities are not usable with multi-threaded asynchronous
> scheduling. App and node activities may get mixed up when multiple
> scheduling threads record activities of different allocation processes in
> the same variables, such as appsAllocation and recordingNodesAllocation in
> ActivitiesManager. I think these variables should be thread-local to keep
> activities separate across threads.
> 2. Activities are incomplete for the multi-node lookup mechanism, since
> ActivitiesLogger skips recording through
> {{if (node == null || activitiesManager == null)}} when node is null, which
> indicates that the allocation is for multiple nodes. We need to support
> recording activities for the multi-node lookup mechanism.
> 3. Current app activities cannot meet the requirements of diagnostics. For
> example, we can see that a node doesn't match a request but it is hard to
> know why, especially when using placement constraints, where a detailed
> manual diagnosis is difficult. So I propose to improve the diagnostics of
> activities: add diagnostics for the placement constraints check, update the
> insufficient-resource diagnostic with detailed info (like 'insufficient
> resource names:[memory-mb]'), and so on.
> 4. Add more useful fields for app activities. In some scenarios we need to
> distinguish different requests but can't locate requests based on app
> activities info; some other fields, such as allocation tags, can help to
> filter what we want. We have added containerPriority, allocationRequestId
> and allocationTags fields in AppAllocation.
> 5. Filter app activities by key fields. Sometimes the results of app
> activities are massive and it's hard to find what we want. We have added
> support for filtering by allocation-tags to meet requirements from some
> apps; moreover, we can take container-priority and allocation-request-id as
> candidates if necessary.
> 6. Aggregate app activities by diagnostics. For a single allocation process,
> activities can still be massive in a large cluster. We frequently want to
> know why a request can't be allocated in the cluster, and it's hard to check
> every node manually, so aggregating app activities by diagnostics is
> necessary. We have added a groupingType parameter to the app-activities
> REST API for this, which supports grouping by diagnostics.
> I think we can have a discussion about these points; useful improvements
> that are accepted will be added into the patch. Thanks.
> The running design doc is attached
> [here|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.2jnaobmmfne5].
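
As an illustration of point 6 in the description (aggregating app activities by diagnostics), here is a minimal Java sketch. It is not taken from the YARN patch: the class ActivityAggregationSketch, the NodeActivity type, the groupByDiagnostic method, and the node IDs are all hypothetical names chosen for this example, and the second diagnostic string is made up. It only shows the general idea of collapsing many per-node records into one entry per distinct diagnostic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the YARN code base. */
public class ActivityAggregationSketch {

    /** One per-node activity record: the node that was tried and why it was skipped. */
    static final class NodeActivity {
        final String nodeId;
        final String diagnostic;

        NodeActivity(String nodeId, String diagnostic) {
            this.nodeId = nodeId;
            this.diagnostic = diagnostic;
        }
    }

    /** Groups node IDs by diagnostic text, keeping diagnostics in first-seen order. */
    static Map<String, List<String>> groupByDiagnostic(List<NodeActivity> activities) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (NodeActivity activity : activities) {
            grouped.computeIfAbsent(activity.diagnostic, key -> new ArrayList<>())
                   .add(activity.nodeId);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Sample data is invented for illustration only.
        List<NodeActivity> activities = Arrays.asList(
            new NodeActivity("node-1", "insufficient resource names:[memory-mb]"),
            new NodeActivity("node-2", "placement constraint not satisfied"),
            new NodeActivity("node-3", "insufficient resource names:[memory-mb]"));

        // Print one line per distinct diagnostic instead of one line per node.
        groupByDiagnostic(activities).forEach((diagnostic, nodes) ->
            System.out.println(diagnostic + " -> " + nodes.size() + " node(s) " + nodes));
    }
}
```

In a large cluster the grouped view stays roughly as long as the number of distinct failure reasons rather than the number of nodes, which is the motivation given in point 6.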