[
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374343#comment-15374343
]
Chen Ge commented on YARN-4091:
-------------------------------
Hi all,
For the "YARN-4091.preliminary.1.patch" I uploaded above, here are brief
descriptions of the newly added classes and the test REST API.
Newly Added Classes:
ActivityManager:
A class to store node or application allocations. It mainly contains
operations for allocation start, add, update and finish.
NodeAllocation:
It contains the allocation information for one allocation in a node
heartbeat. Detailed allocation activities are first recorded as
"AllocationActivity" operations, then transformed into a tree structure. The
tree starts from the root queue and ends at a leaf queue, application or
container allocation.
AllocationActivity:
It records one activity operation in an allocation, which can be classified
as a queue, application or container activity. Other information includes
state, diagnostic and priority.
ActivityNode:
It represents a tree node in the "NodeAllocation" tree structure. Each node
may represent a queue, application or container in the allocation activity. A
node may have child nodes if it was successfully allocated to the next level.
ActivityDiagnosticConstant:
Collection of diagnostics.
ActivityState:
Collection of activity operation states.
AllocationState:
Collection of allocation final states.
AllocationActivityType:
Collection of types for activity operation.
AppAllocation:
It contains allocation information for one application within a period
of time. Each application allocation may have several allocation attempts.
ActivitiesInfo:
DAO object to display node allocation activity.
NodeAllocationInfo:
DAO object to display each node allocation in node heartbeat.
ActivityNodeInfo:
DAO object to display node information in allocation tree. It
corresponds to "ActivityNode" class.
AppActivitiesInfo:
DAO object to display application activity.
AppAllocationInfo:
DAO object to display application allocation detailed information.
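To make the "operations first, tree afterwards" idea from the NodeAllocation
and ActivityNode descriptions concrete, here is a minimal sketch. It is not
the code in the patch: the class, field and method names below, and the state
strings, are all made up for illustration, and it assumes every queue,
application or container has a unique name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; these types are hypothetical and are NOT the
// classes added by YARN-4091.preliminary.1.patch.
class ActivityTreeSketch {

  /** One recorded operation: the entity visited, its parent, and the outcome. */
  static final class Activity {
    final String parent; // null for the root queue
    final String name;   // queue name, application id or container id
    final String state;  // made-up values such as "ACCEPTED" or "SKIPPED"

    Activity(String parent, String name, String state) {
      this.parent = parent;
      this.name = name;
      this.state = state;
    }
  }

  /** A node of the resulting tree. */
  static final class Node {
    final String name;
    final String state;
    final List<Node> children = new ArrayList<>();

    Node(String name, String state) {
      this.name = name;
      this.state = state;
    }
  }

  /** Turns the flat list of operations into a tree; returns the root, or null if there is none. */
  static Node buildTree(List<Activity> activities) {
    // First pass: one node per recorded operation (names assumed unique).
    Map<String, Node> byName = new HashMap<>();
    for (Activity a : activities) {
      byName.put(a.name, new Node(a.name, a.state));
    }
    // Second pass: attach each node to its parent, so recording order does not matter.
    Node root = null;
    for (Activity a : activities) {
      Node node = byName.get(a.name);
      if (a.parent == null) {
        root = node;
      } else if (byName.containsKey(a.parent)) {
        byName.get(a.parent).children.add(node);
      }
    }
    return root;
  }

  /** Renders the tree with two spaces of indentation per level. */
  static String render(Node node, int depth) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < depth; i++) {
      sb.append("  ");
    }
    sb.append(node.name).append(" [").append(node.state).append("]\n");
    for (Node child : node.children) {
      sb.append(render(child, depth + 1));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Activity> recorded = Arrays.asList(
        new Activity(null, "root", "ACCEPTED"),
        new Activity("root", "a", "ACCEPTED"),
        new Activity("a", "application_1468198570845_0022", "SKIPPED"));
    System.out.print(render(buildTree(recorded), 0));
  }
}
```

The two-pass build is only one possible choice; it avoids assuming that
parents are always recorded before their children.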
Test REST API:
Look at the next node's activities (default):
http://localhost:18088/ws/v1/cluster/scheduler/activities
Look at a specific node only:
http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87:75
Or, without the port number:
http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87
Look at activities for a specific application within a period of time (3s
by default):
http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022
http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022&maxTime=5.2
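For anyone who wants to script these calls, the URLs above could be built and
fetched roughly as follows. This is only a sketch: the class and method names
are hypothetical, the "Accept: application/json" header is an assumption
(the test names suggest JSON output), maxTime is assumed to be in seconds
based on the "3s" default, and fetch() needs Java 11+ for java.net.http.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the patch. Host, port, paths and the
// nodeId/appId/maxTime parameter names are taken from the URLs above.
class SchedulerActivitiesClient {
  static final String HOST = "localhost";
  static final int PORT = 18088;
  static final String BASE_PATH = "/ws/v1/cluster/scheduler";

  private static URI build(String path, String query) {
    try {
      // The multi-argument URI constructor quotes illegal characters only;
      // values that themselves contain '&' or '=' would need extra encoding.
      return new URI("http", null, HOST, PORT, BASE_PATH + path, query, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  /** Pass null to look at the next node; otherwise "host" or "host:port". */
  static URI nodeActivities(String nodeId) {
    return build("/activities", nodeId == null ? null : "nodeId=" + nodeId);
  }

  /** Pass null for maxTimeSeconds to use the server-side default. */
  static URI appActivities(String appId, Double maxTimeSeconds) {
    String query = "appId=" + appId;
    if (maxTimeSeconds != null) {
      query += "&maxTime=" + maxTimeSeconds;
    }
    return build("/app-activities", query);
  }

  /** Issues a GET and returns the response body as text (Java 11+). */
  static String fetch(URI uri) throws IOException, InterruptedException {
    HttpRequest request = HttpRequest.newBuilder(uri)
        .header("Accept", "application/json") // assumption, see note above
        .GET()
        .build();
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body();
  }

  public static void main(String[] args) throws Exception {
    URI[] uris = {
        nodeActivities(null),
        nodeActivities("node-87:75"),
        appActivities("application_1468198570845_0022", 5.2)
    };
    for (URI uri : uris) {
      System.out.println(uri);
    }
    // Only contact the ResourceManager when explicitly asked to.
    if (args.length > 0 && args[0].equals("--fetch")) {
      System.out.println(fetch(uris[0]));
    }
  }
}
```

Running it without arguments only prints the URLs; with "--fetch" it also
requests the first one, which requires a ResourceManager with the patch
applied listening on localhost:18088.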
Test class:
TestRMWebServicesCapacitySched.java
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testActivityJSON
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testAppActivityJSON
Thanks for reviewing. Please feel free to suggest any improvements.
> Improvement: Introduce more debug/diagnostics information to detail out
> scheduler activity
> ------------------------------------------------------------------------------------------
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler, resourcemanager
> Affects Versions: 2.7.0
> Reporter: Sunil G
> Assignee: Chen Ge
> Attachments: Improvement on debugdiagnostic information - YARN.pdf,
> YARN-4091-design-doc-v1.pdf, YARN-4091.preliminary.1.patch
>
>
> As schedulers are improved with various new capabilities, more configurations
> that tune the schedulers start to take actions such as limiting the assignment
> of containers to an application, or introducing a delay in allocating
> containers, etc. No clear information is passed from the scheduler to the
> outside world in these scenarios, which makes debugging much harder.
> This ticket is an effort to introduce more defined states at the various
> points in the scheduler where it skips/rejects container assignment, activates
> an application, etc. Such information will help users know what's happening
> in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve
> on this as we discuss.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)