[jira] [Comment Edited] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2016-07-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393360#comment-15393360
 ] 

Wangda Tan edited comment on YARN-4091 at 7/26/16 7:39 AM:
---

Thanks [~ChenGe],

The latest patch looks much better to me. I would like to request more reviews 
from [~sunilg] / [~eepayne] / [~jlowe].


was (Author: leftnoteasy):
Thanks [~ChenGe],

Would like to request more reviews from [~sunilg] / [~eepayne] / [~jlowe].

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler, resourcemanager
> Affects Versions: 2.7.0
> Reporter: Sunil G
> Assignee: Chen Ge
> Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch, YARN-4091.2.patch, 
> YARN-4091.3.patch, YARN-4091.preliminary.1.patch, app_activities.json, 
> node_activities.json
>
>
> As schedulers gain new capabilities, more of the configurations that tune 
> them start to trigger actions such as limiting container assignment to an 
> application or delaying container allocation. 
> No clear information is passed from the scheduler to the outside world in 
> these scenarios, which makes debugging much harder.
> This ticket is an effort to introduce more well-defined states at the points 
> in the scheduler where it skips or rejects container assignment, activates an 
> application, etc. Such information will help users understand what is 
> happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2016-07-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374343#comment-15374343
 ] 

Wangda Tan edited comment on YARN-4091 at 7/19/16 6:33 PM:
---

Hi all,

For the "YARN-4091.preliminary.1.patch" I uploaded above, here are brief 
descriptions of the newly added classes and the test REST API.

Newly Added Classes:
ActivityManager:
- A class to store node or application allocations. It mainly contains 
operations for allocation start, add, update and finish.

NodeAllocation:
- It contains allocation information for one allocation in a node heartbeat. 
Detailed allocation activities are first stored in "AllocationActivity" as 
operations, then transformed into a tree structure. The tree starts at the 
root queue and ends at a leaf queue, application or container allocation.

AllocationActivity:
- It records an activity operation in an allocation, which can be classified as 
a queue, application or container activity. Other information includes state, 
diagnostic and priority.

ActivityNode:
- It represents a tree node in the "NodeAllocation" tree structure. Each node 
may represent a queue, application or container in an allocation activity. A 
node may have child nodes if allocation proceeded successfully to the next level.

ActivityDiagnosticConstant:
- Collection of diagnostics.

ActivityState:
- Collection of activity operation states.

AllocationState:
- Collection of allocation final states.

AllocationActivityType:
- Collection of types for activity operation.

AppAllocation:
- It contains allocation information for one application within a period of 
time. Each application allocation may have several allocation attempts.

ActivitiesInfo:
- DAO object to display node allocation activity.

NodeAllocationInfo:
- DAO object to display each node allocation in node heartbeat.

ActivityNodeInfo:
- DAO object to display node information in allocation tree. It corresponds to 
"ActivityNode" class.

AppActivitiesInfo:
- DAO object to display application activity.

AppAllocationInfo:
- DAO object to display application allocation detailed information.
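
As a rough illustration of how the flat "AllocationActivity" operations could be 
turned into the "ActivityNode" tree described above, here is a minimal sketch. 
It is not taken from the patch: the class ActivityTreeSketch, its fields, the 
state strings and the sample queue/application names are all hypothetical, and 
it assumes operations are recorded top-down (parent before child).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; it does not reproduce the classes in the patch.
class ActivityTreeSketch {
    enum Kind { QUEUE, APP, CONTAINER }

    // One flat record, loosely analogous to "AllocationActivity".
    static final class Operation {
        final String parent;      // null for the root queue
        final String name;
        final Kind kind;
        final String state;       // illustrative state strings, not the real enum
        final String diagnostic;  // empty when there is nothing to report

        Operation(String parent, String name, Kind kind, String state, String diagnostic) {
            this.parent = parent;
            this.name = name;
            this.kind = kind;
            this.state = state;
            this.diagnostic = diagnostic;
        }
    }

    // One tree node, loosely analogous to "ActivityNode".
    static final class Node {
        final Operation op;
        final List<Node> children = new ArrayList<>();

        Node(Operation op) { this.op = op; }
    }

    // Links every recorded operation to its parent and returns the root node.
    // Assumes each parent was recorded before its children.
    static Node toTree(List<Operation> ops) {
        Map<String, Node> byName = new HashMap<>();
        Node root = null;
        for (Operation op : ops) {
            Node node = new Node(op);
            byName.put(op.name, node);
            if (op.parent == null) {
                root = node;
            } else {
                Node parent = byName.get(op.parent);
                if (parent == null) {
                    throw new IllegalArgumentException("parent not recorded yet: " + op.parent);
                }
                parent.children.add(node);
            }
        }
        return root;
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.op.kind).append(' ').append(node.op.name)
          .append(" [").append(node.op.state).append(']');
        if (!node.op.diagnostic.isEmpty()) {
            sb.append(' ').append(node.op.diagnostic);
        }
        sb.append('\n');
        for (Node child : node.children) {
            sb.append(render(child, depth + 1));
        }
        return sb.toString();
    }

    // Made-up operations for one allocation in a node heartbeat.
    static List<Operation> sample() {
        List<Operation> ops = new ArrayList<>();
        ops.add(new Operation(null, "root", Kind.QUEUE, "ACCEPTED", ""));
        ops.add(new Operation("root", "root.a", Kind.QUEUE, "ACCEPTED", ""));
        ops.add(new Operation("root.a", "app_1", Kind.APP, "SKIPPED", "example diagnostic"));
        ops.add(new Operation("root.a", "app_2", Kind.APP, "ACCEPTED", ""));
        ops.add(new Operation("app_2", "container_1", Kind.CONTAINER, "ALLOCATED", ""));
        return ops;
    }

    public static void main(String[] args) {
        System.out.print(render(toTree(sample()), 0));
    }
}
```

In this sketch a skipped application simply ends up as a node without children, 
which matches the idea that a node only gets children when allocation proceeds 
to the next level.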


Test REST API:
- Look at the next node’s activities (by default): 
http://localhost:18088/ws/v1/cluster/scheduler/activities
- Only look at a specific node: 
http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87:75 OR 
without port number 
http://localhost:18088/ws/v1/cluster/scheduler/activities?nodeId=node-87
- Look at activities for a specific application within a period of time (3s by 
default): 
http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022,
http://localhost:18088/ws/v1/cluster/scheduler/app-activities?appId=application_1468198570845_0022=5.2
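
For anyone scripting against these test endpoints, the URLs above can be 
assembled roughly as follows. This is only a sketch: the class 
SchedulerActivityUrls and its method names are hypothetical helpers, not part 
of the patch, and only the "nodeId" and "appId" query parameters that appear in 
the URLs above are used.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for building the test URLs listed above.
class SchedulerActivityUrls {
    static final String BASE_PATH = "/ws/v1/cluster/scheduler";

    // Node activities; pass null as nodeId to look at the next node (the default).
    static URI nodeActivities(String host, int port, String nodeId) {
        String query = (nodeId == null) ? null : "nodeId=" + nodeId;
        return build(host, port, BASE_PATH + "/activities", query);
    }

    // Activities for one application.
    static URI appActivities(String host, int port, String appId) {
        return build(host, port, BASE_PATH + "/app-activities", "appId=" + appId);
    }

    // The multi-argument URI constructor quotes any illegal characters for us.
    private static URI build(String host, int port, String path, String query) {
        try {
            return new URI("http", null, host, port, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build URL for " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nodeActivities("localhost", 18088, null));
        System.out.println(nodeActivities("localhost", 18088, "node-87:75"));
        System.out.println(appActivities("localhost", 18088, "application_1468198570845_0022"));
    }
}
```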


Test class:
- TestRMWebServicesCapacitySched.java
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testActivityJSON
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched#testAppActivityJSON

Thanks for reviewing. Please feel free to put forward any suggestions for 
improvement.


was (Author: chenge):
Hi all,

For the "YARN-4091.preliminary.1.patch" I uploaded above, here are brief 
descriptions of the newly added classes and the test REST API.

Newly Added Classes:
ActivityManager:
A class to store node or application allocations. It mainly contains 
operations for allocation start, add, update and finish.

NodeAllocation:
It contains allocation information for one allocation in a node 
heartbeat. Detailed allocation activities are first stored in 
"AllocationActivity" as operations, then transformed into a tree structure. The 
tree starts at the root queue and ends at a leaf queue, application or 
container allocation.

AllocationActivity:
It records an activity operation in an allocation, which can be classified 
as a queue, application or container activity. Other information includes state, 
diagnostic and priority.

ActivityNode:
It represents a tree node in the "NodeAllocation" tree structure. Each node 
may represent a queue, application or container in an allocation activity. A 
node may have child nodes if allocation proceeded successfully to the next level.

ActivityDiagnosticConstant:
Collection of diagnostics.

ActivityState:
Collection of activity operation states.

AllocationState:
Collection of allocation final states.

AllocationActivityType:
Collection of types for activity operation.

AppAllocation:
It contains allocation information for one application within a period 
of time. Each application allocation may have several allocation attempts.

ActivitiesInfo:
DAO object to display node allocation activity.

NodeAllocationInfo:
DAO object to display each node allocation in node heartbeat.

ActivityNodeInfo:
DAO object to display node information in allocation tree. It 
corresponds to "ActivityNode" class.

AppActivitiesInfo: