[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392897#comment-15392897
 ] 

Chen Ge commented on YARN-4091:
-------------------------------

We also ran the Scheduler Load Simulator (SLS) using fake data. There are 2000
nodes in total, producing 2000 node heartbeats per second.

Two APIs are provided for viewing activities. The first records the activities
of one node heartbeat. The second records an application's activities within a
period of time, given an applicationId and a time.
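To make the two views concrete, here is a minimal Python sketch of a recorder that keeps only the activities someone asked for: all activities of a chosen node until its heartbeat finishes, and an application's activities until a deadline. Everything in it (`Activity`, `ActivitiesRecorder`, `request_node`, `request_app`, `finish_heartbeat`, the state strings) is hypothetical and only illustrates the idea; it is not the patch's code or YARN's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Activity:
    """One hypothetical scheduler decision, e.g. a skipped or rejected allocation."""
    timestamp_ms: int
    node_id: str
    app_id: str
    state: str  # illustrative values such as "ALLOCATED", "SKIPPED", "REJECTED"
    diagnostic: str = ""


@dataclass
class ActivitiesRecorder:
    """Hypothetical recorder mirroring the two views described above.

    * node view: keep every activity of a requested node until its heartbeat ends
    * app view: keep an application's activities until a deadline
    Activities matching neither request are dropped at once, so the recording
    cost is only paid for the few nodes/apps actually being watched.
    """
    pending_nodes: Set[str] = field(default_factory=set)
    app_deadlines: Dict[str, int] = field(default_factory=dict)
    node_results: Dict[str, List[Activity]] = field(default_factory=dict)
    app_results: Dict[str, List[Activity]] = field(default_factory=dict)

    def request_node(self, node_id: str) -> None:
        self.pending_nodes.add(node_id)

    def request_app(self, app_id: str, now_ms: int, duration_ms: int) -> None:
        self.app_deadlines[app_id] = now_ms + duration_ms
        self.app_results.setdefault(app_id, [])

    def record(self, activity: Activity) -> None:
        # node view: only nodes with an open request are recorded
        if activity.node_id in self.pending_nodes:
            self.node_results.setdefault(activity.node_id, []).append(activity)
        # app view: only requested apps, and only before their deadline
        deadline = self.app_deadlines.get(activity.app_id)
        if deadline is not None and activity.timestamp_ms <= deadline:
            self.app_results[activity.app_id].append(activity)

    def finish_heartbeat(self, node_id: str) -> None:
        # a node request covers a single heartbeat, so close it once that ends
        self.pending_nodes.discard(node_id)
```

In this design the check for an unwatched node or application is a set or dict lookup, which is one way to keep the "record nothing" path close to the baseline cost.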

Running the previous patch without these changes, one node heartbeat costs
approximately 0.2 ms. If we only record application activities, the difference
in running time is unnoticeable (less than 0.01 ms). But if we record the
complete activities of a node heartbeat, each node heartbeat takes 0.6 ms,
about 3X the baseline. However, in practice only a few nodes' activities will
be recorded at the same time. For example, suppose the activities of 30 nodes
are being recorded at the same time (which is already a huge number to me).
Compared with the time spent on 2000 node heartbeats (about 400 ms), the extra
time to record those activities (about 30 x 0.4 ms = 12 ms) is small, around 3%
more overhead, so it is negligible and acceptable.
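The 3% figure can be checked with a back-of-the-envelope calculation. A minimal Python sketch, assuming every node heartbeats once per second and that only the recorded heartbeats pay the extra cost (the function name `recording_overhead` is hypothetical):

```python
def recording_overhead(total_nodes: int, recorded_nodes: int,
                       baseline_ms: float, recorded_ms: float) -> float:
    """Fraction of extra time per second of heartbeats when only some nodes record.

    Assumes one heartbeat per node per second and that a recorded heartbeat
    costs recorded_ms instead of baseline_ms.
    """
    baseline_total = total_nodes * baseline_ms            # e.g. 2000 * 0.2 = 400 ms
    extra = recorded_nodes * (recorded_ms - baseline_ms)  # e.g. 30 * 0.4 = 12 ms
    return extra / baseline_total


print(f"{recording_overhead(2000, 30, 0.2, 0.6):.1%}")  # 3.0%
```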

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch, YARN-4091.2.patch, 
> YARN-4091.3.patch, YARN-4091.preliminary.1.patch, app_activities.json, 
> node_activities.json
>
>
> As schedulers are improved with various new capabilities, more configurations
> which tune the schedulers start to take actions such as limiting container
> assignment to an application or introducing a delay before allocating a
> container. No clear information is passed down from the scheduler to the
> outside world in these scenarios, which makes debugging very tough.
> This ticket is an effort to introduce more defined states in the various parts
> of the scheduler where it skips/rejects container assignment, activates an
> application, etc. Such information will help users know what's happening in
> the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

