[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375131#comment-15375131
 ] 

Sunil G commented on YARN-4091:
-------------------------------

Thanks [~ChenGe] for the patch and detailed doc.

Here are a few initial comments; I will share more feedback soon.

*REST API comments:*
1. For a REST query ending with {{activities?nodeId=node-87}}, I think it may 
scan all NMs on that host if multiple NMs are running on the same node. Is 
that correct?
2. If we support the above option, could we pass node names to {{nodeId}} in 
comma-separated form, like {{activities?nodeId=node-87,node-88}}? Maybe we can 
define a limit on the number of node managers per query, since the response 
output also needs to stay simple to understand.
3. For {{app-activities?appId=application_1468198570845_0022}}, I think the 
output is different from the node case? Could you please also attach the REST 
output for the app and node scenarios?
4. Sometimes we may look for relaxed scheduling by considering missed 
opportunities. In a few cases (such as rack-local, or the default partition 
from a shared label), one round of nodes has to go through heartbeats before 
an allocation happens. It would be better to add an option such as "collect 
scheduler activity for an app until its missed opportunity count is 0". 
Thoughts?
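
To make comment 2 concrete, here is a minimal sketch of how a comma-separated {{nodeId}} value could be parsed with a cap on the number of node managers per query. The class name {{NodeIdParam}}, the method {{parse}}, and the limit of 10 are assumptions for illustration only; none of them come from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the YARN-4091 patch.
final class NodeIdParam {
  // Assumed cap on how many node managers a single query may cover.
  static final int MAX_NODES_PER_QUERY = 10;

  private NodeIdParam() {
  }

  // Splits a value such as "node-87,node-88" into distinct, trimmed node names.
  static List<String> parse(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      throw new IllegalArgumentException("nodeId must not be empty");
    }
    // LinkedHashSet keeps the request order and drops duplicate names.
    Set<String> nodes = new LinkedHashSet<>();
    for (String part : raw.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        nodes.add(name);
      }
    }
    if (nodes.isEmpty()) {
      throw new IllegalArgumentException("nodeId contains no node names");
    }
    if (nodes.size() > MAX_NODES_PER_QUERY) {
      throw new IllegalArgumentException(
          "at most " + MAX_NODES_PER_QUERY + " nodes can be queried at once");
    }
    return new ArrayList<>(nodes);
  }
}
```

Rejecting oversized requests up front, rather than truncating them silently, would keep the response predictable for the caller.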


*General comments:*
1. ActivityManager is the class that holds all the information tracked about 
scheduling activities. Over time, I think we may need to handle cases such as 
cleanup of outstanding requests and internal aggregation to compact and 
reorder data collected across heartbeats. For all these cases, I think it is 
better to make ActivityManager an extended service of the scheduler, so that 
it can start a thread tied to the service to do all the monitoring and 
cleanup. This is just a thought and a good-to-have option, so please feel free 
to share your opinion.
2. I am in favor of keeping the current simple, direct calls to 
start/update/stop scheduling activity. But would it be better to define a 
read-write interface that clearly states who reads the data and who can write 
to the activity manager? On second thought, could the scheduler raise events 
to ActivityManager so that writes become asynchronous? That may be clearer and 
simpler. Thoughts?
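
As a rough illustration of both general comments, the sketch below separates a write interface (for the scheduler) from a read interface (for the REST layer), applies writes asynchronously on a single worker thread, and exposes a cleanup pass that a service thread could run periodically. Every name here ({{ActivityWriter}}, {{ActivityReader}}, {{ActivityManagerSketch}}, {{purgeOlderThan}}) is hypothetical. It uses only the JDK so that it runs on its own; a real version would presumably hook into the scheduler's service lifecycle and event dispatching instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Write side: what the scheduler would call (hypothetical interface).
interface ActivityWriter {
  void record(String nodeId, String activity);
}

// Read side: what the REST layer would call (hypothetical interface).
interface ActivityReader {
  List<String> activitiesFor(String nodeId);
}

final class ActivityManagerSketch implements ActivityWriter, ActivityReader {
  private static final class Entry {
    final long timestampMs;
    final String activity;

    Entry(long timestampMs, String activity) {
      this.timestampMs = timestampMs;
      this.activity = activity;
    }
  }

  private final Map<String, List<Entry>> byNode = new ConcurrentHashMap<>();

  // One daemon worker: writes are applied in submission order,
  // off the calling (scheduler) thread.
  private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "activity-manager");
    t.setDaemon(true);
    return t;
  });

  @Override
  public void record(String nodeId, String activity) {
    long now = System.currentTimeMillis();
    worker.execute(() -> byNode
        .computeIfAbsent(nodeId, k -> new CopyOnWriteArrayList<>())
        .add(new Entry(now, activity)));
  }

  @Override
  public List<String> activitiesFor(String nodeId) {
    List<String> out = new ArrayList<>();
    List<Entry> entries = byNode.get(nodeId);
    if (entries != null) {
      for (Entry e : entries) {
        out.add(e.activity);
      }
    }
    return out;
  }

  // Cleanup pass: drops entries recorded before cutoffMs.
  void purgeOlderThan(long cutoffMs) {
    worker.execute(() -> {
      for (List<Entry> entries : byNode.values()) {
        entries.removeIf(e -> e.timestampMs < cutoffMs);
      }
    });
  }

  // Drains queued writes and stops the worker, as a service stop hook would.
  void stop() {
    worker.shutdown();
    try {
      worker.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With this split, the scheduler only ever sees {{ActivityWriter}} and never blocks on bookkeeping, while readers may briefly lag behind the most recent writes.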


> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> YARN-4091-design-doc-v1.pdf, YARN-4091.preliminary.1.patch
>
>
> As schedulers are improved with various new capabilities, more configurations 
> that tune the schedulers start to take actions such as limiting container 
> assignment to an application or introducing a delay before allocating a 
> container. No clear information is passed from the scheduler to the outside 
> world in these scenarios, which makes debugging much harder.
> This ticket is an effort to introduce more defined states at the various 
> points in the scheduler where it skips or rejects a container assignment, 
> activates an application, etc. Such information will help users know what is 
> happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



