Wangda Tan commented on YARN-4091:

Thanks to the folks working on this design.

I took a look at the design doc, and I have also been thinking about this area recently:

*Some general issues we need to think about before going too far:*
1) Since there can be thousands of node heartbeats per second, and thousands of 
applications running concurrently in a cluster, we must consider the overhead 
of recording all of this.
2) Do we really need to record this per container?
3) How can YARN present this to users (especially admins)?

*From my experience, the typical resource allocation troubleshooting questions are:*
1) Why do my NMs have available resources, but my application cannot leverage them?
2) Why was the allocation given to another app (queue/user) instead of mine?

And my typical approach to these issues is:
1) Enable debug logging in the scheduler.
2) Grep for a host name (one the customer says has available resources) and see 
what happened within one node heartbeat.

So for me, this feature would be useful if:
1) It can capture the information for one node heartbeat.
2) The captured information has hierarchy.
3) It might look like:
                goto queue - a
                        goto queue - a.a1
                                goto app_1
                                        goto app_1.priority
                                                check - queue capacity
                                                check - user limit
                                                check - node locality    failed <<<<
                                goto app_1 ..
                goto queue - b
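The hierarchy above could be modeled as a simple tree of activity nodes. A minimal sketch, assuming a hypothetical `ActivityNode` type (the class and field names are illustrative, not from the design doc):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one node per "goto"/"check" step in the trace above.
public class ActivityNode {
    private final String name;   // e.g. "goto queue - a.a1", "check - user limit"
    private final String state;  // e.g. "" (ok) or "failed"
    private final List<ActivityNode> children = new ArrayList<>();

    public ActivityNode(String name, String state) {
        this.name = name;
        this.state = state;
    }

    public ActivityNode addChild(ActivityNode child) {
        children.add(child);
        return child;
    }

    // Render the tree in the indented, human-readable form shown above.
    public String render(int depth) {
        StringBuilder sb = new StringBuilder();
        sb.append("    ".repeat(depth)).append(name);
        if (!state.isEmpty()) {
            sb.append("    ").append(state).append(" <<<<");
        }
        sb.append('\n');
        for (ActivityNode c : children) {
            sb.append(c.render(depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ActivityNode root = new ActivityNode("goto queue - a", "");
        ActivityNode app = root.addChild(new ActivityNode("goto app_1", ""));
        app.addChild(new ActivityNode("check - node locality", "failed"));
        System.out.print(root.render(0));
    }
}
```

The point of keeping it a plain tree is that the same structure can back both the human-readable dump and a structured REST response.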

IOW, it's a human-readable version of the DEBUG log for a single node heartbeat.

And I think admin can benefit from this as well.

Another point: we don't need to do this for every node heartbeat; doing it on 
demand for a single node heartbeat should be enough for most cases. The admin 
should know which node to look at.

*Some rough ideas about what the REST API could look like:*
REST Response:
- "What happened" (such as skip-because-of-locality / 
node-partition-not-matched, etc., plus status such as usedCapacity, etc.) and 
"who" (queue/user/app)
- Parent event (we may need a hierarchy of these events)

REST Request:
- It seems sending a nodeId to look at should be enough for now.
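To make the response shape concrete, here is a rough sketch of what one event in the JSON body might carry. Everything here is hypothetical (the `ActivityEvent` record and its field names are illustrative, not from the design doc):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one event in the REST response: "what happened",
// "who" (queue/user/app), status values, and nested child events.
public class AllocationReportSketch {

    record ActivityEvent(String what,                 // e.g. "skip-because-of-locality"
                         String who,                  // e.g. queue/user/app name
                         Map<String, String> status,  // e.g. usedCapacity
                         List<ActivityEvent> children) {}

    public static void main(String[] args) {
        ActivityEvent leaf = new ActivityEvent(
                "skip-because-of-locality", "app_1",
                Map.of("usedCapacity", "0.75"), List.of());
        ActivityEvent root = new ActivityEvent(
                "assign-attempt", "queue-a", Map.of(), List.of(leaf));
        System.out.println(root);
    }
}
```

Nesting the children directly gives the hierarchy of events mentioned above without a separate parent-pointer field.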

This could be an async API: the client requests the next allocation report for a 
given NodeId, and the scheduler responds with the report when it becomes ready. 
The internal API could take inspiration from HTrace; I'm not sure we can 
directly leverage HTrace for this kind of logging. I like the basic API design 
of HTrace, but we may not need its complexity (Sampler/Storage, etc.).
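The async request/response handshake could be sketched with a future per NodeId: the client registers interest, and the scheduler completes the future after processing the next heartbeat for that node. A minimal sketch, assuming a hypothetical `AllocationReportTracker` (names are illustrative, not from YARN):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the async "next allocation report" API.
public class AllocationReportTracker {
    // Pending requests, keyed by NodeId (as a string here for simplicity).
    private final Map<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    // Client side: ask for the report of the next heartbeat of nodeId.
    // Returns immediately; the future completes when the report is ready.
    public CompletableFuture<String> nextReport(String nodeId) {
        return pending.computeIfAbsent(nodeId, id -> new CompletableFuture<>());
    }

    // Scheduler side: after handling a heartbeat, publish the report to any
    // waiting client and clear the pending request.
    public void onHeartbeatProcessed(String nodeId, String report) {
        CompletableFuture<String> f = pending.remove(nodeId);
        if (f != null) {
            f.complete(report);
        }
    }
}
```

Because recording only happens when a future is pending for that NodeId, this also gives the on-demand behavior described earlier: no per-heartbeat overhead unless someone asked.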


> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> ------------------------------------------------------------------------------------------
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf
> As schedulers gain new capabilities, more configurations that tune them start 
> to take actions such as limiting container assignment to an application or 
> introducing delay before allocating a container. There is no clear 
> information passed from the scheduler to the outer world under these 
> scenarios, which makes debugging much tougher.
> This ticket is an effort to introduce more well-defined states at the various 
> points where the scheduler skips/rejects a container assignment, activates an 
> application, etc. Such information will help users know what is happening in 
> the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.
