Wangda Tan commented on YARN-4091:


bq. However, my doubt is that we cannot do this for every heartbeat. If we want to 
do it for a specific heartbeat of a specific node, we need input from an external 
source, such as a command or REST query.

That is what I meant! We will do such debug logging entirely on demand. In my 
mind, the REST API looks like:
- Request: contains nodeId as a parameter.
- Response: "pending fetching" when the request is accepted. After the requested 
node finishes its heartbeat, the response contains all debug information.
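
To make the request/response flow above concrete, here is a minimal sketch in Java. It is only an illustration under assumptions: the class OnDemandNodeDiagnostics and its methods (query, record, finishHeartbeat) are hypothetical names invented for this example, not existing YARN APIs, and the real scheduler hooks would look different.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper, not part of YARN: collects scheduler debug messages
// only for nodes that were explicitly requested.
class OnDemandNodeDiagnostics {
    static final String PENDING = "pending fetching";

    // nodeId -> messages being collected for a requested node
    private final Map<String, List<String>> pending = new ConcurrentHashMap<>();
    // nodeId -> messages from a finished heartbeat, waiting to be fetched
    private final Map<String, List<String>> completed = new ConcurrentHashMap<>();

    // What a REST handler could call: returns the collected debug information
    // if a requested heartbeat has finished, otherwise registers the request
    // and answers "pending fetching".
    String query(String nodeId) {
        List<String> done = completed.remove(nodeId);
        if (done != null) {
            return String.join("\n", done);
        }
        pending.putIfAbsent(nodeId, new CopyOnWriteArrayList<>());
        return PENDING;
    }

    // What the scheduler could call while handling a heartbeat.
    // A no-op unless this node was requested, so normal heartbeats stay cheap.
    void record(String nodeId, String message) {
        List<String> buffer = pending.get(nodeId);
        if (buffer != null) {
            buffer.add(message);
        }
    }

    // What the scheduler could call once the node's heartbeat is fully handled.
    void finishHeartbeat(String nodeId) {
        List<String> buffer = pending.remove(nodeId);
        if (buffer != null) {
            completed.put(nodeId, buffer);
        }
    }
}
```

In this sketch the first query for a nodeId returns "pending fetching", and a later query after finishHeartbeat returns whatever was recorded during that heartbeat. A request that arrives in the middle of a heartbeat would capture only part of it; a real implementation would probably want to wait for the next full heartbeat.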

I feel we may not need queue/application as input: we can be sure a node 
heartbeats every few seconds, but we don't know whether a given queue/app will 
be accessed. We can highlight the specified queue/application in the web UI.

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> ------------------------------------------------------------------------------------------
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf
> As schedulers are improved with various new capabilities, more configurations 
> that tune the schedulers start to take actions such as limiting the assignment 
> of containers to an application, or introducing a delay in allocating a 
> container, etc. No clear information is passed down from the scheduler to the 
> outside world under these various scenarios. This makes debugging very tough.
> This ticket is an effort to introduce more defined states in the various parts 
> of the scheduler where it skips/rejects container assignment, activates an 
> application, etc. Such information will help users know what's happening in 
> the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.
