Benjamin Teke created YARN-10421:
------------------------------------

             Summary: Create YarnDiagnosticsServlet to serve diagnostic queries 
                 Key: YARN-10421
                 URL: https://issues.apache.org/jira/browse/YARN-10421
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Benjamin Teke
            Assignee: Benjamin Teke


YarnDiagnosticsServlet should run inside the ResourceManager daemon. The servlet
forks a separate process that executes a script (shell, Python, etc.). Depending
on the requested use case, chosen from the list below, the script collects the
relevant information, bundles it, and sends it to UI2. The diagnostic cases are
the following:
 # Application hanging:
 ** Application logs
 ** Find the hanging container and collect multiple jstacks from it
 ** ResourceManager logs during the job lifecycle
 ** NodeManager logs from the NodeManagers where the job's hanging containers ran
 ** Job configuration from the MapReduce HistoryServer, Spark HistoryServer, or Tez History URL

 # Application failed:
 ** Application logs
 ** ResourceManager logs during the job lifecycle
 ** NodeManager logs from the NodeManagers where the job's containers ran
 ** Job configuration from the MapReduce HistoryServer, Spark HistoryServer, or Tez History URL
 ** Job-related metrics, such as containers and attempts

 # Scheduler-related issue:
 ** ResourceManager scheduler logs with DEBUG enabled for 2 minutes
 ** Multiple jstacks of the ResourceManager
 ** YARN and scheduler configuration
 ** Cluster Scheduler API (_/ws/v1/cluster/scheduler_) and Cluster Nodes API (_/ws/v1/cluster/nodes_) responses
 ** Scheduler Activities API (_/ws/v1/cluster/scheduler/bulkactivities_) response (YARN-10319)

 # ResourceManager / NodeManager daemon fails to start:
 ** ResourceManager and NodeManager .out and .log files
 ** YARN and scheduler configuration
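
The "servlet forks a script" design could look roughly like the following sketch, which maps each diagnostic case above to a command line and launches it with java.lang.ProcessBuilder. The DiagnosticsLauncher class, the DiagnosticCase enum, the script name diagnostics.py, and the --case / --application-id flags are all hypothetical and not part of YARN; they only illustrate how the servlet could tell the external process which case to collect.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class DiagnosticsLauncher {

    // Hypothetical enum mirroring the four diagnostic cases listed in the issue.
    public enum DiagnosticCase {
        APPLICATION_HANGING(true),
        APPLICATION_FAILED(true),
        SCHEDULER_ISSUE(false),
        DAEMON_START_FAILURE(false);

        private final boolean needsApplicationId;

        DiagnosticCase(boolean needsApplicationId) {
            this.needsApplicationId = needsApplicationId;
        }

        public boolean needsApplicationId() {
            return needsApplicationId;
        }
    }

    // Hypothetical script path; the issue only says "shell/Python/etc script".
    private static final String SCRIPT = "diagnostics.py";

    /** Builds the command line for the forked process (flags are assumptions). */
    static List<String> buildCommand(DiagnosticCase diagnosticCase, String applicationId) {
        List<String> command = new ArrayList<>();
        command.add("python3");
        command.add(SCRIPT);
        command.add("--case");
        command.add(diagnosticCase.name().toLowerCase(Locale.ROOT));
        if (diagnosticCase.needsApplicationId()) {
            // Application-level cases cannot be collected without a target application.
            if (applicationId == null || applicationId.isEmpty()) {
                throw new IllegalArgumentException(diagnosticCase + " requires an application ID");
            }
            command.add("--application-id");
            command.add(applicationId);
        }
        return command;
    }

    /** Forks the script, waits for it to exit, and returns its combined stdout/stderr. */
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
            .redirectErrorStream(true) // merge stderr into stdout
            .start();
        // Read until EOF first so a chatty script cannot block on a full pipe.
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Diagnostics script exited with code " + exitCode);
        }
        return output;
    }
}
```

Running the collection in a child process keeps script failures and memory use out of the ResourceManager JVM; the servlet only has to relay the bundle the script produces.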

To ease the load on the RM, the servlet should serve only one HTTP request at a
time. If a new request arrives while another is being served, an appropriate
response code should be returned with the message "Diagnostics Collection in
Progress".
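
One way to enforce the single-request rule is a non-blocking gate around the collection work: the first request atomically claims the gate, and any request arriving while it is held is rejected immediately instead of queuing. The sketch below uses java.util.concurrent.atomic.AtomicBoolean; the DiagnosticsGate and Response names are hypothetical stand-ins (not servlet API types), and HTTP 503 (Service Unavailable) is only an assumed choice, since the issue asks for "an appropriate response code" without naming one.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

class DiagnosticsGate {

    static final String BUSY_MESSAGE = "Diagnostics Collection in Progress";
    // Assumption: the issue does not specify which status code to use.
    static final int BUSY_STATUS = 503;
    static final int OK_STATUS = 200;

    /** Minimal stand-in for an HTTP response (hypothetical). */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    /**
     * Runs the collector if no other collection is running; otherwise
     * returns the busy response immediately without blocking.
     */
    Response handle(Supplier<String> collector) {
        // compareAndSet is atomic, so only one caller can flip false -> true.
        if (!inProgress.compareAndSet(false, true)) {
            return new Response(BUSY_STATUS, BUSY_MESSAGE);
        }
        try {
            return new Response(OK_STATUS, collector.get());
        } finally {
            // Release the gate even if the collector throws.
            inProgress.set(false);
        }
    }
}
```

Rejecting rather than queuing guarantees at most one forked collection process per ResourceManager at a time, and the finally block keeps a failed collection from leaving the servlet permanently busy.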



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
