[jira] [Updated] (YARN-10421) Create YarnDiagnosticsServlet to serve diagnostic queries

2020-09-10 Thread Benjamin Teke (Jira)


 [ https://issues.apache.org/jira/browse/YARN-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-10421:
-
Attachment: YARN-10421.001.patch

> Create YarnDiagnosticsServlet to serve diagnostic queries 
> --
>
> Key: YARN-10421
> URL: https://issues.apache.org/jira/browse/YARN-10421
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10421.001.patch
>
>
> YarnDiagnosticsServlet should run inside the ResourceManager daemon. The servlet 
> forks a separate process, which executes a shell/Python/etc. script (a minimal 
> forking sketch follows the case list below). Based on the use cases listed below, 
> the script collects the information, bundles it and sends it to UI2. The 
> diagnostic cases are the following:
>  # Application hanging: 
>  ** Application logs
>  ** Find the hanging container and collect multiple jstack dumps
>  ** ResourceManager logs during the job lifecycle
>  ** NodeManager logs from the nodes where the hanging containers of the job ran
>  ** Job configuration from the MapReduce HistoryServer, Spark HistoryServer or Tez History URL
>  # Application failed: 
>  ** Application logs
>  ** ResourceManager logs during the job lifecycle
>  ** NodeManager logs from the nodes where the failed containers of the job ran
>  ** Job configuration from the MapReduce HistoryServer, Spark HistoryServer or Tez History URL
>  ** Job-related metrics, such as containers and attempts
>  # Scheduler-related issue:
>  ** ResourceManager scheduler logs with DEBUG logging enabled for 2 minutes
>  ** Multiple jstack dumps of the ResourceManager
>  ** YARN and scheduler configuration
>  ** Cluster Scheduler API _/ws/v1/cluster/scheduler_ and Cluster Nodes API _/ws/v1/cluster/nodes_ responses
>  ** Scheduler Activities _/ws/v1/cluster/scheduler/bulkactivities_ response (YARN-10319)
>  # ResourceManager / NodeManager daemon fails to start:
>  ** ResourceManager and NodeManager .out and .log files
>  ** YARN and scheduler configuration
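
A minimal sketch of the forking step described above. The helper class, the script 
path /opt/yarn-diagnostics/collect.sh and the bundle naming are illustrative 
assumptions, not part of the attached patch:

{code:java}
// Hypothetical sketch only: fork the collection script in a separate process.
// The script path, arguments and bundle naming are illustrative assumptions.
import java.io.File;
import java.io.IOException;

public class DiagnosticsScriptRunner {

  public File collect(String diagCase, File workDir)
      throws IOException, InterruptedException {
    File bundle = new File(workDir, "diag-" + diagCase + ".tar.gz");
    ProcessBuilder pb = new ProcessBuilder(
        "/opt/yarn-diagnostics/collect.sh", diagCase, bundle.getAbsolutePath());
    // Merge stderr into stdout and capture both in a log file next to the bundle,
    // so the forked script cannot block on a full output pipe.
    pb.redirectErrorStream(true);
    pb.redirectOutput(new File(workDir, "collect-" + diagCase + ".log"));
    Process process = pb.start();
    int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new IOException("Diagnostics script exited with code " + exitCode);
    }
    return bundle;
  }
}
{code}

Keeping the servlet limited to "run script, pick up bundle" is what makes it 
agnostic of which diagnostic cases the script actually implements.
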
> To ease the load on the RM, the servlet should serve only one HTTP request at a 
> time. If a new request arrives while another one is being served, an appropriate 
> response code should be returned with the message "Diagnostics Collection in 
> Progress" (a minimal gating sketch follows this paragraph). The servlet should 
> list the possible diagnostic cases to the UI; the cases themselves are 
> implemented in the script. The servlet should be unaffected by script changes, 
> which keeps the diagnostic tool extensible on the fly. 
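
A minimal sketch of the single-request gating, assuming a plain javax.servlet 
HttpServlet and HTTP 409 (Conflict) as the "appropriate response code"; the actual 
status code and the wiring into the RM web app are open design choices:

{code:java}
// Hypothetical sketch only: gate the servlet so a single collection runs at a time.
// HTTP 409 (Conflict) is an assumed choice for the "appropriate response code".
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class YarnDiagnosticsServlet extends HttpServlet {

  private final AtomicBoolean collectionInProgress = new AtomicBoolean(false);

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Atomically claim the single collection slot; reject the request if taken.
    if (!collectionInProgress.compareAndSet(false, true)) {
      resp.sendError(HttpServletResponse.SC_CONFLICT,
          "Diagnostics Collection in Progress");
      return;
    }
    try {
      // Fork the script and return the bundle here (see the previous sketch).
      resp.setStatus(HttpServletResponse.SC_OK);
      resp.getWriter().println("diagnostics collection finished");
    } finally {
      collectionInProgress.set(false);
    }
  }
}
{code}

A compare-and-set flag avoids holding a lock across the long-running collection 
while still guaranteeing that only one request is served at a time.
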
>  
> The diagnostic bundle can become large, so a size threshold should be added. If 
> the bundle's size exceeds the threshold, the bundle will be stored in a local 
> folder on the RM host and its path will be returned (a threshold sketch follows 
> below).
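
A sketch of the size-threshold handling. The 100 MB limit and the plain-text 
"Bundle stored at: <path>" reply are assumptions for illustration; the real 
threshold would likely be configurable:

{code:java}
// Hypothetical sketch only: apply a size threshold when returning the bundle.
// The 100 MB limit and the "store locally, return the path" reply are assumptions.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import javax.servlet.http.HttpServletResponse;

public class BundleResponder {

  private static final long MAX_INLINE_BUNDLE_BYTES = 100L * 1024 * 1024;

  void respond(File bundle, HttpServletResponse resp) throws IOException {
    if (bundle.length() > MAX_INLINE_BUNDLE_BYTES) {
      // Too large to send back: leave it on the RM host and return only the path.
      resp.setContentType("text/plain");
      resp.getWriter().println("Bundle stored at: " + bundle.getAbsolutePath());
    } else {
      // Small enough: stream the archive directly back to UI2.
      resp.setContentType("application/gzip");
      resp.setHeader("Content-Disposition",
          "attachment; filename=" + bundle.getName());
      Files.copy(bundle.toPath(), resp.getOutputStream());
    }
  }
}
{code}
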





