[ https://issues.apache.org/jira/browse/YARN-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956352#comment-17956352 ]
ASF GitHub Bot commented on YARN-10421:
---------------------------------------

Hean-Chhinling opened a new pull request, #7727:
URL: https://github.com/apache/hadoop/pull/7727

   ### Description of PR

   ### How was this patch tested?

   ### For code changes:

   - [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

> Create YarnDiagnosticsService to serve diagnostic queries
> ----------------------------------------------------------
>
>                 Key: YARN-10421
>                 URL: https://issues.apache.org/jira/browse/YARN-10421
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: chhinlinghean
>            Priority: Major
>         Attachments: YARN-10421.001.patch, YARN-10421.002.patch,
>                      YARN-10421.003.patch, YARN-10421.004.patch
>
>
> YarnDiagnosticsServlet should run inside the ResourceManager daemon. The
> servlet forks a separate process, which executes a shell/Python/etc. script.
> Based on the use cases listed below, the script collects information, bundles
> it, and sends it to UI2.
> The diagnostic options are the following:
> # Application hanging:
> ** Application logs
> ** Find the hanging container and get multiple Jstacks
> ** ResourceManager logs during job lifecycle
> ** NodeManager logs from NodeManager where the hanging containers of the
> jobs ran
> ** Job configuration from MapReduce HistoryServer, Spark HistoryServer, Tez
> History URL
> # Application failed:
> ** Application logs
> ** ResourceManager logs during job lifecycle.
> ** NodeManager logs from NodeManager where the hanging containers of the
> jobs ran
> ** Job Configuration from MapReduce HistoryServer, Spark HistoryServer, Tez
> History URL.
> ** Job related metrics like container, attempts.
> # Scheduler related issue:
> ** ResourceManager Scheduler logs with DEBUG enabled for 2 minutes.
> ** Multiple Jstacks of ResourceManager
> ** YARN and Scheduler Configuration
> ** Cluster Scheduler API _/ws/v1/cluster/scheduler_ and Cluster Nodes API
> _/ws/v1/cluster/nodes_ response
> ** Scheduler Activities _/ws/v1/cluster/scheduler/bulkactivities_ response
> (YARN-10319)
> # ResourceManager / NodeManager daemon fails to start:
> ** ResourceManager and NodeManager out and log file
> ** YARN and Scheduler Configuration
>
> Two new endpoints should be added to the RM web service: one for listing the
> available diagnostic options (_/common-issue/list_), and one for calling a
> selected option with the user provided parameters (_/common-issue/collect_).
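
For illustration only, the list/collect split described in the issue could be sketched roughly as follows. This is not taken from the attached patches or from PR #7727: the class name `DiagnosticsSketch`, the option identifiers, the `--issue` flag, and the script path are all hypothetical. The sketch only shows one way to enumerate the four diagnostic options and to assemble the argument vector for the forked collector script.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the YARN-10421 patches. */
class DiagnosticsSketch {

  /** The four use cases from the issue description; the string ids are assumed. */
  enum CommonIssue {
    APPLICATION_HANGING("application_hanging"),
    APPLICATION_FAILED("application_failed"),
    SCHEDULER_RELATED_ISSUE("scheduler_related_issue"),
    RM_NM_START_FAILURE("rm_nm_start_failure");

    final String id;

    CommonIssue(String id) {
      this.id = id;
    }
  }

  /** What a /common-issue/list handler might return: the option identifiers. */
  static List<String> listOptions() {
    List<String> ids = new ArrayList<>();
    for (CommonIssue issue : CommonIssue.values()) {
      ids.add(issue.id);
    }
    return ids;
  }

  /**
   * Builds the argument vector a /common-issue/collect handler might pass to
   * the collector script. Each user parameter becomes its own argv element,
   * so no shell parsing of user input is involved.
   */
  static List<String> buildCommand(String scriptPath, CommonIssue issue,
      Map<String, String> params) {
    List<String> cmd = new ArrayList<>();
    cmd.add(scriptPath);
    cmd.add("--issue"); // hypothetical flag name
    cmd.add(issue.id);
    for (Map.Entry<String, String> e : params.entrySet()) {
      cmd.add("--" + e.getKey());
      cmd.add(e.getValue());
    }
    return cmd;
  }

  /** Forks the script as a separate process, merging stderr into stdout. */
  static Process fork(List<String> command) throws IOException {
    return new ProcessBuilder(command).redirectErrorStream(true).start();
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("appId", "application_1_0001"); // made-up application id
    System.out.println(listOptions());
    // The script path is a placeholder; nothing is executed here.
    System.out.println(
        buildCommand("/path/to/collector-script", CommonIssue.APPLICATION_HANGING, params));
  }
}
```

Passing parameters as separate argv elements rather than concatenating them into one shell string is a design choice of this sketch, made because the issue says the parameters are user provided.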