[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump
[ https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185001#comment-17185001 ]

Prabhu Joseph commented on YARN-1806:

This is very useful for debugging. Thanks [~sahuja] for the patch and [~akhilpb] for the review. I have committed the patch to trunk.

> webUI update to allow end users to request thread dump
>
> Key: YARN-1806
> URL: https://issues.apache.org/jira/browse/YARN-1806
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Ming Ma
> Assignee: Siddharth Ahuja
> Priority: Major
> Attachments: YARN-1806.001.patch
>
> Both individual container page and containers page will support this. After
> end user clicks on the request link, they can follow to get to stdout page
> for the thread dump content.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump
[ https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184949#comment-17184949 ]

Akhil PB commented on YARN-1806:

[~sahuja] The initial patch looks good. Any improvements can be done in follow-up JIRAs.
[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump
[ https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184073#comment-17184073 ]

Hadoop QA commented on YARN-1806:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 41s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| 0 | jshint | 0m 0s | jshint was not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 50s | trunk passed |
| +1 | shadedclient | 37m 49s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 55s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
| | | 54m 51s | |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/86/artifact/out/Dockerfile |
| JIRA Issue | YARN-1806 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13010467/YARN-1806.001.patch |
| Optional Tests | dupname asflicense shadedclient jshint |
| uname | Linux be4e0dd99d1c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 82a75056463 |
| Max. process+thread count | 414 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/86/console |
| versions | git=2.17.1 maven=3.6.0 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump
[ https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183823#comment-17183823 ]

Siddharth Ahuja commented on YARN-1806:

Testing done on the platform:

1. Test Jstack collection for a non-RUNNING app:
   a. Ensure there is a YARN application from a previous run that is NOT currently RUNNING.
   b. Visit ResourceManager Web UI -> Applications -> click the application_id link for the non-running app. The Jstack button should be visible.
   c. Click the Jstack button. The error message "Jstack cannot be collected for an application that is not running." should be displayed, because a non-running application has no running containers from which a Jstack could be collected.

2. Test Jstack collection for a RUNNING app:
   a. Ensure there is a YARN application that is currently in the RUNNING state.
   b. Visit ResourceManager Web UI -> Applications -> click the application_id link for the running app. The Jstack button should be visible.
   c. Click the Jstack button. A new Jstack panel should be shown, with a drop-down that has the options "None" and "".
   d. Select the currently running app attempt from the drop-down. A new drop-down listing the currently running containers for this app attempt should be shown in the same panel.
   e. Select a container from this drop-down. A new panel should be shown whose header displays the selected container and the selected attempt ID, along with the stdout logs for this container containing its thread dump.
   f. Repeat step e. for another container. A thread dump should be captured and visible in the panel containing the stdout logs.
   g. Go back and repeat step e. for the container that was selected first. Notice that two thread dumps are now present in its stdout logs, with the latest thread dump shown last.

3. Error checking - Jstack fetch attempt for a container that is not running because the application was killed:
   a. Kill the currently RUNNING application using: yarn application -kill
   b. Now try selecting a container from the container drop-down. Jstack collection is not possible, so the error "Jstack fetch failed for container: due to: “Trying to signal an absent container ”" is displayed.

4. Error checking - Jstack fetch attempt for a container while RMs/NMs are not available:
   a. Ensure there is a YARN application that is currently in the RUNNING state.
   b. Visit ResourceManager Web UI -> Applications -> click the application_id link for the running app. The Jstack button should be visible.
   c. Click the Jstack button. A new Jstack panel should be shown, with a drop-down that has the options "None" and "".
   d. Select the currently running app attempt from the drop-down. A new drop-down listing the currently running containers for this app attempt should be shown in the same panel.
   e. Select a container from this drop-down. A new panel should be shown whose header displays the selected container and the selected attempt ID, along with the stdout logs for this container containing its thread dump.
   f. Stop the ResourceManager(s).
   g. Select a different container from the drop-down list. The error "Jstack fetch failed for container: due to: “Error: Not able to connect to YARN!”" should be displayed.
   h. Restart the ResourceManager(s).
   i. Repeat steps a. to e.
   j. Stop the NodeManager(s).
   k. Select a different container from the drop-down list. The error "Logs fetch failed for container: due to: “Error: Not able to connect to YARN!”" should be displayed.
   l. Restart the NodeManager(s).

5. Check that the latest (and ONLY) running app attempt ID is displayed:
   a. Ensure there is a YARN application that is currently in the RUNNING state.
   b. Visit ResourceManager Web UI -> Applications -> click the application_id link for the running app. The Jstack button should be visible.
   c. Click the Jstack button. A new Jstack panel should be shown, with a drop-down that has the options "None" and "".
   d. Now run the following command to terminate the currently running AM: yarn container -signal GRACEFUL_SHUTDOWN
   e. Run the following command to check the currently running app_attempt_id: yarn applicationattempt -list application_1598288770104_0003
   f.
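Step 2g above (two thread dumps in the same stdout log, the latest one last) can also be checked programmatically. The sketch below is a hypothetical helper, not part of the patch. It assumes the container runs a HotSpot-based JVM, which begins each SIGQUIT thread dump with a line starting "Full thread dump ", and that the container's stdout log text has already been fetched from the NodeManager. The names count_thread_dumps and latest_thread_dump are illustrative.

```python
import re

# Assumption: the container runs a HotSpot-based JVM, which writes a line
# beginning with "Full thread dump " at the start of every SIGQUIT (kill -3)
# thread dump to the process's stdout.
DUMP_BANNER = re.compile(r"^Full thread dump ", re.MULTILINE)


def count_thread_dumps(stdout_text: str) -> int:
    """Count the thread dumps in a container's stdout log text."""
    return len(DUMP_BANNER.findall(stdout_text))


def latest_thread_dump(stdout_text: str) -> str:
    """Return everything from the last dump banner to the end of the log.

    Returns an empty string when the log contains no thread dump.
    """
    starts = [match.start() for match in DUMP_BANNER.finditer(stdout_text)]
    return stdout_text[starts[-1]:] if starts else ""


if __name__ == "__main__":
    # Hypothetical log after two Jstack requests for the same container (step 2g).
    log = (
        "application output\n"
        "Full thread dump OpenJDK 64-Bit Server VM (mixed mode):\n"
        '"main" #1 prio=5 runnable\n'
        "Full thread dump OpenJDK 64-Bit Server VM (mixed mode):\n"
        '"main" #1 prio=5 waiting\n'
    )
    assert count_thread_dumps(log) == 2, "step 2g expects two thread dumps"
    print(latest_thread_dump(log), end="")
```

Anchoring the pattern to the start of a line avoids counting the phrase when it merely appears inside other application output.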
[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump
[ https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183819#comment-17183819 ]

Siddharth Ahuja commented on YARN-1806:

This JIRA adds a "Jstack" button to the ResourceManager Web UI's individual application page, reached by visiting RM Web UI -> Applications -> click on (so the breadcrumb would be "Home / Applications / App [app_id] / Jstack"). The button triggers thread dumps for the running YARN containers of the currently running application attempt. The thread dumps are captured in the stdout logs of the selected container and are displayed as-is by querying the NodeManager on which that container runs.

The feature implements two panels. The first panel displays two drop-downs. The first drop-down shows the currently running app attempt ID and a "None" option (similar to the "Logs" functionality). Once an attempt is selected, a second drop-down appears in the same panel, listing the currently running containers for that application attempt. Once you select a container ID from this second drop-down, another panel opens just below (again similar to the "Logs" functionality). Its header shows the selected attempt ID and container, and it displays the container's stdout logs, which contain the thread dump triggered when the container was selected.

The following API calls are made:

API calls made when the Jstack button is clicked:
1. http://:8088/ws/v1/cluster/apps/ -> Get application info from the RM, e.g. the app state.
2. http://:8088/ws/v1/cluster/apps//appattempts -> Get application attempt info from the RM, e.g. the app attempt state, to see whether it is RUNNING ([YARN-10381|https://issues.apache.org/jira/browse/YARN-10381]).

If the application is not RUNNING, an error is displayed, based on the info from 1. above. If the application is RUNNING, the application attempts info for this app is checked (there can be more than one app attempt) and only the ID of the RUNNING attempt is displayed, based on the info from 2. above.

API calls made when the app attempt is selected from the drop-down:
3. http://:8088/ws/v1/cluster/apps//appattempts//containers -> Get the list of running containers for the currently running app attempt from the RM.

API calls made when the container is selected from the drop-down:
4. http://:8088/ws/v1/cluster/containers//signal/OUTPUT_THREAD_DUMP?user.name= -> Asks the RM (which relays the request to the NM through the NM heartbeat) to send a SIGQUIT signal to the process of the selected container ([YARN-8693|https://issues.apache.org/jira/browse/YARN-8693]). This is essentially a kill -3, and it generates a thread dump that is captured in the container's stdout logs.
   http://:8042/ws/v1/node/containerlogs//stdout -> Fetches, from the NM running the selected container, the container's stdout logs, which contain the thread dump triggered by the call above.
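The call sequence described in this comment can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patch's implementation (which lives in the hadoop-yarn-ui module). The host names, IDs and user are hypothetical placeholders; JstackEndpoints and running_attempt_id are illustrative names; the JSON keys "appAttemptId" and "appAttemptState" for the parsed appattempts response are assumptions to verify against the RM REST API documentation; and the signal call is assumed to be an HTTP POST. Only URLs and request objects are built, nothing is sent.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional
from urllib.request import Request

RM_PORT = 8088  # RM web port used in the calls above
NM_PORT = 8042  # NM web port used in the calls above

NOT_RUNNING_MSG = "Jstack cannot be collected for an application that is not running."


@dataclass
class JstackEndpoints:
    """Hypothetical helper that builds the REST calls listed above."""

    rm_host: str  # placeholder ResourceManager host
    nm_host: str  # placeholder host of the NM running the selected container

    def _rm(self, path: str) -> str:
        return f"http://{self.rm_host}:{RM_PORT}/ws/v1/cluster/{path}"

    def app(self, app_id: str) -> str:
        # Call 1: application info, e.g. the app state.
        return self._rm(f"apps/{app_id}")

    def attempts(self, app_id: str) -> str:
        # Call 2: application attempt info, e.g. the attempt state.
        return self._rm(f"apps/{app_id}/appattempts")

    def containers(self, app_id: str, attempt_id: str) -> str:
        # Call 3: running containers of the selected attempt.
        return self._rm(f"apps/{app_id}/appattempts/{attempt_id}/containers")

    def signal_thread_dump(self, container_id: str, user: str) -> Request:
        # Call 4 (RM part): request OUTPUT_THREAD_DUMP (SIGQUIT) for the container.
        # Assumption: the RM expects this call as an HTTP POST.
        url = self._rm(
            f"containers/{container_id}/signal/OUTPUT_THREAD_DUMP?user.name={user}"
        )
        return Request(url, method="POST")

    def stdout_log(self, container_id: str) -> str:
        # Call 4 (NM part): the container's stdout log holding the thread dump.
        return (f"http://{self.nm_host}:{NM_PORT}"
                f"/ws/v1/node/containerlogs/{container_id}/stdout")


def running_attempt_id(app_state: str,
                       attempts: List[Mapping[str, str]]) -> Optional[str]:
    """Return the ID of the RUNNING attempt, or None if there is none.

    Raises ValueError when the application itself is not RUNNING.
    Assumption: each parsed attempt carries "appAttemptId" and
    "appAttemptState" keys.
    """
    if app_state != "RUNNING":
        raise ValueError(NOT_RUNNING_MSG)
    for attempt in attempts:
        if attempt.get("appAttemptState") == "RUNNING":
            return attempt.get("appAttemptId")
    return None


if __name__ == "__main__":
    ep = JstackEndpoints("rm.example.com", "nm1.example.com")  # hypothetical hosts
    print(ep.attempts("application_1598288770104_0003"))
    # -> http://rm.example.com:8088/ws/v1/cluster/apps/application_1598288770104_0003/appattempts
```

Keeping request construction separate from sending makes the sequence testable without a cluster; a caller would open the signal request first and fetch the stdout URL afterwards, as in call 4.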