[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump

2020-08-26 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185001#comment-17185001
 ] 

Prabhu Joseph commented on YARN-1806:
-

This is very useful for debugging. Thanks [~sahuja] for the patch and 
[~akhilpb] for the review.

Have committed the patch to trunk.

> webUI update to allow end users to request thread dump
> --
>
> Key: YARN-1806
> URL: https://issues.apache.org/jira/browse/YARN-1806
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Ming Ma
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-1806.001.patch
>
>
> Both the individual container page and the containers page will support this. After 
> the end user clicks on the request link, they can follow it to the stdout page 
> for the thread dump content.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump

2020-08-25 Thread Akhil PB (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184949#comment-17184949
 ] 

Akhil PB commented on YARN-1806:


[~sahuja] The initial patch looks good. Any improvements could be done in 
follow up jiras.




[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump

2020-08-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184073#comment-17184073
 ] 

Hadoop QA commented on YARN-1806:
-

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 41s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| 0 | jshint | 0m 0s | jshint was not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 50s | trunk passed |
| +1 | shadedclient | 37m 49s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 55s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
| | | 54m 51s | |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/86/artifact/out/Dockerfile |
| JIRA Issue | YARN-1806 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13010467/YARN-1806.001.patch |
| Optional Tests | dupname asflicense shadedclient jshint |
| uname | Linux be4e0dd99d1c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 82a75056463 |
| Max. process+thread count | 414 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/86/console |
| versions | git=2.17.1 maven=3.6.0 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump

2020-08-25 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183823#comment-17183823
 ] 

Siddharth Ahuja commented on YARN-1806:
---

Testing done on the platform:

1. Test Jstack collection for non-RUNNING app:

a. Ensure there is a YARN application that is already present 
from a previous run and is NOT currently RUNNING.
b. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the non-running app. Jstack button should be visible.
c. Click on Jstack button.  Error message should be displayed 
-> "Jstack cannot be collected for an application that is not running." because 
it is not possible to collect Jstack for a non-running application as it has no 
running containers.

2. Test for Jstack collection for a RUNNING app:
a. Ensure there is a YARN application that is currently in 
RUNNING state,
b. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the running app. Jstack button should be visible.
c. Click on Jstack button. A new Jstack panel with a drop-down 
that has the options - "None" and 
"" should be shown,
d. Select the currently running app attempt from the drop-down. 
A new drop-down that shows currently running containers for this app attempt 
should be shown in the drop-down panel,
e. Select a container from this drop-down. A new panel with a 
header that shows the selected container and selected attempt-id should be shown 
along with the Stdout logs for this container, which contain the thread dump from 
this container.
f. Repeat step e. from above for another container. A thread 
dump should be captured and visible in the panel containing the stdout logs.
g. Go back and repeat step e. for the same container that was 
first selected. Notice that 2 thread dumps are now present in the stdout logs 
with the latest thread dump shown later in the stdout logs.

3. Error checking - Jstack fetch attempt for a container that is not 
running due to killed application:

a. Kill the currently RUNNING application using: yarn 
application -kill ,
b. Now try selecting a container from the drop-down containing 
the container listing. Jstack collection is not possible and hence the error is 
displayed -> "Jstack fetch failed for container:  due to: 
“Trying to signal an absent container ”".

4. Error checking - Jstack fetch attempt for a container while RMs/NMs 
not available:
a. Ensure there is a YARN application that is currently in 
RUNNING state,
b. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the running app. Jstack button should be visible.
c. Click on Jstack button. A new Jstack panel with a drop-down 
that has the options - "None" and 
"" should be shown,
d. Select the currently running app attempt from the drop-down. 
A new drop-down that shows currently running containers for this app attempt 
should be shown in the drop-down panel,
e. Select a container from this drop-down. A new panel with a 
header that shows the selected container and selected attempt-id should be shown 
along with the Stdout logs for this container, which contain the thread dump from 
this container.
f. Stop the ResourceManager/s.
g. Select a different container from the drop-down list. An 
error should be displayed -> "Jstack fetch failed for container: 
 due to: “Error: Not able to connect to YARN!”".
h. Restart the ResourceManager/s.
i. Repeat steps a. through e.
j. Stop NodeManager/s.
k. Select a different container from the drop-down list. An 
error should be displayed -> "Logs fetch failed for container: 
 due to: “Error: Not able to connect to YARN!”".
l. Start the NodeManager/s again.

5. Check latest (and the ONLY) running app attempt id is displayed:
a. Ensure there is a YARN application that is currently in 
RUNNING state,
b. Visit ResourceManager Web UI -> Applications -> Click on 
application_id link for the running app. Jstack button should be visible.
c. Click on Jstack button. A new Jstack panel with a drop-down 
that has the options - "None" and 
"" should be shown,
d. Now, run the following command to terminate the currently 
running AM:

yarn container -signal  
GRACEFUL_SHUTDOWN

e. Run the following command to check the currently running 
app_attempt_id:

yarn applicationattempt -list application_1598288770104_0003

f. 

[jira] [Commented] (YARN-1806) webUI update to allow end users to request thread dump

2020-08-25 Thread Siddharth Ahuja (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183819#comment-17183819
 ] 

Siddharth Ahuja commented on YARN-1806:
---

This JIRA adds a "Jstack" button to the individual application page of the 
ResourceManager Web UI, reachable via RM Web UI -> Applications -> Click on  
(so the breadcrumb would be "Home / Applications / App [app_id] / Jstack"), to 
trigger thread dumps for running YARN containers of a currently running 
application attempt. The thread dumps are captured as part of the stdout logs 
for the selected container and displayed as-is by querying the NodeManager on 
which this container is running.

As part of this feature, there are 2 panels implemented. The first panel 
displays two drop-downs, the first one displaying the currently running app 
attempt id and a "None" option (similar to "Logs" functionality). Once this is 
selected, it goes on to display another drop-down in the same panel that 
contains a listing of currently running containers for this application attempt 
id.

Once you select a container id from this second drop-down, another Panel is 
opened just below (again this is similar to the "Logs" functionality) that 
shows the selected attempt id and the container as the header with container's 
stdout logs also being displayed containing the thread dump that was triggered 
when the container was selected.

The following sets of API calls are made:

API calls made when the Jstack button is clicked:
1. http://:8088/ws/v1/cluster/apps/ -> Get application info 
e.g. app state from RM,
2. http://:8088/ws/v1/cluster/apps//appattempts -> Get 
application attempt info from RM, e.g. to get the app attempt state to see if 
it is RUNNING or not 
([YARN-10381|https://issues.apache.org/jira/browse/YARN-10381]).

If the application is not RUNNING, an error is displayed, based on the info 
from 1. above. 
If the application is RUNNING, we check the application attempts info for this 
app (there can be more than one app attempt) and display the application 
attempt id for the RUNNING attempt only. This is based on the info from 2. 
above.
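
The selection logic described above could be sketched roughly as follows, in 
Python. This is only an illustration and not the code from the patch: the 
helper name and the dictionary keys ("state", "id", "appAttemptState") are 
assumptions and may not match the actual REST payloads or the UI code.

```python
from typing import Optional

# Error text as shown in the UI for a non-running application.
NOT_RUNNING_MSG = "Jstack cannot be collected for an application that is not running."


def pick_running_attempt(app: dict, attempts: list) -> Optional[str]:
    """Return the id of the RUNNING attempt, or None if no attempt is RUNNING.

    `app` and `attempts` are simplified stand-ins (assumed shapes) for the
    responses of calls 1. and 2.
    """
    # Based on call 1.: refuse early if the application itself is not RUNNING.
    if app.get("state") != "RUNNING":
        raise ValueError(NOT_RUNNING_MSG)
    # Based on call 2.: keep only attempts whose state is RUNNING.
    running = [a["id"] for a in attempts if a.get("appAttemptState") == "RUNNING"]
    # Only one attempt is expected to be RUNNING; take the last one listed.
    return running[-1] if running else None
```

With one FAILED and one RUNNING attempt, only the RUNNING attempt id would be 
offered in the drop-down (next to "None").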

API calls made when the app attempt is selected from the drop-down:
3. 
http://:8088/ws/v1/cluster/apps//appattempts//containers
 -> This is to get the list of running containers for the currently running app 
attempt from the RM.

API calls made when the container is selected from the drop-down:
4. 
http://:8088/ws/v1/cluster/containers//signal/OUTPUT_THREAD_DUMP?user.name=
 -> This is for the RM (which eventually calls the NM through the NM heartbeat) 
to send a SIGQUIT signal to the container process for the selected container 
([YARN-8693|https://issues.apache.org/jira/browse/YARN-8693]). This is 
essentially a kill -3, and it generates a thread dump that is captured in the 
stdout logs of the container.
5. http://:8042/ws/v1/node/containerlogs//stdout -> This is for 
the NM that is running the selected container to return the stdout logs of 
this running container, which contain the thread dump triggered by the above call. 
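
For reference, the five URLs above can be assembled as in the following Python 
sketch. The function name, the dictionary keys, and all host names and ids used 
with it are hypothetical placeholders; only the ports and paths come from the 
calls listed above. The sketch builds the URLs only and makes no assumption 
about the HTTP method used for each call.

```python
from urllib.parse import quote, urlencode

RM_PORT = 8088  # ResourceManager web port, as in the URLs above
NM_PORT = 8042  # NodeManager web port, as in the URLs above


def jstack_urls(rm_host, nm_host, app_id, attempt_id, container_id, user):
    """Build the URLs for calls 1. to 5. (hosts and ids are caller-supplied)."""
    rm = f"http://{rm_host}:{RM_PORT}/ws/v1/cluster"
    nm = f"http://{nm_host}:{NM_PORT}/ws/v1/node"
    app = quote(app_id)
    attempt = quote(attempt_id)
    container = quote(container_id)
    return {
        # 1. application info (e.g. app state) from the RM
        "app_info": f"{rm}/apps/{app}",
        # 2. application attempt info from the RM
        "attempts": f"{rm}/apps/{app}/appattempts",
        # 3. running containers for the selected attempt
        "containers": f"{rm}/apps/{app}/appattempts/{attempt}/containers",
        # 4. ask the RM to signal the container to output a thread dump
        "signal": f"{rm}/containers/{container}/signal/OUTPUT_THREAD_DUMP?"
                  + urlencode({"user.name": user}),
        # 5. stdout logs of the container from the NM running it
        "stdout": f"{nm}/containerlogs/{container}/stdout",
    }
```

Calling it with placeholder hosts such as "rm.example.com" and "nm.example.com" 
returns a dictionary whose "signal" entry ends in 
"/signal/OUTPUT_THREAD_DUMP?user.name=" followed by the given user name.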
