[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636221#comment-14636221 ] Joep Rottinghuis commented on YARN-445: --- Can we rekindle this discussion? We've had folks ask how we're letting users debug their own containers at Twitter, and the answer is that we're running with the patch supplied by Ming. Giving users a mechanism to jstack is absolutely awesome. In fact, we're using a capability in our JVM that lets users do a perf record/perf report right from a link on the UI using the very same mechanism. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Jason Lowe > Labels: BB2015-05-TBR > Attachments: MRJob.png, MRTasks.png, YARN-445--n2.patch, > YARN-445--n3.patch, YARN-445--n4.patch, > YARN-445-signal-container-via-rm.patch, YARN-445.patch, YARNContainers.png > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524805#comment-14524805 ] Hadoop QA commented on YARN-445:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12633748/YARN-445-signal-container-via-rm.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7608/console |

This message was automatically generated.

> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: MRJob.png, MRTasks.png, YARN-445--n2.patch, > YARN-445--n3.patch, YARN-445--n4.patch, > YARN-445-signal-container-via-rm.patch, YARN-445.patch, YARNContainers.png > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524782#comment-14524782 ] Hadoop QA commented on YARN-445:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12633748/YARN-445-signal-container-via-rm.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7601/console |

This message was automatically generated.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994878#comment-13994878 ] Vinod Kumar Vavilapalli commented on YARN-445: -- Folks, I just made YARN-1515 a sub-task of this. This JIRA is currently focused on exposing a signalling interface on the ResourceManager. It seems like we can simply expose the same API as part of ContainerManagement and get most of the thread-dump functionality with minimal changes.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926314#comment-13926314 ] Chuan Liu commented on YARN-445: bq.Chuan, the main proposal here is regarding which components need to be involved in container signal support. Instead of having the AM ask the NM to signal the container, the proposal is to have the client ask the RM, which then routes the request to the NM; the AM isn't in the picture anymore. [~mingma], thanks for the explanation! I did not realize the old patch was using the AM-NM ContainerManager proto. I think this is indeed a better approach.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926039#comment-13926039 ] Hadoop QA commented on YARN-445: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633748/YARN-445-signal-container-via-rm.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3311//console This message is automatically generated.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925755#comment-13925755 ] Gera Shegalov commented on YARN-445: I suggest adding the ability to specify a diagnostic message when signaling containers, for better audit capabilities, as in YARN-1551.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924701#comment-13924701 ] Ming Ma commented on YARN-445: -- Thanks all for the comments. I will create subtasks; feel free to add or update. Chuan, the main proposal here is regarding which components need to be involved in container signal support. Instead of having the AM ask the NM to signal the container, the proposal is to have the client ask the RM, which then routes the request to the NM; the AM isn't in the picture anymore. Xuan, Hitesh, that is a good point. While the proposal here is orthogonal to the support for different OSs, an API that uses signal numbers assumes Linux. SignalContainerCMD sounds like a good idea. Zhijie, having a "yarn container" command could be useful in the future if we decide to allow more operations on containers besides signal.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924646#comment-13924646 ] Xuan Gong commented on YARN-445: Can we not use {code} signal Signal the container. Default signal number is 3 {code} Can we use something like: {code} signal SIGKILL/SIGTERM {code} SIGKILL, SIGTERM, etc. are in the SignalContainerCMD enum. And let the NM figure out the right command for SIGKILL, SIGTERM, etc. based on the OS type?
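The enum idea in the comment above could look roughly like the following. This is only an illustrative sketch, not code from any of the attached patches: the enum name SignalContainerCMD comes from the thread, but the method `numberFor`, the class `SignalMappingDemo`, and the choice to return an empty value on non-Linux systems are assumptions made here. The numbers 3, 15 and 9 are the usual Linux values for SIGQUIT, SIGTERM and SIGKILL; what the entries should map to on Windows is exactly the open question raised in this discussion, so the sketch leaves it unanswered.

```java
import java.util.Locale;
import java.util.OptionalInt;

// Hypothetical sketch of the SignalContainerCMD enum discussed in the thread.
// The client names a command; the NodeManager side resolves it per OS.
enum SignalContainerCMD {
    SIGQUIT(3),   // on a JVM this triggers a thread dump (jstack-like output)
    SIGTERM(15),
    SIGKILL(9);

    private final int linuxNumber;

    SignalContainerCMD(int linuxNumber) {
        this.linuxNumber = linuxNumber;
    }

    // Assumption: only Linux is mapped; other platforms get "no mapping"
    // rather than a guessed equivalent.
    OptionalInt numberFor(String osName) {
        if (osName.toLowerCase(Locale.ROOT).contains("linux")) {
            return OptionalInt.of(linuxNumber);
        }
        return OptionalInt.empty();
    }
}

class SignalMappingDemo {
    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        for (SignalContainerCMD cmd : SignalContainerCMD.values()) {
            System.out.println(cmd + " on " + os + " -> " + cmd.numberFor(os));
        }
    }
}
```

Keeping the signal number out of the client-facing API means the CLI and the RM never need to know which operating system a given NodeManager runs on.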
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924508#comment-13924508 ] Hitesh Shah commented on YARN-445: -- bq. I would like to create a ticket for SIGKILL only since this is the easiest one. We can still use SignalContainerRequest and SignalContainerResponse. Also, we can create an enum type called SignalContainerCMD which can contain SIGKILL, SIGTERM, etc. [~xgong] [~mingma] What does a default signal number 3 imply on Windows? Also, have you figured out what the entries in the enum will map to for Windows?
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924461#comment-13924461 ] Zhijie Shen commented on YARN-445: -- {code} -signal Signal the container. Default signal number is 3. {code} How about "yarn container -signal blah blah"? Let's group all container related options within the same scope.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924422#comment-13924422 ] Chuan Liu commented on YARN-445: [~xgong] and [~ming ma], do you plan to build on top of [~aklochkov]'s patch? Your design seems to match the previous patch closely except for the CLI and web UI parts.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924382#comment-13924382 ] Xuan Gong commented on YARN-445: [~mingma] The plan sounds good to me. But I think that they are pretty big patches. Could we split them further? I would like to create a ticket for SIGKILL only since this is the easiest one. We can still use SignalContainerRequest and SignalContainerResponse. Also, we can create an enum type called SignalContainerCMD which can contain SIGKILL, SIGTERM, etc. After that ticket, I expect that we will have a general framework for how we handle the different signals on the RM side. Then we can add other signal commands and make the related changes on the NM side. What do you think?
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924368#comment-13924368 ] Hadoop QA commented on YARN-445: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633461/MRTasks.png against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3304//console This message is automatically generated.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924213#comment-13924213 ] Xuan Gong commented on YARN-445: [~mingma] I have already started to work on adding a KillContainer API. I am using a similar approach to the one you described earlier. Right now, if the RM kills the RMContainer, it notifies the NM through the regular heartbeat, and the NM kills the real container there. This logic already exists. So I think that KillContainer might be relatively easy to implement. For other signals, such as SIGQUIT, SIGTERM, etc., we might need to make changes on the NM side, too.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920507#comment-13920507 ] Ming Ma commented on YARN-445: -- Any comments on the proposal to do signaling via client -> RM -> NM? If there is no objection, I can start to create subtasks for changes necessary in yarn, MR, webUI, etc.
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902313#comment-13902313 ] Hadoop QA commented on YARN-445: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608408/YARN-445--n4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3110//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3110//console This message is automatically generated. 
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902278#comment-13902278 ] Ming Ma commented on YARN-445: --

[Gera Shegalov|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jira.shegalov] and I discussed the idea of providing such signal functionality at the YARN layer without involving the AM. I have a basic prototype working and would like to get feedback from others. The benefit of this approach is that other YARN applications, such as Spark, don't need to write any code to get this feature. If we later decide to extend the interface to support jmap by allowing users to run an arbitrary processing script against the container, all YARN Java applications will get it for free.

Here is how it works.

1. A client can ask the RM to signal a specific container, as long as it passes authorization.

{code:title=SignalContainerRequest.java|borderStyle=solid}
public interface SignalContainerRequest {
  /**
   * Get the ContainerId of the container to signal.
   * @return ContainerId of the container to signal.
   */
  @Public
  @Stable
  public abstract ContainerId getContainerId();

  @Private
  @Stable
  public abstract void setContainerId(ContainerId containerId);

  /**
   * Get the signal number to deliver to the container.
   * @return the signal number.
   */
  @Public
  @Stable
  public abstract int getSignal();

  @Private
  @Stable
  public abstract void setSignal(int signal);
}
{code}

{code:title=ClientRMProtocol.java|borderStyle=solid}
/**
 * Signal a running container.
 *
 * @param request the container to signal.
 * @return an empty response.
 * @throws YarnRemoteException
 */
public SignalContainerResponse signalContainer(
    SignalContainerRequest request) throws YarnRemoteException;
{code}

2. The RM will provide the container id to the corresponding NM in the next heartbeat. The HeartbeatResponse interface is modified to carry this information.
3. The AM isn't involved.
4. From the customer's point of view, on the CLI, customers use "bin/yarn application -signal $containerid 3" to capture a jstack. On the web UI, customers can click on links on the container web page as well as the MR job page.

Of course, this is orthogonal to general signal support across different OS platforms.

> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
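Step 2 of the proposal above (the RM hands signal requests to the NM on its next heartbeat) could be sketched roughly as follows. This is only an illustration under stated assumptions: `PendingSignalQueue`, `SignalRequest`, `enqueue` and `drainForHeartbeat` are hypothetical names, not classes or methods from the prototype or from YARN, and node and container ids are plain strings here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: PendingSignalQueue and SignalRequest are illustrative
// names, not YARN classes.
final class PendingSignalQueue {

  /** One queued request: the target container and the signal number. */
  static final class SignalRequest {
    final String containerId;
    final int signal;

    SignalRequest(String containerId, int signal) {
      this.containerId = containerId;
      this.signal = signal;
    }
  }

  // Node id -> requests waiting for that node's next heartbeat.
  private final Map<String, List<SignalRequest>> pendingByNode = new HashMap<>();

  /** Called after a client's signalContainer request has passed authorization. */
  synchronized void enqueue(String nodeId, String containerId, int signal) {
    pendingByNode
        .computeIfAbsent(nodeId, k -> new ArrayList<>())
        .add(new SignalRequest(containerId, signal));
  }

  /** Called while building a node's heartbeat response; empties that node's queue. */
  synchronized List<SignalRequest> drainForHeartbeat(String nodeId) {
    List<SignalRequest> drained = pendingByNode.remove(nodeId);
    return drained == null
        ? Collections.<SignalRequest>emptyList()
        : Collections.unmodifiableList(drained);
  }
}
```

Draining on heartbeat means each request is handed to the NM at most once; how the real prototype handles lost heartbeat responses is not described in the comment.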
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813471#comment-13813471 ] Sandy Ryza commented on YARN-445: - Oops didn't realize that that feature was the original motivator for this JIRA. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813470#comment-13813470 ] Sandy Ryza commented on YARN-445: - Very true > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813468#comment-13813468 ] Jason Lowe commented on YARN-445: - However it would also be nice to not always tie SIGQUIT to SIGTERM/SIGKILL. I'd love to give users the ability to diagnose tasks by themselves without killing them in the process. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813464#comment-13813464 ] Sandy Ryza commented on YARN-445: - To expand on that, it would be nice if the SIGQUIT-then-SIGTERM-then-SIGKILL sequence did not require multiple RPCs. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
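One way to reconcile the two wishes in the comments above (a single RPC for SIGQUIT-then-SIGTERM-then-SIGKILL, while still allowing SIGQUIT on its own) is to let one request carry an ordered list of signals with delays. The sketch below only illustrates that idea; `SignalPlan`, `Step`, `dumpOnly` and `dumpThenStop` are hypothetical names and do not come from any of the attached patches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: an ordered signal plan that could travel in one request.
final class SignalPlan {

  /** One step: the signal name and how long to wait before sending it. */
  static final class Step {
    final String signal;
    final long delayBeforeMillis;

    Step(String signal, long delayBeforeMillis) {
      this.signal = signal;
      this.delayBeforeMillis = delayBeforeMillis;
    }
  }

  private final List<Step> steps = new ArrayList<>();

  /** Appends a step and returns this plan so calls can be chained. */
  SignalPlan then(String signal, long delayBeforeMillis) {
    steps.add(new Step(signal, delayBeforeMillis));
    return this;
  }

  List<Step> steps() {
    return Collections.unmodifiableList(steps);
  }

  /** True if any step is expected to end the process. */
  boolean terminates() {
    for (Step s : steps) {
      if (s.signal.equals("TERM") || s.signal.equals("KILL")) {
        return true;
      }
    }
    return false;
  }

  /** Diagnostics only: dump stacks, leave the container running. */
  static SignalPlan dumpOnly() {
    return new SignalPlan().then("QUIT", 0);
  }

  /** Timeout handling: dump stacks, then TERM, then KILL, in one request. */
  static SignalPlan dumpThenStop(long graceMillis) {
    return new SignalPlan()
        .then("QUIT", 0)
        .then("TERM", graceMillis)
        .then("KILL", graceMillis);
  }
}
```

For example, `SignalPlan.dumpThenStop(250)` describes the timeout case, while `SignalPlan.dumpOnly()` covers the debugging case where the task must keep running.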
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813463#comment-13813463 ] Sandy Ryza commented on YARN-445: - In 0.21, when a task was going to be killed due to timeout, a SIGQUIT would be sent to it to dump its stacks to standard out (MAPREDUCE-1119). This was a useful feature that I'm currently working on backporting to branch-1 in MAPREDUCE-5592. It would be good to make sure that whatever we do here can accommodate something similar. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795753#comment-13795753 ] Andrey Klochkov commented on YARN-445: -- Vinod, Accepting a mapping of arbitrary commands is indeed the most powerful approach. It would also require a lot of changes in Yarn, as well as additional complexity for app writers. At the same time, are we sure that this flexibility is needed, and that it won't be over-engineering and possibly an abstraction leak in the Yarn framework? By the latter I mean that we would give app writers the ability to run arbitrary commands on any node at any point in time, but is it Yarn's responsibility to do that? I'm not a Yarn expert so I'm just asking. Anyway, the scope of what I have proposed with the patch is much smaller and solves the task stated in the initial description of this Jira: troubleshooting timed-out containers by dumping a jstack. This would be useful for many Yarn uses, so I thought it may make sense to implement it this way now and extend it in the future if there is demand. I agree that the way it is exposed in the API could be changed to a signal value in the stopContainers request instead of a separate call, which is indeed a bit confusing. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container.
For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795460#comment-13795460 ] Vinod Kumar Vavilapalli commented on YARN-445: -- Sorry for jumping real late on this. I see Andrey has been working on patches, but haven't looked at them. Trying to see if we are doing it right. bq. Add YARN API support for ContainerLaunchContext to accept a mapping of externally-triggered command names to code. (i.e. ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID"). I think this is a better approach overall. We already support running arbitrary command-lines as part of start-container. Even without signalling, we have a stopContainer API which clearly indicates that the container be shut-down. Either via a flag or a new API, for signalling containers, why don't we just implement it as an additional command that is run on the NM. NM can provide important information, like user-name, pid, pgrpid, sid etc in a platform agnostic manner for that command and we should be all done? > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe >Assignee: Andrey Klochkov > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. 
The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
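The quoted suggestion above (named external commands whose command lines are filled in by the NM) can be sketched as a small registry with placeholder expansion. Everything here is an assumption made for illustration: the class `ExternalCommands`, the `resolve` method, and the `$NAME` placeholder syntax beyond the single `$CONTAINER_PID` example given in the comment are hypothetical, not an existing YARN API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "command name -> command line" with NM-supplied values.
final class ExternalCommands {

  // Placeholders look like $CONTAINER_PID: a dollar sign, then capitals/underscores.
  private static final Pattern VARIABLE = Pattern.compile("\\$([A-Z_]+)");

  private final Map<String, String> templates = new HashMap<>();

  /** Registers a command template under a name chosen by the application. */
  void setExternalCommand(String name, String template) {
    templates.put(name, template);
  }

  /**
   * Expands the named template using values the NM would know (pid, user, ...).
   * Placeholders with no supplied value are left untouched.
   */
  String resolve(String name, Map<String, String> nodeManagerValues) {
    String template = templates.get(name);
    if (template == null) {
      throw new IllegalArgumentException("unknown external command: " + name);
    }
    Matcher matcher = VARIABLE.matcher(template);
    StringBuffer expanded = new StringBuffer();
    while (matcher.find()) {
      String value = nodeManagerValues.get(matcher.group(1));
      String replacement = value != null ? value : matcher.group();
      matcher.appendReplacement(expanded, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(expanded);
    return expanded.toString();
  }
}
```

With `setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")` registered and the NM supplying `CONTAINER_PID=4242`, `resolve("gracefulShutdown", ...)` yields `kill -TERM 4242`. Whether the NM should run such strings at all is exactly the question raised in the reply from Andrey.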
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794833#comment-13794833 ] Hadoop QA commented on YARN-445: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608408/YARN-445--n4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2176//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2176//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2176//console This message is automatically generated. 
> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, > YARN-445--n4.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792357#comment-13792357 ] Chris Nauroth commented on YARN-445: I haven't had a chance to look at this patch, but I did want to link to MAPREDUCE-5387. We have discussed the possibility of using {{SetConsoleCtrlHandler}}/{{GenerateConsoleCtrlEvent}} to approximate SIGTERM on Windows. (The current task termination logic on Windows is more like a SIGKILL.) Perhaps this patch could be a foundation for that. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786798#comment-13786798 ] Hadoop QA commented on YARN-445: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606926/YARN-445--n3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2107//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2107//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2107//console This message is automatically generated. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445--n2.patch, YARN-445--n3.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786572#comment-13786572 ] Andrey Klochkov commented on YARN-445: --

Steve, the current implementation will send the signal to the java process started by bin/hbase, as it sends it to all processes in the job object, i.e. all processes of the main container process. It can be replaced with sending the signal to all processes in the group instead, and I think the behavior will be the same. BTW, on Windows I don't know how to do the opposite, i.e. how to avoid sending the signal to all processes of the container (so the behavior on Linux is different, as "bin/hbase" will receive the signal). I think this is fine as long as this difference is documented. In the case of hbase, the shell script can install a custom hook for SIGTERM and do whatever is needed in that case (e.g. send SIGTERM to the java process it started).

There is one caveat in ctrl+break handling in the case of a batch file starting a java process:

1. The batch file starts the java process.
2. The user sends ctrl+break to all processes in the group (or job object). The java process prints a thread dump. The batch file doesn't react yet.
3. The java process completes successfully.
4. The batch file will not exit; it will print "Terminate batch job? (Y/N)" because it received the ctrl+break signal earlier.

The only way I see to overcome this problem with batch file processes is to identify them somehow (by executable name?) when walking through the processes in the job object, and not send them the signal. Sending ctrl+break to batch file processes doesn't make sense anyway, as in newer Windows there's no way to disable or customize ctrl+break handling in batch files.
> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445--n2.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
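The workaround suggested at the end of the comment above (skip batch-file processes when walking the job object) might look like the sketch below. It assumes that batch files run under `cmd.exe` and that the caller already has a list of pids and executable paths for the job object; `CtrlBreakTargets`, `ProcInfo` and `select` are hypothetical names, and nothing here calls a real Windows API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: pick which processes in a job object get ctrl+break.
final class CtrlBreakTargets {

  /** Minimal description of one process found in the job object. */
  static final class ProcInfo {
    final long pid;
    final String executable;

    ProcInfo(long pid, String executable) {
      this.pid = pid;
      this.executable = executable;
    }
  }

  /** Assumption: a batch file is recognised by its interpreter, cmd.exe. */
  static boolean isBatchInterpreter(String executable) {
    String lower = executable.toLowerCase(Locale.ROOT);
    int lastSeparator = Math.max(lower.lastIndexOf('\\'), lower.lastIndexOf('/'));
    return lower.substring(lastSeparator + 1).equals("cmd.exe");
  }

  /** Returns the pids that should receive ctrl+break, skipping batch interpreters. */
  static List<Long> select(List<ProcInfo> jobProcesses) {
    List<Long> targets = new ArrayList<>();
    for (ProcInfo process : jobProcesses) {
      if (!isBatchInterpreter(process.executable)) {
        targets.add(process.pid);
      }
    }
    return targets;
  }
}
```

Matching on the executable name is a heuristic: a user program that happens to be named cmd.exe would also be skipped.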
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786506#comment-13786506 ] Andrey Klochkov commented on YARN-445: -- The large diffs in the tests are not due to reformatting but because of refactoring needed to implement an additional test without lots of copy/paste. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445--n2.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786318#comment-13786318 ] Steve Loughran commented on YARN-445: -

ctrl-break is special in that it can talk to the whole process group: [http://msdn.microsoft.com/en-us/library/windows/desktop/ms683155(v=vs.85).aspx]

Process-group signalling should be good (make it an option from the sender?) so that I can send a signal to a process started by its own bash script (e.g. bin/hbase->java). However, we do need to remember that some recent ubuntu versions (mistakenly) require a -- between the signal and the process group id.

This is quite a significant patch, and it adds a feature that many will find useful, but it is going to need careful review by the YARN experts (of which I am not one). Some quick points:

# I wouldn't mark the interface/methods as stable yet.
# Some of the diffs in the tests look bigger than they should be. Is that reformatting or refactoring? It just makes it harder to distinguish changes. Ideally all the existing tests should be left alone (that way we can be confident that they will catch regressions), with new tests underneath or in their own class.

> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445--n2.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest.
> However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
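The point above about the `--` separator can be illustrated with a helper that builds the `kill` argument list. This is a sketch under assumptions: `KillCommand` and `forTarget` are hypothetical names, and it relies on the usual convention that a negative id passed to `kill` addresses a process group; always emitting `--` is a design choice made here so that the negative id is never parsed as an option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build argv for signalling a process or its whole group.
final class KillCommand {

  /**
   * @param signalName signal name without the SIG prefix, e.g. "QUIT"
   * @param id         process id, or process group id when wholeGroup is true
   * @param wholeGroup when true, target the group by passing a negative id
   */
  static List<String> forTarget(String signalName, long id, boolean wholeGroup) {
    if (id <= 0) {
      throw new IllegalArgumentException("id must be positive: " + id);
    }
    List<String> argv = new ArrayList<>();
    argv.add("kill");
    argv.add("-" + signalName);
    // Separator between the signal and the target, so "-1234" is not read as an option.
    argv.add("--");
    argv.add(wholeGroup ? "-" + id : Long.toString(id));
    return argv;
  }
}
```

For a container whose launch script is the group leader with id 1234, `forTarget("QUIT", 1234, true)` produces `kill -QUIT -- -1234`, which reaches both the script and the java process it started.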
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784292#comment-13784292 ] Andrey Klochkov commented on YARN-445: -- As I understand it, this Findbugs warning should be ignored, as it is complaining about a valid type cast. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445--n2.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784284#comment-13784284 ] Hadoop QA commented on YARN-445: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606399/YARN-445--n2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2062//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2062//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2062//console This message is automatically generated. 
> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445--n2.patch, YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784199#comment-13784199 ] Andrey Klochkov commented on YARN-445: -- Bikas, on Windows the JVM prints a full thread dump on ctrl+break. I think ctrl+c may be emulated in the same way and used in place of TERM on Windows, via the same signalContainers API. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
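The mapping described in the comment above (ctrl+break standing in for QUIT, ctrl+c possibly standing in for TERM) could be captured in a small lookup like the one below. `WindowsCtrlEvent` and `forSignal` are hypothetical names; the numeric codes are the `CTRL_C_EVENT` and `CTRL_BREAK_EVENT` values accepted by the Windows `GenerateConsoleCtrlEvent` function, and actually delivering the event would need native code that this sketch does not attempt.

```java
import java.util.Optional;

// Hypothetical sketch: translate a POSIX-style signal name to a console event.
enum WindowsCtrlEvent {
  CTRL_C(0),      // CTRL_C_EVENT
  CTRL_BREAK(1);  // CTRL_BREAK_EVENT

  final int code;

  WindowsCtrlEvent(int code) {
    this.code = code;
  }

  /** Returns the event proposed for the signal, or empty if none was proposed. */
  static Optional<WindowsCtrlEvent> forSignal(String signalName) {
    switch (signalName) {
      case "QUIT":
        return Optional.of(CTRL_BREAK); // the JVM prints a thread dump
      case "TERM":
        return Optional.of(CTRL_C);     // suggested stand-in for TERM
      default:
        return Optional.empty();
    }
  }
}
```

Returning an empty `Optional` for other signals makes the platform gap explicit instead of silently substituting a different event.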
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783702#comment-13783702 ] Bikas Saha commented on YARN-445: - How does the Windows JVM handle ctrl-break? How would we emulate a ctrl-c signal that would trigger the JVM shutdown hook? > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
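For readers unfamiliar with the shutdown hook mentioned in the question above, here is a minimal sketch of how a container process might register one. The JVM runs registered hooks when it exits normally or is interrupted by Ctrl+C or SIGTERM, but not when it is killed outright with SIGKILL. The class name `GracefulWorker` and the message it prints are hypothetical and exist only for illustration; nothing here is part of YARN or of the patch under discussion.

```java
/** Illustrative only: GracefulWorker is a hypothetical name, not a YARN class. */
final class GracefulWorker {
    // The hook is an unstarted thread; the JVM starts it during shutdown.
    private final Thread hook = new Thread(
        () -> System.err.println("shutdown hook: flushing state before exit"),
        "graceful-shutdown-hook");

    /** Registers the hook so it runs on normal exit, Ctrl+C, or SIGTERM. */
    void install() {
        Runtime.getRuntime().addShutdownHook(hook);
    }

    /** Unregisters the hook; returns true if it had been registered. */
    boolean uninstall() {
        return Runtime.getRuntime().removeShutdownHook(hook);
    }
}
```

Whatever mechanism the NM uses to request a graceful stop therefore has to reach the JVM as one of the events that trigger hooks, which is why the Windows emulation question matters.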
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783677#comment-13783677 ] Hadoop QA commented on YARN-445:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606252/YARN-445.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in
    hadoop-common-project/hadoop-common
    hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
        org.apache.hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2059//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2059//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2059//console

This message is automatically generated.

> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Jason Lowe > Attachments: YARN-445.patch > > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725111#comment-13725111 ] Steve Loughran commented on YARN-445: - I like Chris's #3 option, as it allows you to add things like a graceful shutdown to a piece of code that you don't want to or can't change. The command would have to run with the same path and other env params as the original source if you want to do things like exec an HBase decommission command. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.1.0-beta >Reporter: Jason Lowe > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632178#comment-13632178 ] Bikas Saha commented on YARN-445: - IMO it would be great if the API allows YARN/NM to figure out what is the intended action. That way the NM can perform that action using the Shell which makes the OS transparent. Simply passing a signal value integer with YARN/NM just being a pass through may not be the right thing. I am not quite sure how to handle Java specific behavior. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.0.5-beta >Reporter: Jason Lowe > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
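The idea in the comment above, that the API should carry the intended action and let the NM choose the OS mechanism, could be sketched roughly as follows. Everything here is hypothetical: the `ContainerActions` class, the `Action` enum, and `mechanismFor` are illustrative names, not YARN APIs. The Unix side uses the conventional signals (SIGQUIT makes a JVM print a thread dump, SIGTERM requests shutdown, SIGKILL forces it). The Windows side is an assumption based on the console control events raised elsewhere in this thread.

```java
/** Hypothetical sketch of intent-based container actions; not a YARN API. */
final class ContainerActions {
    /** What the AM wants to happen, independent of the node's OS. */
    enum Action { OUTPUT_DEBUG_INFO, GRACEFUL_SHUTDOWN, KILL }

    /**
     * Returns the name of the OS mechanism an NM might use for the action.
     * The Windows mappings are assumptions for illustration only.
     */
    static String mechanismFor(Action action, boolean windows) {
        switch (action) {
            case OUTPUT_DEBUG_INFO:
                // A JVM prints a thread dump on SIGQUIT (Unix) or Ctrl+Break (Windows).
                return windows ? "CTRL_BREAK_EVENT" : "SIGQUIT";
            case GRACEFUL_SHUTDOWN:
                // Both events let a JVM run its shutdown hooks before exiting.
                return windows ? "CTRL_C_EVENT" : "SIGTERM";
            case KILL:
                // Neither mechanism gives the process a chance to clean up.
                return windows ? "TerminateProcess" : "SIGKILL";
            default:
                throw new IllegalArgumentException("Unknown action: " + action);
        }
    }
}
```

With this shape the AM never passes a raw signal integer, so the same request works on a mixed cluster; the cost is that only the actions in the enum are expressible.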
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632028#comment-13632028 ] Chris Nauroth commented on YARN-445:

Unfortunately, I don't believe the Unix signal concept maps cleanly to Windows. Some of the signal-related functions are defined on Windows, but with behavior quite different from the Unix equivalent. http://msdn.microsoft.com/en-us/library/xdkz3x12(v=vs.71).aspx For example, there are differences in exit codes seen by the signalled process, and some signal handling scenarios cause the process to start a new thread to handle it instead of interrupting an existing thread.

Another alternative on Windows is console control handlers: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx

I have seen projects that attempt to define a higher-level interface of "externally triggered command", using method names like gracefulShutdown, kill, and outputDebugInfo. On a Unix, the implementation can map these to signal/kill. On Windows, the implementation can map these to SetConsoleCtrlHandler/GenerateConsoleCtrlEvent. The problem is that this is a least common denominator approach that may not cover all possible use cases.

Considering all of that, I can think of 3 different approaches to this feature:

# Sacrifice trying to create a general-purpose signaling mechanism and just stay focused on triggering JVM features. (This is identical to Jason's #1.)
# Use the Windows APIs I mentioned above to implement least-common-denominator signaling support.
# Add YARN API support for ContainerLaunchContext to accept a mapping of externally-triggered command names to code (i.e. {{ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")}}). Then, during execution, the AM could send a message to the NM saying "gracefulShutdown container_X". When the NM receives the message, it could look up "gracefulShutdown" in the map of external commands and trigger the kill. For highly custom message handling scenarios (Windows console control events/named pipes/whatever else), the AM could ship a binary as a localized resource that contains the implementation, and the external command can be mapped to call that binary.

Each of these approaches gets progressively more general-purpose, but also progressively more complex. The last one in particular gives maximum flexibility, but makes the API challenging for AM writers.

A side note on the last option: another variant is to add one more level of indirection in the API to support different container launch configuration per platform. This would make it easier to support heterogeneous clusters (mix of Unix and Windows nodes). This would let the AM say things like "use kill on Unix, but use something else on Windows" but without needing to know if specific nodes are running Unix or Windows.

> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.0.5-beta >Reporter: Jason Lowe > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
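The third approach in the comment above, a per-container map from command names to shell commands, might look roughly like the sketch below on the NM side. This is an illustration under stated assumptions, not an existing YARN interface: `ExternalCommandRegistry` and `resolve` are hypothetical names, and `setExternalCommand` simply mirrors the method name proposed in the comment. The sketch only resolves the command line; actually executing it, and doing so with the container's environment, is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed external-command map; not a YARN API. */
final class ExternalCommandRegistry {
    // Command name (e.g. "gracefulShutdown") -> shell command template.
    private final Map<String, String> commands = new HashMap<>();

    /** Registers a shell command template under a logical command name. */
    void setExternalCommand(String name, String template) {
        commands.put(name, template);
    }

    /**
     * Looks up the named command and substitutes the container's PID for
     * the literal placeholder $CONTAINER_PID.
     */
    String resolve(String name, long containerPid) {
        String template = commands.get(name);
        if (template == null) {
            throw new IllegalArgumentException("Unknown external command: " + name);
        }
        return template.replace("$CONTAINER_PID", Long.toString(containerPid));
    }
}
```

For example, after registering `"gracefulShutdown"` with the template `"kill -TERM $CONTAINER_PID"`, resolving it for PID 4242 yields `kill -TERM 4242`. The per-platform variant mentioned in the side note would add one more key to the lookup (the node's OS) so the AM does not need to know which nodes run Windows.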
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631761#comment-13631761 ] Jason Lowe commented on YARN-445:

Yes, it's an enhancement request to the NM API. I filed it as signaling containers to generalize the jstack-on-task-timeout feature, at least in the UNIX sense. I'm not familiar with the Windows APIs, so I'm not sure how (or if) signals map on that platform.

I could see going three different ways on this for the NM API:

# methods to trigger various features specific to JVMs like jstack, jmap, etc.
# methods to send generalized signals (if there is a reasonable facsimile on Windows)
# give up trying to generalize the concept and put in the StopContainerRequest flag

I'd prefer the generalized signal approach if we can come up with a reasonable mapping for Windows, as this could be useful for non-JVM containers. In any case, we've had a lot of requests for the ability to trigger jstacks on containers in various situations, so I'd like to see at least something done in the NM API to achieve this.

> Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.0.5-beta >Reporter: Jason Lowe > > It would be nice if an ApplicationMaster could send signals to containers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631399#comment-13631399 ] Bikas Saha commented on YARN-445: - Sounds like an enhancement in the NM API. Moving under YARN-386. Please unlink if that is not correct. I can see the usecase this seeks to solve. I am wondering what is the abstraction in the general case. That would help us to not change stuff for every similar use case. Keeping platform neutrality would be beneficial so that the usecases continue to work for non Java AM/tasks or on Windows. > Ability to signal containers > > > Key: YARN-445 > URL: https://issues.apache.org/jira/browse/YARN-445 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.0.5-beta >Reporter: Jason Lowe > > It would be nice if an ApplicationMaster could send signals to contaniers > such as SIGQUIT, SIGUSR1, etc. > For example, in order to replicate the jstack-on-task-timeout feature > implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an > interface for sending SIGQUIT to a container. For that specific feature we > could implement it as an additional field in the StopContainerRequest. > However that would not address other potential features like the ability for > an AM to trigger jstacks on arbitrary tasks *without* killing them. The > latter feature would be a very useful debugging tool for users who do not > have shell access to the nodes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira