[jira] [Commented] (YARN-445) Ability to signal containers

2015-07-21 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636221#comment-14636221
 ] 

Joep Rottinghuis commented on YARN-445:
---

Can we rekindle this discussion? Folks have asked how we let users debug 
their own containers at Twitter, and the answer is that we are running with 
the patch supplied by Ming.

Giving users a mechanism to run jstack is absolutely awesome. In fact, we use 
a capability in our JVM that lets users do a perf record/perf report right 
from a link in the UI using the very same mechanism.

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Jason Lowe
>  Labels: BB2015-05-TBR
> Attachments: MRJob.png, MRTasks.png, YARN-445--n2.patch, 
> YARN-445--n3.patch, YARN-445--n4.patch, 
> YARN-445-signal-container-via-rm.patch, YARN-445.patch, YARNContainers.png
>
>
> It would be nice if an ApplicationMaster could send signals to containers 
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature 
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
> interface for sending SIGQUIT to a container.  For that specific feature we 
> could implement it as an additional field in the StopContainerRequest.  
> However that would not address other potential features like the ability for 
> an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
> latter feature would be a very useful debugging tool for users who do not 
> have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-445) Ability to signal containers

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524805#comment-14524805
 ] 

Hadoop QA commented on YARN-445:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12633748/YARN-445-signal-container-via-rm.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7608/console |


This message was automatically generated.



[jira] [Commented] (YARN-445) Ability to signal containers

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524782#comment-14524782
 ] 

Hadoop QA commented on YARN-445:


\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12633748/YARN-445-signal-container-via-rm.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7601/console |


This message was automatically generated.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994878#comment-13994878
 ] 

Vinod Kumar Vavilapalli commented on YARN-445:
--

Folks, I just made YARN-1515 a sub-task of this.

This JIRA currently focuses on exposing a signalling interface on the 
ResourceManager. It seems like we can simply expose the same API as part of 
ContainerManagement and get most of the thread-dump functionality with minimal 
changes.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-10 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926314#comment-13926314
 ] 

Chuan Liu commented on YARN-445:


bq. Chuan, the main proposal here is about which components need to be involved 
in container signal support. Instead of having the AM ask the NM to signal the 
container, the proposal is to have the client ask the RM, which then routes the 
request to the NM; the AM isn't in the picture anymore.

[~mingma], thanks for the explanation! I did not realize the old patch was 
using the AM-NM ContainerManager proto. I think this is indeed a better 
approach.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926039#comment-13926039
 ] 

Hadoop QA commented on YARN-445:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12633748/YARN-445-signal-container-via-rm.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3311//console

This message is automatically generated.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-10 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925755#comment-13925755
 ] 

Gera Shegalov commented on YARN-445:


I suggest adding the ability to specify a diagnostic message when signaling 
containers, for better audit capabilities, as in YARN-1551.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-07 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924701#comment-13924701
 ] 

Ming Ma commented on YARN-445:
--

Thanks all for the comments. I will create subtasks; feel free to add or 
update them.

Chuan, the main proposal here is about which components need to be involved 
in container signal support. Instead of having the AM ask the NM to signal the 
container, the proposal is to have the client ask the RM, which then routes the 
request to the NM; the AM isn't in the picture anymore.

Xuan, Hitesh, that is a good point. While the proposal here is orthogonal to 
support for different OSs, an API that takes a signal number assumes Linux. 
SignalContainerCMD sounds like a good idea.

Zhijie, having a "yarn container" command could be useful in the future if we 
decide to allow more operations on containers besides signal.





[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924646#comment-13924646
 ] 

Xuan Gong commented on YARN-445:


Instead of 
{code}
signal  Signal the container. Default signal number is 3
{code}

can we use something like:
{code}
signal  SIGKILL/SIGTERM
{code}

where SIGKILL, SIGTERM, etc. are values of the SignalContainerCMD enum, and let 
the NM figure out the right command for SIGKILL, SIGTERM, etc. based on the OS 
type?
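A minimal sketch of the idea above, under stated assumptions: the enum name SignalContainerCMD comes from this thread, but its SIGQUIT member, the OsType enum, the toCommand helper, and the Windows mappings to taskkill are hypothetical and only illustrate how an NM-side lookup might turn an OS-neutral command into a platform command. It is not the API of any patch attached here.

```java
import java.util.Arrays;
import java.util.List;

final class SignalCommandSketch {
    /** OS-neutral commands; SIGQUIT is added here only for illustration. */
    enum SignalContainerCMD { SIGKILL, SIGTERM, SIGQUIT }

    /** Hypothetical platform tag the NM would derive from its own OS. */
    enum OsType { LINUX, WINDOWS }

    /** Resolve a command to the argv the NM could execute for the given pid. */
    static List<String> toCommand(SignalContainerCMD cmd, OsType os, long pid) {
        String p = Long.toString(pid);
        if (os == OsType.LINUX) {
            switch (cmd) {
                case SIGKILL: return Arrays.asList("kill", "-9", p);
                case SIGTERM: return Arrays.asList("kill", "-15", p);
                case SIGQUIT: return Arrays.asList("kill", "-3", p);
            }
        } else {
            switch (cmd) {
                // Assumption: forced vs. non-forced taskkill stand in for KILL/TERM.
                case SIGKILL: return Arrays.asList("taskkill", "/F", "/PID", p);
                case SIGTERM: return Arrays.asList("taskkill", "/PID", p);
                default: break; // no thread-dump equivalent is assumed on Windows
            }
        }
        throw new UnsupportedOperationException(cmd + " has no mapping on " + os);
    }

    public static void main(String[] args) {
        System.out.println(toCommand(SignalContainerCMD.SIGQUIT, OsType.LINUX, 1234L));
    }
}
```

Failing loudly for an unmapped command is one way to make the open question about Windows explicit instead of silently sending the wrong signal.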



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924508#comment-13924508
 ] 

Hitesh Shah commented on YARN-445:
--

bq. I would like to create a ticket for SIGKILL only, since it is the easiest 
one. We can still use SignalContainerRequest and SignalContainerResponse. Also, 
we can create an enum type called SignalContainerCMD which can contain SIGKILL, 
SIGTERM, etc.

[~xgong] [~mingma] What does a default signal number 3 imply on Windows? Also, 
have you figured out what the entries in the enum will map to for Windows? 

 



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924461#comment-13924461
 ] 

Zhijie Shen commented on YARN-445:
--

{code}
-signal  Signal the container. Default signal 
number is 3.
{code}

How about "yarn container -signal blah blah"? Let's group all container related 
options within the same scope.
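As a rough illustration of the option discussed above, here is a small parser sketch. Only the option name -signal and the default signal number 3 come from this thread; the argument layout (a container ID followed by an optional signal number), the class name, and the usage string are assumptions, since the argument placeholders did not survive in the quoted help text.

```java
final class SignalCliSketch {
    /** Default taken from the quoted help text: signal number 3. */
    static final int DEFAULT_SIGNAL = 3;

    /** Parsed result: which container to signal and with which number. */
    static final class SignalArgs {
        final String containerId;
        final int signal;

        SignalArgs(String containerId, int signal) {
            this.containerId = containerId;
            this.signal = signal;
        }
    }

    /** Parse the assumed layout: -signal containerId [signalNumber]. */
    static SignalArgs parse(String[] args) {
        if (args.length < 2 || args.length > 3 || !"-signal".equals(args[0])) {
            throw new IllegalArgumentException(
                "usage: -signal <containerId> [signalNumber]");
        }
        // Integer.parseInt throws NumberFormatException on a non-numeric value.
        int signal = args.length == 3 ? Integer.parseInt(args[2]) : DEFAULT_SIGNAL;
        return new SignalArgs(args[1], signal);
    }

    public static void main(String[] args) {
        SignalArgs parsed = parse(new String[] {"-signal", "container_01"});
        System.out.println(parsed.containerId + " <- signal " + parsed.signal);
    }
}
```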



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-07 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924422#comment-13924422
 ] 

Chuan Liu commented on YARN-445:


[~xgong] and [~mingma], do you plan to build on top of [~aklochkov]'s patch? 
Your design seems to match the previous patch closely, except for the CLI and 
web UI parts.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924382#comment-13924382
 ] 

Xuan Gong commented on YARN-445:


[~mingma] The plan sounds good to me.
But I think these are pretty big patches. Could we split them further?
I would like to create a ticket for SIGKILL only, since it is the easiest one. 
We can still use SignalContainerRequest and SignalContainerResponse. Also, we 
can create an enum type called SignalContainerCMD which can contain SIGKILL, 
SIGTERM, etc. 
After that ticket, I expect we will have a general framework for how to handle 
the different signals on the RM side. Then we can add other signal commands and 
make the related changes on the NM side.

What do you think?



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924368#comment-13924368
 ] 

Hadoop QA commented on YARN-445:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633461/MRTasks.png
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3304//console

This message is automatically generated.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924213#comment-13924213
 ] 

Xuan Gong commented on YARN-445:


[~mingma] I have already started working on adding a KillContainer API, using 
an approach similar to the one you described earlier. 
Right now, if the RM kills the RMContainer, it notifies the NM through the 
regular heartbeat, and the NM then kills the real container. This logic already 
exists, so I think KillContainer should be relatively easy to implement. 
For other signals, such as SIGQUIT, SIGTERM, etc., we might need to make changes 
on the NM side, too.
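The heartbeat-based relay described above can be sketched as a toy model. Everything here is hypothetical (the class names, the string IDs, and the explicit nodeId argument, which a real RM would look up from the container); it only shows the pattern of the RM queuing a request and handing it to the NM on that node's next heartbeat.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class HeartbeatRelaySketch {
    /** A signal waiting to be delivered to one container (hypothetical type). */
    static final class PendingSignal {
        final String containerId;
        final int signal;

        PendingSignal(String containerId, int signal) {
            this.containerId = containerId;
            this.signal = signal;
        }
    }

    /** Toy RM: queues signals per node until that node heartbeats. */
    static final class ToyResourceManager {
        private final Map<String, Queue<PendingSignal>> pendingByNode = new HashMap<>();

        /** Client-facing call: record the request, do not contact the NM directly. */
        synchronized void signalContainer(String nodeId, String containerId, int signal) {
            pendingByNode.computeIfAbsent(nodeId, k -> new ArrayDeque<>())
                         .add(new PendingSignal(containerId, signal));
        }

        /** NM-facing call: the heartbeat response carries and clears pending signals. */
        synchronized List<PendingSignal> nodeHeartbeat(String nodeId) {
            Queue<PendingSignal> queued = pendingByNode.remove(nodeId);
            return queued == null ? new ArrayList<>() : new ArrayList<>(queued);
        }
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.signalContainer("node-1", "container_01", 3);
        for (PendingSignal s : rm.nodeHeartbeat("node-1")) {
            System.out.println("NM would deliver signal " + s.signal + " to " + s.containerId);
        }
    }
}
```

One consequence of this design is that delivery latency is bounded by the heartbeat interval, which may matter for time-sensitive uses such as a thread dump just before a task timeout.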



[jira] [Commented] (YARN-445) Ability to signal containers

2014-03-04 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920507#comment-13920507
 ] 

Ming Ma commented on YARN-445:
--

Any comments on the proposal to do signaling via client -> RM -> NM? If there 
is no objection, I can start to create subtasks for changes necessary in yarn, 
MR, webUI, etc.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-02-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902313#comment-13902313
 ] 

Hadoop QA commented on YARN-445:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608408/YARN-445--n4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3110//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3110//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3110//console

This message is automatically generated.



[jira] [Commented] (YARN-445) Ability to signal containers

2014-02-14 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902278#comment-13902278
 ] 

Ming Ma commented on YARN-445:
--

[Gera Shegalov|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jira.shegalov]
and I discussed the idea of providing this signal functionality at the YARN 
layer without involving the AM. I have a basic prototype working and would 
like to get feedback from others.

The benefit of this approach is that other YARN applications, such as Spark, 
don't need to write any code to get this feature. If we later decide to extend 
the interface to support jmap by allowing users to run an arbitrary processing 
script against the container, all YARN Java applications will get it for free. 
Here is how it works.

1. A client can ask the RM to signal a specific container, provided the 
request passes authorization.
{code:title=SignalContainerRequest.java|borderStyle=solid}
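import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Stable;
import org.apache.hadoop.yarn.api.records.ContainerId;

/** Request sent by a client to the RM to signal a single container. */
public interface SignalContainerRequest {
  /**
   * Get the ContainerId of the container to signal.
   * @return ContainerId of the container to signal.
   */
  @Public
  @Stable
  ContainerId getContainerId();

  @Private
  @Stable
  void setContainerId(ContainerId containerId);

  /**
   * Get the signal to deliver to the container.
   * @return the signal number, e.g. 3 for SIGQUIT.
   */
  @Public
  @Stable
  int getSignal();

  @Private
  @Stable
  void setSignal(int signal);
}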
{code}


{code:title=ClientRMProtocol.java|borderStyle=solid}


  /**
   * Signal a running container.
   *
   * @param request identifies the container and the signal to send.
   * @return an empty response.
   * @throws YarnRemoteException if the request fails.
   */
  public SignalContainerResponse signalContainer(
      SignalContainerRequest request)
      throws YarnRemoteException;

{code}

2. The RM passes the container ID to the corresponding NM in the next 
heartbeat. The HeartbeatResponse interface is modified to carry this 
information.
3. The AM isn't involved.
4. From the user's point of view: on the CLI, users run "bin/yarn 
application -signal $containerid 3" to capture a jstack. On the web UI, users 
can click links on the container page as well as on the MR job page.

Of course, this is orthogonal to general signal support across different OS 
platforms.
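
The flow in steps 1 and 2 can be sketched roughly as follows. This is only a toy model of the queuing idea, not the prototype's code: the class and method names (SignalQueueSketch, PendingSignal, onHeartbeat) are hypothetical, containers and nodes are plain strings, and authorization is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical (containerId, signal) pair carried in a heartbeat response. */
final class PendingSignal {
  final String containerId;
  final int signal;

  PendingSignal(String containerId, int signal) {
    this.containerId = containerId;
    this.signal = signal;
  }
}

/** Toy RM-side queue; it only illustrates queue-then-deliver-on-heartbeat. */
final class SignalQueueSketch {
  // nodeId -> signals waiting for that node's next heartbeat
  private final Map<String, List<PendingSignal>> pendingByNode = new HashMap<>();
  // containerId -> nodeId (assumed to be known to the RM already)
  private final Map<String, String> containerToNode = new HashMap<>();

  void registerContainer(String containerId, String nodeId) {
    containerToNode.put(containerId, nodeId);
  }

  /** Step 1: accept a client request and queue it (authorization omitted). */
  boolean signalContainer(String containerId, int signal) {
    String nodeId = containerToNode.get(containerId);
    if (nodeId == null) {
      return false; // unknown container, nothing to queue
    }
    pendingByNode.computeIfAbsent(nodeId, k -> new ArrayList<>())
        .add(new PendingSignal(containerId, signal));
    return true;
  }

  /** Step 2: on heartbeat, return everything queued for the node and clear it. */
  List<PendingSignal> onHeartbeat(String nodeId) {
    List<PendingSignal> drained = pendingByNode.remove(nodeId);
    return drained == null ? new ArrayList<>() : drained;
  }

  public static void main(String[] args) {
    SignalQueueSketch rm = new SignalQueueSketch();
    rm.registerContainer("container_01", "node-a");
    rm.signalContainer("container_01", 3); // 3 is SIGQUIT on Linux
    for (PendingSignal s : rm.onHeartbeat("node-a")) {
      System.out.println(s.containerId + " <- signal " + s.signal);
    }
  }
}
```

One consequence of this design is that delivery latency is bounded by the heartbeat interval, since the NM only learns about the request when it next reports in.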






[jira] [Commented] (YARN-445) Ability to signal containers

2013-11-04 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813471#comment-13813471
 ] 

Sandy Ryza commented on YARN-445:
-

Oops, I didn't realize that feature was the original motivator for this JIRA.



[jira] [Commented] (YARN-445) Ability to signal containers

2013-11-04 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813470#comment-13813470
 ] 

Sandy Ryza commented on YARN-445:
-

Very true



[jira] [Commented] (YARN-445) Ability to signal containers

2013-11-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813468#comment-13813468
 ] 

Jason Lowe commented on YARN-445:
-

However it would also be nice to not always tie SIGQUIT to SIGTERM/SIGKILL.  
I'd love to give users the ability to diagnose tasks by themselves without 
killing them in the process.



[jira] [Commented] (YARN-445) Ability to signal containers

2013-11-04 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813464#comment-13813464
 ] 

Sandy Ryza commented on YARN-445:
-

To expand on that, it would be nice if a SIGQUIT-then-SIGTERM-then-SIGKILL 
sequence did not require multiple RPCs.
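
One way to picture this is a single request that carries an ordered list of signals with a wait between them, which the NM then walks through locally. The sketch below is only an illustration under that assumption: SignalStep and EscalationSketch are hypothetical names, and the signal sender and liveness check are passed in as callbacks so the example runs without real processes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.IntConsumer;

/** Hypothetical step: a signal number plus how long to wait before escalating. */
final class SignalStep {
  final int signal;
  final long waitMillis;

  SignalStep(int signal, long waitMillis) {
    this.signal = signal;
    this.waitMillis = waitMillis;
  }
}

final class EscalationSketch {
  /**
   * Sends each signal in order and stops as soon as the process is gone.
   * Returns the signals that were actually sent.
   */
  static List<Integer> run(List<SignalStep> steps, IntConsumer sender,
      BooleanSupplier isAlive) {
    List<Integer> sent = new ArrayList<>();
    for (SignalStep step : steps) {
      if (!isAlive.getAsBoolean()) {
        break; // nothing left to signal
      }
      sender.accept(step.signal);
      sent.add(step.signal);
      try {
        Thread.sleep(step.waitMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return sent;
  }

  public static void main(String[] args) {
    // Fake process that exits when it receives SIGTERM (15 on Linux).
    boolean[] alive = {true};
    List<SignalStep> steps = Arrays.asList(
        new SignalStep(3, 10),   // SIGQUIT: dump stacks
        new SignalStep(15, 10),  // SIGTERM: ask for a clean exit
        new SignalStep(9, 0));   // SIGKILL: last resort
    List<Integer> sent = run(steps, sig -> {
      if (sig == 15) {
        alive[0] = false;
      }
    }, () -> alive[0]);
    System.out.println(sent); // prints [3, 15]
  }
}
```

Because the whole sequence travels in one request, the caller makes a single RPC and the escalation timing is handled on the node.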



[jira] [Commented] (YARN-445) Ability to signal containers

2013-11-04 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813463#comment-13813463
 ] 

Sandy Ryza commented on YARN-445:
-

In 0.21, when a task was going to be killed due to timeout, a SIGQUIT would be 
sent to it to dump its stacks to standard out (MAPREDUCE-1119).  This was a 
useful feature that I'm currently working on backporting to branch-1 in 
MAPREDUCE-5592.  It would be good to make sure that whatever we do here can 
accommodate something similar.



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-15 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795753#comment-13795753
 ] 

Andrey Klochkov commented on YARN-445:
--

Vinod,
Accepting a mapping of arbitrary commands is indeed the most powerful approach. 
However, it would require a lot of changes in YARN, as well as additional 
complexity for app writers. At the same time, are we sure this flexibility is 
needed, and that it wouldn't be over-engineering and possibly an abstraction 
leak in the YARN framework? By the latter I mean that we would give app writers 
the ability to run arbitrary commands on any node at any point in time, but is 
it YARN's responsibility to do that? I'm not a YARN expert, so I'm just asking.

Anyway, the scope of what I have proposed in the patch is much smaller and 
solves the task stated in the initial description of this JIRA: troubleshooting 
timed-out containers by dumping a jstack. This would be useful for many YARN 
users, so I thought it made sense to implement it this way now and extend it in 
the future if there is demand. I agree that the way it is exposed in the API 
could be changed to a signal value in the stopContainers request instead of a 
separate call, which is indeed a bit confusing.



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795460#comment-13795460
 ] 

Vinod Kumar Vavilapalli commented on YARN-445:
--

Sorry for jumping in really late on this. I see Andrey has been working on 
patches, but I haven't looked at them. I'm trying to see whether we are doing 
it right.

bq. Add YARN API support for ContainerLaunchContext to accept a mapping of 
externally-triggered command names to code. (i.e. 
ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")).
I think this is a better approach overall. We already support running arbitrary 
command lines as part of start-container. Even without signalling, we have a 
stopContainer API which clearly indicates that the container should be shut 
down. Whether via a flag or a new API, why don't we just implement container 
signalling as an additional command that is run on the NM? The NM can provide 
important information such as user name, pid, pgrpid, sid, etc. to that command 
in a platform-agnostic manner, and we should be all done.
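
As a rough sketch of the command-mapping idea: the application registers named command templates, and the NM fills in process details before running one. setExternalCommand comes from the quoted proposal and does not exist in YARN; the class name, the resolve method, and the "$NAME" placeholder syntax handled here are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical registry of externally triggered commands for one container. */
final class ExternalCommandSketch {
  // Matches shell-style variables such as $CONTAINER_PID.
  private static final Pattern VAR =
      Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

  private final Map<String, String> commands = new HashMap<>();

  /** App side: register a command template under a name. */
  void setExternalCommand(String name, String template) {
    commands.put(name, template);
  }

  /** NM side: look up the template and substitute the values the NM knows. */
  String resolve(String name, Map<String, String> nmValues) {
    String template = commands.get(name);
    if (template == null) {
      throw new IllegalArgumentException("unknown command: " + name);
    }
    Matcher m = VAR.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = nmValues.get(m.group(1));
      // Variables the NM does not know are left untouched.
      String replacement = value != null ? value : m.group();
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    ExternalCommandSketch ctx = new ExternalCommandSketch();
    ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID");
    Map<String, String> nmValues = new HashMap<>();
    nmValues.put("CONTAINER_PID", "4242");
    System.out.println(ctx.resolve("gracefulShutdown", nmValues));
    // prints kill -TERM 4242
  }
}
```

Matching whole variable names, rather than doing plain substring replacement, avoids one placeholder accidentally rewriting part of a longer one.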



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794833#comment-13794833
 ] 

Hadoop QA commented on YARN-445:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608408/YARN-445--n4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2176//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2176//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2176//console

This message is automatically generated.



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792357#comment-13792357
 ] 

Chris Nauroth commented on YARN-445:


I haven't had a chance to look at this patch, but I did want to link to 
MAPREDUCE-5387.  We have discussed the possibility of using 
{{SetConsoleCtrlHandler}}/{{GenerateConsoleCtrlEvent}} to approximate SIGTERM 
on Windows.  (The current task termination logic on Windows is more like a 
SIGKILL.)  Perhaps this patch could be a foundation for that.



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786798#comment-13786798
 ] 

Hadoop QA commented on YARN-445:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606926/YARN-445--n3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2107//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2107//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2107//console

This message is automatically generated.



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-04 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786572#comment-13786572
 ] 

Andrey Klochkov commented on YARN-445:
--

Steve, the current implementation will send the signal to the java process 
started by bin/hbase, as it sends it to all processes in the job object, i.e. 
all processes of the main container process. It can be replaced with sending 
the signal to all processes in the group instead, and I think the behavior will 
be the same. 

BTW, I don't know how to do the opposite on Windows, i.e. how to avoid sending 
the signal to all processes of the container (so the behavior on Linux is 
different, as "bin/hbase" will receive the signal). I think this is fine as 
long as the difference is documented. In the case of hbase, the shell script 
can install a custom handler for SIGTERM and do whatever is needed (e.g. send 
SIGTERM to the java process it started). 

There is one caveat in ctrl+break handling when a batch file starts a java 
process:
1. The batch file starts the java process.
2. The user sends ctrl+break to all processes in the group (or job object). The 
java process prints a thread dump. The batch file doesn't react yet.
3. The java process completes successfully.
4. The batch file does not exit; it prints "Terminate batch job? (Y/N)" because 
it received the ctrl+break signal earlier.

The only way I see to overcome this problem with batch file processes is to 
identify them somehow (by executable name?) when walking through the processes 
in the job object, and not send them the signal. Sending ctrl+break to batch 
file processes doesn't make sense anyway, as in newer Windows versions there's 
no way to disable or customize ctrl+break handling in batch files.
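
The "identify them by executable name" idea could look roughly like the sketch below. It is not taken from the patch: ProcInfo and CtrlBreakTargetsSketch are hypothetical names, the process list is supplied by the caller rather than read from a job object, and it assumes batch files run under cmd.exe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical view of one process in the container's job object. */
final class ProcInfo {
  final long pid;
  final String executablePath;

  ProcInfo(long pid, String executablePath) {
    this.pid = pid;
    this.executablePath = executablePath;
  }
}

final class CtrlBreakTargetsSketch {
  /** True if the executable's file name is cmd.exe (case-insensitive). */
  static boolean isBatchInterpreter(String executablePath) {
    String lower = executablePath.toLowerCase(Locale.ROOT);
    int lastSep = Math.max(lower.lastIndexOf('\\'), lower.lastIndexOf('/'));
    return lower.substring(lastSep + 1).equals("cmd.exe");
  }

  /** Returns the pids that should receive ctrl+break, skipping batch shells. */
  static List<Long> selectTargets(List<ProcInfo> processes) {
    List<Long> targets = new ArrayList<>();
    for (ProcInfo p : processes) {
      if (!isBatchInterpreter(p.executablePath)) {
        targets.add(p.pid);
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    List<ProcInfo> processes = new ArrayList<>();
    processes.add(new ProcInfo(10, "C:\\Windows\\System32\\cmd.exe"));
    processes.add(new ProcInfo(11, "C:\\java\\bin\\java.exe"));
    System.out.println(selectTargets(processes)); // prints [11]
  }
}
```

Name matching is a heuristic: it would also skip a cmd.exe that is not running a batch file, so it trades some precision for avoiding the "Terminate batch job?" prompt.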



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-04 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786506#comment-13786506
 ] 

Andrey Klochkov commented on YARN-445:
--

The large diffs in the tests are not due to reformatting but because of 
refactoring needed to implement an additional test without lots of copy/paste. 



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786318#comment-13786318
 ] 

Steve Loughran commented on YARN-445:
-

ctrl-break is special in that it can talk to the whole process group:  
[http://msdn.microsoft.com/en-us/library/windows/desktop/ms683155(v=vs.85).aspx]

Process-group signalling should be good (make it an option from the sender?) so 
that I can send a signal to a process started by its own bash script (e.g. 
bin/hbase->java). However, we do need to remember that some recent Ubuntu 
versions (mistakenly) require a -- between the signal and the process group id.

This is quite a significant patch, and it adds a feature that many will find 
useful, but it is going to need careful review by the YARN experts (of which I 
am not one). Some quick points:
# I wouldn't mark the interface/methods as stable yet.
# Some of the diffs in the tests look bigger than they should be: 
reformatting/refactoring? It just makes it harder to distinguish changes. 
Ideally all the existing tests should be left alone (that way we can be 
confident that they will catch regressions), with new tests underneath or in 
their own class.
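
On the process-group point above, a minimal sketch of building the kill command line with the "--" separator always present, so that the negative process group id is never parsed as an option. The class and method names are hypothetical, and the sketch only builds the argument list; it does not run anything.

```java
import java.util.Arrays;
import java.util.List;

final class KillGroupCommandSketch {
  /**
   * Builds "kill -<SIGNAL> -- -<pgid>". The leading '-' on the last argument
   * is what tells kill to target a process group instead of a single pid.
   */
  static List<String> killGroupArgs(String signalName, long pgid) {
    if (pgid <= 1) {
      // Guard against signalling every process (0, -1) or init's group.
      throw new IllegalArgumentException("refusing to signal group " + pgid);
    }
    return Arrays.asList("kill", "-" + signalName, "--", "-" + pgid);
  }

  public static void main(String[] args) {
    System.out.println(String.join(" ", killGroupArgs("QUIT", 4242)));
    // prints kill -QUIT -- -4242
  }
}
```

The resulting list can be handed to java.lang.ProcessBuilder as-is, which avoids going through a shell and any quoting issues that come with it.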



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-02 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784292#comment-13784292
 ] 

Andrey Klochkov commented on YARN-445:
--

As I understand it, this Findbugs warning should be ignored, as it complains 
about a valid type cast. 



[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784284#comment-13784284
 ] 

Hadoop QA commented on YARN-445:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606399/YARN-445--n2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2062//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2062//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2062//console

This message is automatically generated.

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jason Lowe
> Attachments: YARN-445--n2.patch, YARN-445.patch


[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-02 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784199#comment-13784199
 ] 

Andrey Klochkov commented on YARN-445:
--

Bikas, on Windows the JVM prints a full thread dump on ctrl+break. I think 
ctrl+c may be emulated in the same way and used in place of TERM on Windows, 
via the same signalContainers API.
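
To make the suggested mapping concrete, here is a minimal sketch of how a signal request might be translated per platform. The names {{ContainerSignal}}, {{SignalMapping}} and {{mechanismFor}} are hypothetical and are not taken from the attached patch; only {{CTRL_C_EVENT}} and {{CTRL_BREAK_EVENT}} are real Windows console control event names. It only describes the delivery mechanism and does not send anything.

```java
import java.util.Locale;

// Hypothetical illustration only; not the API from the YARN-445 patch.
enum ContainerSignal { QUIT, TERM }

final class SignalMapping {
    private SignalMapping() {}

    /** Describes the mechanism a node might use to deliver the signal. */
    static String mechanismFor(ContainerSignal signal, boolean windows) {
        if (!windows) {
            // On Unix the request maps directly to a signal name for kill(1).
            return "kill -" + signal.name();
        }
        switch (signal) {
            case QUIT:
                // The Windows JVM prints a full thread dump on ctrl+break.
                return "CTRL_BREAK_EVENT";
            case TERM:
                // ctrl+c is the proposed stand-in for TERM on Windows.
                return "CTRL_C_EVENT";
            default:
                throw new IllegalArgumentException("Unmapped signal: " + signal);
        }
    }

    /** Simple OS check based on a value such as System.getProperty("os.name"). */
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }
}
```

A caller would pass {{SignalMapping.isWindows(System.getProperty("os.name"))}} as the second argument, so the requester never needs to know which platform the node runs.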

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jason Lowe
> Attachments: YARN-445.patch


[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-01 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783702#comment-13783702
 ] 

Bikas Saha commented on YARN-445:
-

How does the Windows JVM handle ctrl-break? How would we emulate a ctrl-c 
signal that would trigger the JVM shutdown hook?

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jason Lowe
> Attachments: YARN-445.patch


[jira] [Commented] (YARN-445) Ability to signal containers

2013-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783677#comment-13783677
 ] 

Hadoop QA commented on YARN-445:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12606252/YARN-445.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE
  org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2059//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2059//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2059//console

This message is automatically generated.

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jason Lowe
> Attachments: YARN-445.patch


[jira] [Commented] (YARN-445) Ability to signal containers

2013-07-31 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725111#comment-13725111
 ] 

Steve Loughran commented on YARN-445:
-

I like Chris's #3 option, as it allows you to add things like a graceful 
shutdown to a piece of code that you don't want to, or can't, change. The 
command would have to run with the same path and other env params as the 
original source if you want to do things like exec an HBase decommission 
command.

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-445) Ability to signal containers

2013-04-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632178#comment-13632178
 ] 

Bikas Saha commented on YARN-445:
-

IMO it would be great if the API allowed YARN/NM to figure out what the 
intended action is. That way the NM can perform that action using the Shell, 
which makes the OS transparent. Simply passing a signal value integer, with 
YARN/NM acting as just a pass-through, may not be the right thing. I am not 
quite sure how to handle Java-specific behavior.

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.0.5-beta
>Reporter: Jason Lowe


[jira] [Commented] (YARN-445) Ability to signal containers

2013-04-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632028#comment-13632028
 ] 

Chris Nauroth commented on YARN-445:


Unfortunately, I don't believe the Unix signal concept maps cleanly to Windows. 
Some of the signal-related functions are defined on Windows, but with behavior 
quite different from the Unix equivalent.

http://msdn.microsoft.com/en-us/library/xdkz3x12(v=vs.71).aspx

For example, there are differences in exit codes seen by the signalled process, 
and some signal handling scenarios cause the process to start a new thread to 
handle it instead of interrupting an existing thread.

Another alternative on Windows is console control handlers:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx

I have seen projects that attempt to define a higher-level interface of 
"externally triggered command", using method names like gracefulShutdown, kill, 
and outputDebugInfo.  On Unix, the implementation can map these to 
signal/kill.  On Windows, the implementation can map these to 
SetConsoleCtrlHandler/GenerateConsoleCtrlEvent.  The problem is that this is a 
least-common-denominator approach that may not cover all possible use cases.

Considering all of that, I can think of 3 different approaches to this feature:

# Sacrifice trying to create a general-purpose signaling mechanism and just 
stay focused on triggering JVM features.  (This is identical to Jason's #1.)
# Use the Windows APIs I mentioned above to implement least-common-denominator 
signaling support.
# Add YARN API support for ContainerLaunchContext to accept a mapping of 
externally-triggered command names to code (e.g. 
{{ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")}}).  
Then, during execution, the AM could send a message to the NM saying 
"gracefulShutdown container_X".  When the NM receives the message, it could 
look up "gracefulShutdown" in the map of external commands and trigger the 
kill.  For highly custom message handling scenarios (Windows console control 
events/named pipes/whatever else), the AM could ship a binary as a localized 
resource that contains the implementation, and the external command can be 
mapped to call that binary.

Each of these approaches gets progressively more general-purpose, but also 
progressively more complex.  The last one in particular gives maximum 
flexibility, but makes the API challenging for AM writers.
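
As a rough sketch of the third approach, assuming a hypothetical {{ExternalCommands}} holder in place of a real ContainerLaunchContext extension (no such YARN API exists in this form, and {{resolve}} is an illustrative name), the lookup on the NM side might look like this:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the launch-context extension proposed in option 3.
final class ExternalCommands {
    private final Map<String, String> commands = new HashMap<>();

    /** Registers a command template under a name the AM can refer to later. */
    void setExternalCommand(String name, String commandTemplate) {
        commands.put(name, commandTemplate);
    }

    /** Resolves a command name to a concrete command line for one container. */
    String resolve(String name, long containerPid) {
        String template = commands.get(name);
        if (template == null) {
            throw new IllegalArgumentException("Unknown external command: " + name);
        }
        // Literal substitution of the placeholder; no regex is involved.
        return template.replace("$CONTAINER_PID", Long.toString(containerPid));
    }
}
```

With {{setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")}} registered, resolving "gracefulShutdown" for PID 4242 yields {{kill -TERM 4242}}. Actually executing that command line is left out of the sketch.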

A side note on the last option: another variant is to add one more level of 
indirection in the API to support different container launch configuration per 
platform.  This would make it easier to support heterogeneous clusters (mix of 
Unix and Windows nodes).  This would let the AM say things like "use kill on 
Unix, but use something else on Windows" but without needing to know if 
specific nodes are running Unix or Windows.
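
The per-platform variant could be sketched as one more map level. Everything here is an assumption for illustration: the {{NodePlatform}} enum, the {{PlatformCommands}} class, and any command string registered with it are hypothetical.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical platform tag; the node, not the AM, knows which one applies.
enum NodePlatform { UNIX, WINDOWS }

final class PlatformCommands {
    private final Map<NodePlatform, Map<String, String>> byPlatform =
            new EnumMap<>(NodePlatform.class);

    /** The AM registers one command per platform under the same name. */
    void setExternalCommand(NodePlatform platform, String name, String command) {
        byPlatform.computeIfAbsent(platform, p -> new HashMap<>()).put(name, command);
    }

    /** The node picks the entry that matches its own platform. */
    String commandFor(NodePlatform nodePlatform, String name) {
        Map<String, String> commands = byPlatform.get(nodePlatform);
        String command = (commands == null) ? null : commands.get(name);
        if (command == null) {
            throw new IllegalStateException(
                    "No command '" + name + "' registered for " + nodePlatform);
        }
        return command;
    }
}
```

The AM registers both variants once at launch time, and each node selects its own entry, so the AM never has to ask which platform a given node runs.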


> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.0.5-beta
>Reporter: Jason Lowe


[jira] [Commented] (YARN-445) Ability to signal containers

2013-04-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631761#comment-13631761
 ] 

Jason Lowe commented on YARN-445:
-

Yes, it's an enhancement request to the NM API.  I filed it as signaling 
containers to generalize the jstack-on-task-timeout feature, at least in the 
UNIX sense.  I'm not familiar with the Windows APIs, so I'm not sure how (or 
if) signals map on that platform.  I could see going three different ways on 
this for the NM API:

# methods to trigger various features specific to JVMs like jstack, jmap, etc.
# methods to send generalized signals (if there is a reasonable facsimile on 
Windows)
# give up trying to generalize the concept and put in the StopContainerRequest 
flag

I'd prefer the generalized signal approach if we can come up with a reasonable 
mapping for Windows, as this could be useful for non-JVM containers.  In any 
case, we've had a lot of requests for the ability to trigger jstacks on 
containers in various situations, so I'd like to see at least something done in 
the NM API to achieve this.

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.0.5-beta
>Reporter: Jason Lowe


[jira] [Commented] (YARN-445) Ability to signal containers

2013-04-14 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631399#comment-13631399
 ] 

Bikas Saha commented on YARN-445:
-

Sounds like an enhancement to the NM API. Moving under YARN-386. Please unlink 
if that is not correct.

I can see the use case this seeks to solve. I am wondering what the 
abstraction is in the general case. That would help us avoid changing things 
for every similar use case. Keeping platform neutrality would be beneficial so 
that the use cases continue to work for non-Java AMs/tasks or on Windows.

> Ability to signal containers
> 
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.0.5-beta
>Reporter: Jason Lowe