[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2016-05-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276822#comment-15276822
 ] 

Michael Gummelt commented on MESOS-3220:


+1.

I'm implementing this behavior in Spark.  It would be more efficient if Mesos 
offered it, so we wouldn't have to reimplement it at the framework level.

> Offer ability to kill tasks from the API
> 
>
> Key: MESOS-3220
> URL: https://issues.apache.org/jira/browse/MESOS-3220
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Sunil Shah
> Labels: mesosphere
>
> We are investigating adding a {{dcos task kill}} command to our DCOS (and 
> Mesos) command line interface. Currently the ability to kill tasks is only 
> offered via the scheduler API so it would be useful to have some ability to 
> kill tasks directly.
> This would complement the Maintenance Primitives, in that it would enable the 
> operator to terminate those tasks which, for whatever reasons, do not respond 
> to Inverse Offers events.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2015-11-11 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000738#comment-15000738
 ] 

Joseph Wu commented on MESOS-3220:
--

Keep in mind there's a distinction between maintenance and inverse offers.  
Maintenance uses inverse offers, but due to maintenance's operator-heavy 
nature, all killing must be done by the operator.

If inverse offers were used for trimming specific tasks, perhaps as part of a 
custom allocator, it would make sense for the custom allocator to implement 
automated killing.  But it would not make sense for the default allocator to 
auto-kill.

> Offer ability to kill tasks from the API
> 
>
> Key: MESOS-3220
> URL: https://issues.apache.org/jira/browse/MESOS-3220
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Sunil Shah
> Assignee: Marco Massenzio
> Priority: Blocker
> Labels: mesosphere
>
> We are investigating adding a {{dcos task kill}} command to our DCOS (and 
> Mesos) command line interface. Currently the ability to kill tasks is only 
> offered via the scheduler API so it would be useful to have some ability to 
> kill tasks directly.
> This would complement the Maintenance Primitives, in that it would enable the 
> operator to terminate those tasks which, for whatever reasons, do not respond 
> to Inverse Offers events.





[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2015-11-11 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000340#comment-15000340
 ] 

Qian Zhang commented on MESOS-3220:
---

For 1, I think we should auto-enforce inverse offers rather than rely on the 
operator's manual actions. For example, the master may need to check whether 
the duration of the UnavailableResources in the inverse offer has elapsed 
without the framework responding to the inverse offer; if so, the master needs 
to enforce it by killing the framework's tasks.



[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2015-11-10 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999552#comment-14999552
 ] 

Marco Massenzio commented on MESOS-3220:


To revive this thread - a couple of clarifying points:

1. Maintenance
This is meant to augment the Maintenance Primitives (MESOS-1474) and certainly 
*not* to replace them.
In particular, this endpoint (which ought to be scriptable, for automated 
maintenance scripts) would enable operators to kill "recalcitrant" 
frameworks/tasks which, for whatever reason, do not follow the Inverse Offer 
mechanism.

2. Repairs
There may be situations in which the task itself gets into a funky state and 
needs to be killed, without Mesos necessarily noticing it (i.e., we cannot rely 
on the {{TASK_LOST}}/{{TASK_KILLED}} conditions).
Once the task has been killed, however, the Framework will be notified (via the 
usual Mesos mechanisms) and can thus decide whether to re-schedule the task 
(possibly somewhere else).

3. Remote termination
Using tools such as the {{DCOS CLI}}, we want to enable users to reach out to 
the Mesos Master directly (possibly bypassing the framework) and terminate a 
task, without requiring every framework developer to re-implement the same API 
(so this would be a "common service" that Mesos offers to framework developers, 
one that they wouldn't have to worry about).

4. Security
There is obviously the expedient (if somewhat draconian) "firewalling" ability, 
to prevent outright access to this endpoint.
At a finer-grained level, we would consider using ACLs (probably in line with 
what is currently being done for the Maintenance Primitives) to authorize 
access to this functionality.
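
As a rough illustration of point 4, here is a minimal sketch of what a per-principal authorization check for a kill endpoint could look like. Everything in it is an assumption made for illustration: `KillTaskAcl`, `may_kill`, the `ANY` wildcard, and the first-match-wins rule are hypothetical and are not the Mesos ACL schema.

```python
from dataclasses import dataclass

# Hypothetical wildcard meaning "every principal"; not a Mesos constant.
ANY = "ANY"


@dataclass(frozen=True)
class KillTaskAcl:
    """Hypothetical rule: which operators may kill tasks of which frameworks."""
    principals: frozenset            # operators this rule applies to, or {ANY}
    framework_principals: frozenset  # frameworks they may touch; empty means none


def _matches(allowed, value):
    return ANY in allowed or value in allowed


def may_kill(acls, principal, framework_principal, permissive=False):
    """Return True if `principal` may kill tasks owned by `framework_principal`.

    The first rule that mentions the principal decides the outcome. If no rule
    mentions the principal, fall back to `permissive`.
    """
    for acl in acls:
        if _matches(acl.principals, principal):
            return _matches(acl.framework_principals, framework_principal)
    return permissive


acls = [
    KillTaskAcl(frozenset({"ops"}), frozenset({ANY})),              # ops: any framework
    KillTaskAcl(frozenset({"spark-admin"}), frozenset({"spark"})),  # scoped access
    KillTaskAcl(frozenset({ANY}), frozenset()),                     # everyone else: denied
]
```

With these rules, `may_kill(acls, "spark-admin", "spark")` is allowed while `may_kill(acls, "spark-admin", "marathon")` is denied. The trailing catch-all rule makes the default explicit instead of relying on `permissive`.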

> Offer ability to kill tasks from the API
> 
>
> Key: MESOS-3220
> URL: https://issues.apache.org/jira/browse/MESOS-3220
> Project: Mesos
> Issue Type: Improvement
> Components: python api
> Reporter: Sunil Shah
> Assignee: Marco Massenzio
> Priority: Blocker
> Labels: mesosphere
>
> We are investigating adding a `dcos task kill` command to our DCOS (and 
> Mesos) command line interface. Currently the ability to kill tasks is only 
> offered via the scheduler API so it would be useful to have some ability to 
> kill tasks directly.
> This is a blocker for the DCOS CLI!





[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2015-09-21 Thread Jian Qiu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901834#comment-14901834
 ] 

Jian Qiu commented on MESOS-3220:
-

+1 for this API, and I have the same concern as [~vinodkone]: can the task be 
killed in a more planned way, as with maintenance? Maybe this should be an API 
that schedules the task kill with a timer, so the framework can be notified and 
kill the task itself. If the framework does not kill the task, Mesos can 
enforce the kill after the timer expires.
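
The timer idea above can be sketched as a small state machine. This is only a sketch under stated assumptions: `KillScheduler`, its methods, and the `notify` / `force_kill` callbacks are hypothetical names, time is passed in explicitly by the caller, and nothing here reflects an existing Mesos API.

```python
from dataclasses import dataclass


@dataclass
class PendingKill:
    task_id: str
    deadline: float  # seconds, on whatever clock the caller uses


class KillScheduler:
    """Hypothetical grace-period kill: notify first, enforce after the deadline."""

    def __init__(self, notify, force_kill):
        self._notify = notify          # tells the framework a kill is scheduled
        self._force_kill = force_kill  # kills the task without the framework's help
        self._pending = {}

    def schedule(self, task_id, now, grace_seconds):
        deadline = now + grace_seconds
        self._pending[task_id] = PendingKill(task_id, deadline)
        self._notify(task_id, deadline)

    def task_terminated(self, task_id):
        # The framework killed the task itself (or it exited): nothing to enforce.
        self._pending.pop(task_id, None)

    def tick(self, now):
        """Force-kill every task whose deadline has passed; return their IDs."""
        expired = [p for p in self._pending.values() if p.deadline <= now]
        for p in expired:
            del self._pending[p.task_id]
            self._force_kill(p.task_id)
        return [p.task_id for p in expired]
```

A caller would invoke `schedule()` when the operator requests the kill, `task_terminated()` when a terminal status update arrives, and `tick()` periodically. Only tasks still pending at their deadline are force-killed.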



[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2015-09-04 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731323#comment-14731323
 ] 

Marco Massenzio commented on MESOS-3220:


{quote}
Framework knowing about it after the task gets killed is likely not enough. 
Frameworks typically plan redundancy of tasks (e.g., replicas) based on 
"unplanned" random failures in a DC. Adding this sort of endpoint violates 
such assumptions.
{quote}

Right - the assumption here is that the {{kill}} command is issued against 
misbehaving/rogue frameworks/tasks that do not offer that option (or maybe they 
do, but they are not reachable from the subnet(s) where the user's machine is 
sitting).
So, this would be some sort of "emergency brake" for unplanned 
outages/emergency actions.

{quote}
For example, for planned machine maintenance ...
Should we do something similar for this endpoint?
{quote}
The main difference is (I believe) that in this case there is no "planning 
window" and/or the user cannot (or will not) wait for the task / framework to 
complete and go away of its own volition - or the framework could entirely 
ignore the (polite) requests from Master to relinquish resources.

{quote}
Also, I don't follow the "disparate ways to talk to every framework" point. 
Does DCOS CLI allow launching framework's tasks? If yes, it already knows how 
to communicate with frameworks. If not, why should it allow killing them? 
Moreover, I am surprised that there are Mesos frameworks out there that have 
APIs for launching tasks but not killing them!?
{quote}
Indeed, but that means we would have to re-implement the functionality *every 
time* a new framework is added (that is, write very similar code multiple 
times, each time with the odd twists that the individual framework comes up 
with).  It makes the code bloated and unmaintainable (not to mention brittle).
This way, we implement it once, we leverage Mesos' awesomeness and everyone is 
happy :)

{quote}
I'm asking these hard questions, because...
{quote}
And so you should!  Thanks for doing so; I totally appreciate that this 
ticket's description could have done with a better statement of the 
requirements and maybe a few use cases - maybe we'll add them too.
And if there is some commonality that we can exploit with Aurora's 
requirements, even better!



[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2015-09-02 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728029#comment-14728029
 ] 

Vinod Kone commented on MESOS-3220:
---

Framework knowing about it *after* the task gets killed is likely not enough.  
Frameworks typically plan redundancy of tasks (e.g., replicas) based on 
"unplanned" random failures in a DC. Adding this sort of endpoint violates 
such assumptions.

For example, for planned machine maintenance we explicitly took the design 
decision that Mesos would not kill tasks but would instead send inverse offers, 
to give frameworks the opportunity to react (say yes or no).

Should we do something similar for this endpoint?

Also, I don't follow the "disparate ways to talk to every framework" point. 
Does DCOS CLI allow launching framework's tasks? If yes, it already knows how 
to communicate with frameworks. If not, why should it allow killing them? 
Moreover, I am surprised that there are Mesos frameworks out there that have 
APIs for launching tasks but not killing them!?

I'm asking these hard questions because a similar question came up with Aurora 
devs (cc [~maximk]) with regard to a Mesos endpoint for killing all revocable 
tasks in case of emergency. I'm still debating what's the right way to go about 
such requests.





[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2015-09-02 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727936#comment-14727936
 ] 

Marco Massenzio commented on MESOS-3220:


I guess the framework would come to know about it via a {{TASK_KILLED}} status 
update?
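
A minimal sketch of how a framework might react to such an update, assuming it tracks which kills it requested itself. `ToyScheduler` and its methods are hypothetical and are not the Mesos scheduler API; the state names are a subset of Mesos task states, and the relaunch policy is an assumption.

```python
from collections import deque

# Subset of Mesos terminal task states, used here as plain strings.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


class ToyScheduler:
    """Hypothetical framework bookkeeping; not the Mesos scheduler interface."""

    def __init__(self):
        self.running = {}              # task_id -> service name
        self.kills_requested = set()   # kills this framework asked for itself
        self.relaunch_queue = deque()  # services waiting for a new offer

    def launched(self, task_id, service):
        self.running[task_id] = service

    def kill(self, task_id):
        self.kills_requested.add(task_id)

    def status_update(self, task_id, state):
        if state not in TERMINAL_STATES:
            return
        service = self.running.pop(task_id, None)
        if service is None:
            return  # unknown task, or update already handled
        if task_id in self.kills_requested:
            self.kills_requested.discard(task_id)
            return  # we asked for this kill, so do not relaunch
        if state != "TASK_FINISHED":
            # Killed or lost by someone else (e.g. an operator): relaunch.
            self.relaunch_queue.append(service)
```

From the framework's side, a kill issued by an operator looks like any other unexpected termination, so the same relaunch path handles it.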

The reason for implementing it in a standard way via a Mesos API is that it 
greatly simplifies implementing this in the CLI: otherwise we would have to 
implement a number of disparate ways of talking to *every* framework out there 
(some of which may not even offer this ability and/or be unreachable from the 
CLI's location).



[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2015-09-02 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727871#comment-14727871
 ] 

Vinod Kone commented on MESOS-3220:
---

Why can't the DCOS CLI talk to the framework that launched the task?  Killing a 
task without the framework knowing about it seems dangerous. What if it 
violates the framework's SLAs?
