[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276822#comment-15276822 ]

Michael Gummelt commented on MESOS-3220:

+1. I'm implementing this behavior in Spark. It would be more efficient if Mesos offered it, so we wouldn't have to reimplement it at the framework level.

> Offer ability to kill tasks from the API
>
> Key: MESOS-3220
> URL: https://issues.apache.org/jira/browse/MESOS-3220
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Sunil Shah
> Labels: mesosphere
>
> We are investigating adding a {{dcos task kill}} command to our DCOS (and
> Mesos) command line interface. Currently the ability to kill tasks is only
> offered via the scheduler API so it would be useful to have some ability to
> kill tasks directly.
> This would complement the Maintenance Primitives, in that it would enable the
> operator to terminate those tasks which, for whatever reasons, do not respond
> to Inverse Offers events.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
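For context, the scheduler-side kill that the description refers to is a `KILL` call on the v1 scheduler HTTP API. The sketch below only builds the request body; the field names follow our reading of the Mesos 1.x `scheduler.proto` (verify against your version), and the IDs are placeholders. Actually sending it also requires an active framework subscription (in Mesos 1.x, the `Mesos-Stream-Id` header from `SUBSCRIBE`), which is exactly why a tool outside the framework cannot use it.

```python
import json
from typing import Optional


def build_kill_call(framework_id: str, task_id: str,
                    agent_id: Optional[str] = None) -> str:
    """Return the JSON body of a v1 scheduler API KILL call.

    Only a subscribed framework can send this to the master's
    /api/v1/scheduler endpoint, which is the limitation the ticket is about.
    """
    kill = {"task_id": {"value": task_id}}
    if agent_id is not None:
        # Optional: tells the master which agent the task is running on.
        kill["agent_id"] = {"value": agent_id}
    call = {
        "framework_id": {"value": framework_id},
        "type": "KILL",
        "kill": kill,
    }
    return json.dumps(call, sort_keys=True)


if __name__ == "__main__":
    print(build_kill_call("example-framework-id", "example-task-id", "example-agent-id"))
```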
[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000738#comment-15000738 ]

Joseph Wu commented on MESOS-3220:

Keep in mind there's a distinction between maintenance and inverse offers. Maintenance uses inverse offers, but due to maintenance's operator-heavy nature, all killing must be done by the operator. If inverse offers were used for trimming specific tasks, perhaps as part of a custom allocator, it would make sense for the custom allocator to implement automated killing. But it would not make sense for the default allocator to auto-kill.

> Offer ability to kill tasks from the API
>
> Key: MESOS-3220
> URL: https://issues.apache.org/jira/browse/MESOS-3220
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Sunil Shah
> Assignee: Marco Massenzio
> Priority: Blocker
> Labels: mesosphere
>
> We are investigating adding a {{dcos task kill}} command to our DCOS (and
> Mesos) command line interface. Currently the ability to kill tasks is only
> offered via the scheduler API so it would be useful to have some ability to
> kill tasks directly.
> This would complement the Maintenance Primitives, in that it would enable the
> operator to terminate those tasks which, for whatever reasons, do not respond
> to Inverse Offers events.
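To make the operator-driven maintenance flow concrete, here is a sketch of the two request bodies involved, assuming the master's `/maintenance/schedule` and `/machine/down` endpoints described in the Mesos maintenance documentation (check the exact paths and JSON shape against your Mesos version). Hostnames, IPs and times are placeholders, and the code only builds JSON; it does not contact a master.

```python
import json
from typing import Dict, List


def machine_id(hostname: str, ip: str) -> Dict[str, str]:
    """A MachineID as accepted by the maintenance endpoints."""
    return {"hostname": hostname, "ip": ip}


def schedule_body(machines: List[Dict[str, str]], start_ns: int, duration_ns: int) -> str:
    """Body for POST /maintenance/schedule with a single maintenance window.

    Posting a schedule does not kill anything: frameworks with resources on
    these machines receive inverse offers and decide how to react.
    """
    window = {
        "machine_ids": machines,
        "unavailability": {
            "start": {"nanoseconds": start_ns},
            "duration": {"nanoseconds": duration_ns},
        },
    }
    return json.dumps({"windows": [window]})


def machine_down_body(machines: List[Dict[str, str]]) -> str:
    """Body for POST /machine/down: the explicit operator step.

    Taking a machine down shuts down its agents, so any tasks still running
    there are terminated regardless of how frameworks answered.
    """
    return json.dumps(machines)
```

The point of the split is the one made in the comment: the schedule only informs frameworks, and the operator's second request is what actually ends the tasks.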
[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000340#comment-15000340 ]

Qian Zhang commented on MESOS-3220:

For 1, I think we should enforce inverse offers automatically rather than relying on the operator's manual actions. For example, the master may need to check whether the unavailability window in the inverse offer's UnavailableResources has been reached while the framework has not yet responded to the inverse offer; if so, the master needs to enforce it by killing the framework's tasks.
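The auto-enforcement rule proposed here can be sketched as a pure function. Everything in it is hypothetical (Mesos does not expose such a structure): we assume the deadline is the start of the unavailability window, and that either an accept or a decline counts as a response. As Joseph Wu argues in his reply, logic like this would more plausibly live in a custom allocator than in the default one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingInverseOffer:
    # Hypothetical master-side record of one outstanding inverse offer.
    framework_id: str
    task_ids: List[str]
    unavailability_start_ns: int  # when the machine is due to become unavailable
    responded: bool = False       # True once the framework accepted or declined


def tasks_to_enforce(offers: List[PendingInverseOffer], now_ns: int) -> List[str]:
    """Task IDs the master would kill under the proposed auto-enforcement.

    Assumed rule: enforce once the unavailability window has started and the
    framework still has not answered its inverse offer.
    """
    doomed: List[str] = []
    for offer in offers:
        if not offer.responded and now_ns >= offer.unavailability_start_ns:
            doomed.extend(offer.task_ids)
    return doomed
```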
[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999552#comment-14999552 ]

Marco Massenzio commented on MESOS-3220:

To revive this thread, a couple of clarifying points:

1. Maintenance
This is meant to augment the Maintenance Primitives (MESOS-1474) and certainly *not* to replace them. In particular, this endpoint (which ought to be scriptable, for automated maintenance scripts) would enable operators to kill "recalcitrant" frameworks/tasks which, for whatever reason, do not follow the Inverse Offer mechanism.

2. Repairs
There may be situations in which the task itself gets into a funky state and needs to be killed, without Mesos necessarily noticing it (i.e., we cannot rely on the {{TASK_LOST}}/{{TASK_KILLED}} conditions). Once it is killed, however, the framework will be notified (via the usual Mesos mechanisms) and can thus decide whether to re-schedule the task (possibly somewhere else).

3. Remote termination
Using tools such as the {{DCOS CLI}}, we want to enable users to reach out to the Mesos Master directly (possibly bypassing the framework) and terminate a task, without requiring every framework developer to re-implement the same API. This would be a "common service" that Mesos offers to framework developers, so they wouldn't have to worry about it.

4. Security
There is obviously the expedient (if somewhat draconian) "firewalling" ability, to prevent outright access to this endpoint. At a finer-grained level, we would consider using ACLs (probably in line with what is currently being done for the Maintenance Primitives) to authorize access to this functionality.
> Offer ability to kill tasks from the API
>
> Key: MESOS-3220
> URL: https://issues.apache.org/jira/browse/MESOS-3220
> Project: Mesos
> Issue Type: Improvement
> Components: python api
> Reporter: Sunil Shah
> Assignee: Marco Massenzio
> Priority: Blocker
> Labels: mesosphere
>
> We are investigating adding a `dcos task kill` command to our DCOS (and
> Mesos) command line interface. Currently the ability to kill tasks is only
> offered via the scheduler API so it would be useful to have some ability to
> kill tasks directly.
> This is a blocker for the DCOS CLI!
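Point 4 above mentions ACLs. A minimal sketch of how an authorization check for a kill endpoint might look, assuming a hypothetical `KillTasksRule` that mirrors the general shape of Mesos ACLs: subjects (principals) and objects (here, the user a task runs as), rules checked in order with the first match winning, and a `permissive` fallback when nothing matches. The rule type, its fields and all names used are illustrative; this is not an existing Mesos ACL.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KillTasksRule:
    # Hypothetical ACL entry for a task-kill endpoint.
    principals: Optional[List[str]]  # None matches any principal
    users: Optional[List[str]]       # None matches any task user
    allow: bool = True


def _matches(values: Optional[List[str]], candidate: str) -> bool:
    return values is None or candidate in values


def may_kill(rules: List[KillTasksRule], principal: str, task_user: str,
             permissive: bool = True) -> bool:
    """The first matching rule decides; if none match, fall back to `permissive`."""
    for rule in rules:
        if _matches(rule.principals, principal) and _matches(rule.users, task_user):
            return rule.allow
    return permissive
```

Ordering matters here: placing the operator allowance first lets `ops` kill tasks run as `root` even though a later rule denies that to everyone else.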
[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901834#comment-14901834 ]

Jian Qiu commented on MESOS-3220:

+1 for this API, and I have the same concern as [~vinodkone]: can the task be killed in a more planned way, like maintenance? Maybe this should be an API that schedules the task killing with a timer, so the framework can be notified and kill the task itself. If the framework does not kill the task, Mesos can enforce killing it after the timer expires.
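The "notify first, enforce after a timer" idea can be sketched as a small deadline queue. The class and its callbacks are hypothetical: `notify` stands for however the master would tell the framework, `kill` for the master-side kill, and time is passed in explicitly so the logic is easy to test.

```python
import heapq
from typing import Callable, List, Set, Tuple


class ScheduledKills:
    """Hypothetical kill queue: notify the framework, enforce after a grace period.

    request() records a deadline and notifies the framework; the framework
    may kill the task itself (mark_killed_by_framework). enforce() is called
    periodically and kills whatever is still running past its deadline.
    """

    def __init__(self, notify: Callable[[str], None], kill: Callable[[str], None]):
        self._notify = notify
        self._kill = kill
        self._deadlines: List[Tuple[float, str]] = []  # min-heap of (deadline, task_id)
        self._done: Set[str] = set()

    def request(self, task_id: str, now: float, grace: float) -> None:
        heapq.heappush(self._deadlines, (now + grace, task_id))
        self._notify(task_id)

    def mark_killed_by_framework(self, task_id: str) -> None:
        self._done.add(task_id)

    def enforce(self, now: float) -> List[str]:
        enforced: List[str] = []
        while self._deadlines and self._deadlines[0][0] <= now:
            _, task_id = heapq.heappop(self._deadlines)
            if task_id not in self._done:
                self._kill(task_id)
                self._done.add(task_id)
                enforced.append(task_id)
        return enforced
```

A task the framework stops on its own within the grace period is never touched by `enforce`, which is the "planned" behaviour the comment asks for.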
[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731323#comment-14731323 ]

Marco Massenzio commented on MESOS-3220:

{quote}
Framework knowing about it after the task gets killed is likely not enough. Frameworks typically plan redundancy of tasks (e.g., replicas) based on "unplanned" random failures in a DC. Adding these sort of endpoints violates such assumptions.
{quote}
Right, the assumption here is that the {{kill}} command is issued against misbehaving/rogue frameworks/tasks that do not offer that option (or maybe they do, but they are not reachable from the subnet(s) where the user's machine is sitting). So, this would be some sort of "emergency brake" for unplanned outages/emergency actions.

{quote}
For example, for planned machine maintenance ... Should we do something similar for this endpoint?
{quote}
The main difference is (I believe) that in this case there is no "planning window" and/or the user cannot (or will not) wait for the task/framework to complete and go away of its own volition; or the framework could entirely ignore the (polite) requests from the Master to relinquish resources.

{quote}
Also, I don't follow the "disparate ways to talk to every framework" point. Does DCOS CLI allow launching framework's tasks? If yes, it already knows how to communicate with frameworks. If not, why should it allow killing them? Moreover, I am surprised that there are Mesos frameworks out there that have APIs for launching tasks but not killing them!?
{quote}
Indeed, but that means we would have to re-implement the functionality *every time* a new framework is added (not to mention writing very similar code multiple times, each time with the odd twists that the individual framework comes up with). It makes the code bloated and unmaintainable (not to mention, brittle).
This way, we implement it once, we leverage Mesos' awesomeness, and everyone is happy :)

{quote}
I'm asking these hard questions, because...
{quote}
And so you should! Thanks for doing so; I totally appreciate that this ticket could have done with a better description of the requirements and maybe a few use cases. Maybe we'll add them too. And if there is some commonality that we can exploit with Aurora's requirements, even better!
[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728029#comment-14728029 ]

Vinod Kone commented on MESOS-3220:

Framework knowing about it *after* the task gets killed is likely not enough. Frameworks typically plan redundancy of tasks (e.g., replicas) based on "unplanned" random failures in a DC. Adding these sorts of endpoints violates such assumptions. For example, for planned machine maintenance we explicitly took the design decision for Mesos not to kill tasks but to send inverse offers, giving frameworks the opportunity to react (say yes or no). Should we do something similar for this endpoint?

Also, I don't follow the "disparate ways to talk to every framework" point. Does the DCOS CLI allow launching frameworks' tasks? If yes, it already knows how to communicate with frameworks. If not, why should it allow killing them? Moreover, I am surprised that there are Mesos frameworks out there that have APIs for launching tasks but not killing them!?

I'm asking these hard questions because a similar question came up with Aurora devs (cc [~maximk]) regarding a Mesos endpoint for killing all revocable tasks in case of emergency. I'm still debating the right way to go about such requests.
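The point about inverse offers is that a framework gets to weigh the request against its own redundancy plan. A sketch of that decision on the framework side, assuming the v1 scheduler API's `ACCEPT_INVERSE_OFFERS` and `DECLINE_INVERSE_OFFERS` calls (field names as we read the Mesos 1.x `scheduler.proto`; verify for your version). The replica-count policy and parameter names are illustrative only. In the maintenance design the answer is advisory: declining does not stop an operator from taking the machine down.

```python
import json


def respond_to_inverse_offer(framework_id: str, inverse_offer_id: str,
                             replicas_running: int, replicas_on_machine: int,
                             min_replicas: int) -> str:
    """Accept only if losing this machine still leaves `min_replicas` running.

    Returns the JSON body of a v1 scheduler ACCEPT_INVERSE_OFFERS or
    DECLINE_INVERSE_OFFERS call.
    """
    remaining = replicas_running - replicas_on_machine
    ids = [{"value": inverse_offer_id}]
    if remaining >= min_replicas:
        call = {"type": "ACCEPT_INVERSE_OFFERS",
                "accept_inverse_offers": {"inverse_offer_ids": ids}}
    else:
        call = {"type": "DECLINE_INVERSE_OFFERS",
                "decline_inverse_offers": {"inverse_offer_ids": ids}}
    call["framework_id"] = {"value": framework_id}
    return json.dumps(call)
```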
[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727936#comment-14727936 ]

Marco Massenzio commented on MESOS-3220:

I guess the framework would come to know about it via a {{TASK_KILLED}} status update? The reason for implementing this in a standard way, via a Mesos API, is that it greatly simplifies implementing it in the CLI: otherwise we would have to implement a number of disparate ways of talking to *every* framework out there (some of which may not even offer this ability and/or be unreachable from the CLI's location).
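The framework learns about an externally triggered kill through a status update. A sketch of the framework side, assuming the v1 scheduler API's `UPDATE` event and `ACKNOWLEDGE` call (field names as in Mesos 1.x; in JSON the `uuid` is base64-encoded bytes). The relaunch policy and the `wanted`/`still_running` parameters are illustrative only. Note that the framework reacts only after the fact, which is the crux of the SLA concern raised in this thread.

```python
import json
from typing import Optional, Tuple

# States after which this illustrative policy considers launching a replacement.
RELAUNCH_STATES = {"TASK_KILLED", "TASK_LOST", "TASK_FAILED"}


def handle_update(event: dict, framework_id: str,
                  wanted: int, still_running: int) -> Tuple[Optional[str], bool]:
    """Handle a v1 scheduler UPDATE event.

    Returns (ACKNOWLEDGE call body or None, whether to launch a replacement).
    `still_running` is the framework's own instance count after this update.
    """
    status = event["update"]["status"]
    relaunch = status["state"] in RELAUNCH_STATES and still_running < wanted

    ack = None
    if "uuid" in status:
        # Updates carrying a uuid must be acknowledged; master-generated
        # updates (e.g. from reconciliation) carry none and are not acked.
        ack = json.dumps({
            "framework_id": {"value": framework_id},
            "type": "ACKNOWLEDGE",
            "acknowledge": {
                "agent_id": status["agent_id"],
                "task_id": status["task_id"],
                "uuid": status["uuid"],
            },
        })
    return ack, relaunch
```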
[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API
[ https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727871#comment-14727871 ]

Vinod Kone commented on MESOS-3220:

Why can't the DCOS CLI talk to the framework that launched the task? Killing a task without the framework knowing about it seems dangerous. What if it violates the framework's SLAs?