[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179628#comment-16179628 ]

Vinod Kone commented on MESOS-7975:
---

cc [~bmahler]

> The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
> -
>
>          Key: MESOS-7975
>          URL: https://issues.apache.org/jira/browse/MESOS-7975
>      Project: Mesos
>   Issue Type: Bug
>     Reporter: Anand Mazumdar
>     Assignee: Qian Zhang
>     Priority: Critical
>       Labels: mesosphere
>
> Currently, when a task is killed, the default and the command executor
> incorrectly send a {{TASK_FINISHED}} status update instead of
> {{TASK_KILLED}}. This is due to an unfortunate missed conditional check when
> the task exits with a zero status code.
> {code}
> if (WSUCCEEDED(status)) {
>   taskState = TASK_FINISHED;
> } else if (killed) {
>   // Send TASK_KILLED if the task was killed as a result of
>   // kill() or shutdown().
>   taskState = TASK_KILLED;
> } else {
>   taskState = TASK_FAILED;
> }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates
> when a task is killed.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
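For illustration only, here is a minimal sketch of one way the conditional quoted in the issue description could be reordered so that a kill takes precedence over the exit status. It is not the patch under review. The `TaskState` enum, `succeeded()` (a stand-in for the `WSUCCEEDED` macro) and `terminalState()` are hypothetical names introduced for this sketch, and it assumes the thread settles on "killed always means {{TASK_KILLED}}".

```cpp
#include <sys/wait.h>

#include <cassert>

// Hypothetical stand-in for the Mesos task states named in this issue.
enum class TaskState { FINISHED, KILLED, FAILED };

// True if the raw wait status reports a normal exit with code 0.
// Stand-in (assumption) for the WSUCCEEDED macro in the quoted snippet.
inline bool succeeded(int status)
{
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Reordered check: a kill requested through kill() or shutdown()
// is tested before the exit status is looked at.
inline TaskState terminalState(int status, bool killed)
{
  if (killed) {
    return TaskState::KILLED;
  } else if (succeeded(status)) {
    return TaskState::FINISHED;
  } else {
    return TaskState::FAILED;
  }
}

// Usage: a killed task that exits with 0 is reported as KILLED,
// while an unkilled task with the same status is FINISHED.
inline void checkOrdering()
{
  assert(terminalState(0, /* killed */ true) == TaskState::KILLED);
  assert(terminalState(0, /* killed */ false) == TaskState::FINISHED);
}
```

With the original ordering, the first assertion in `checkOrdering()` would not hold, because the success check runs before the `killed` flag is consulted.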
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175836#comment-16175836 ]

Qian Zhang commented on MESOS-7975:
---

[~alexr] I have just sent a mail to the lists; let's wait for feedback from the community.
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175793#comment-16175793 ]

Qian Zhang commented on MESOS-7975:
---

[~jpe...@apache.org] When the scheduler sends a kill, does your executor send a SIGTERM or a SIGKILL to the task? If it is SIGTERM, and the task handles it gracefully and exits with 0, do you think it is reasonable for the executor to send a TASK_FINISHED in this case?
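To make the scenario in the question above concrete, the following POSIX sketch forks a child that handles SIGTERM by exiting with code 0, sends it SIGTERM, and returns the raw wait status. It is only an illustration, not executor code: `exitCleanly()`, `statusAfterGracefulTerm()` and `checkGracefulExit()` are hypothetical names. The point is that the resulting status is indistinguishable from a task that finished on its own, so the exit status alone cannot tell the two cases apart.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>

// SIGTERM handler for the child: exit cleanly with code 0.
// _exit() is async-signal-safe, unlike exit().
extern "C" void exitCleanly(int) { _exit(0); }

// Forks a child that handles SIGTERM gracefully, sends it SIGTERM and
// returns the raw wait status, or -1 if any step fails.
inline int statusAfterGracefulTerm()
{
  int ready[2];
  if (pipe(ready) != 0) {
    return -1;
  }

  pid_t pid = fork();
  if (pid < 0) {
    close(ready[0]);
    close(ready[1]);
    return -1;
  }

  if (pid == 0) {
    // Child: install the handler, make sure SIGTERM is not blocked,
    // then tell the parent we are ready and wait for the signal.
    close(ready[0]);

    struct sigaction action = {};
    action.sa_handler = exitCleanly;
    sigemptyset(&action.sa_mask);
    sigaction(SIGTERM, &action, nullptr);

    sigset_t term;
    sigemptyset(&term);
    sigaddset(&term, SIGTERM);
    sigprocmask(SIG_UNBLOCK, &term, nullptr);

    char byte = 'r';
    if (write(ready[1], &byte, 1) != 1) {
      _exit(1);
    }
    for (;;) {
      pause();
    }
  }

  // Parent: wait until the handler is installed, then send SIGTERM.
  close(ready[1]);
  char byte = 0;
  ssize_t n = read(ready[0], &byte, 1);
  close(ready[0]);

  if (n != 1) {
    kill(pid, SIGKILL);
    waitpid(pid, nullptr, 0);
    return -1;
  }

  kill(pid, SIGTERM);

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return status;
}

// Usage: the child was signalled, yet the status reports a normal,
// successful exit.
inline void checkGracefulExit()
{
  int status = statusAfterGracefulTerm();
  assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}
```

A pipe is used for synchronization so that the parent cannot send SIGTERM before the child has installed its handler.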
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175135#comment-16175135 ]

James Peach commented on MESOS-7975:
---

FWIW the rule we have in our executor is that if we terminated a task because the scheduler sent a kill, we always send a {{TASK_KILLED}} status. That is the only reason we send this status.
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174611#comment-16174611 ]

Alexander Rukletsov commented on MESOS-7975:
---

[~qianzhang] I think we should send an email to the lists. I understand that this might seem like a lot of work for "an easy fix", but it is an important change even though it requires only a small code change.
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169475#comment-16169475 ]

Qian Zhang commented on MESOS-7975:
---

[~alexr] Yeah, in mesos.proto the only thing I see for {{TASK_FINISHED}} is the comment {{// TERMINAL: The task finished successfully}}, which is not very clear; different people may understand it differently. We may need to send a mail to the dev and user lists to let everyone know our proposal, collect feedback, and eventually reach a consensus.
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167567#comment-16167567 ]

Alexander Rukletsov commented on MESOS-7975:
---

[~qianzhang]: exactly. I can't find any written reference stating that {{TASK_FINISHED}} means "finished on its own", i.e., not in response to a signal. Can we clarify the meaning of {{TASK_FINISHED}} first, and maybe even document the contract?
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167234#comment-16167234 ]

Qian Zhang commented on MESOS-7975:
---

I think it depends on how we define the meaning of {{TASK_FINISHED}}. If it means the task terminated successfully *on its own without external interference* (as Anand said), then it does not make sense for the scheduler to receive a {{TASK_KILLING}} followed by a {{TASK_FINISHED}}, since there was indeed external interference (the kill was initiated by the scheduler). However, if {{TASK_FINISHED}} means the task terminated successfully for whatever reason, then I think it is OK to receive a {{TASK_KILLING}} followed by a {{TASK_FINISHED}}.
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166656#comment-16166656 ]

Anand Mazumdar commented on MESOS-7975:
---

Hmm, let's start with why the task was being killed in the first place: the scheduler initiated the kill. As part of the kill policy associated with the task, the executor signals its intent to kill the task via the {{TASK_KILLING}} update. Once the task terminates gracefully, the terminal status update should not depend on the exit code the task exited with ({{TASK_FINISHED}} vs {{TASK_KILLED}}). It should always be {{TASK_KILLED}}.

The task could have exited with a non-zero status code when handling the {{SIGTERM}} itself. So I don't see how a scheduler could use this extra information to be sure that the exit was due to the {{SIGKILL}} signal, as you alluded to. The message we want to convey to the scheduler is that their task died because they initiated the kill operation, and the terminal status update should reflect that.

What is specifically weird currently:
- A task exits with a zero status code after the scheduler initiated the kill. The scheduler receives a {{TASK_KILLING}} update followed by a {{TASK_FINISHED}} update. A {{TASK_FINISHED}} means that the task terminated successfully on its own without external interference. However, here the executor executed the {{KillPolicy}} associated with the task and explicitly killed it.
- A task exits with a non-zero status code. The scheduler receives a {{TASK_KILLING}} followed by a {{TASK_KILLED}} status update. This is inconsistent with the above.

The proposed fix is to correctly send {{TASK_KILLING}} followed by {{TASK_KILLED}}, i.e., the intent to kill the task is followed by the explicit terminal status update that the task has been killed.
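The two cases described in the comment above, and the effect of the proposed fix, can be sketched as a small function that returns the status updates a scheduler would observe after it initiates a kill. This only models the behaviour as described in this thread and is not executor code. The `Rule` enum, `updatesAfterKill()` and `checkSequences()` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Which rule the executor applies; both labels are hypothetical.
enum class Rule { Current, Proposed };

// Status updates a scheduler would see after it initiates a kill and
// the task then exits with `exitCode`, as described in the comment.
inline std::vector<std::string> updatesAfterKill(Rule rule, int exitCode)
{
  std::vector<std::string> updates = {"TASK_KILLING"};

  if (rule == Rule::Current && exitCode == 0) {
    // Current behaviour: a zero exit code wins over the kill.
    updates.push_back("TASK_FINISHED");
  } else {
    // Non-zero exit today, and every exit under the proposed fix.
    updates.push_back("TASK_KILLED");
  }

  return updates;
}

// Usage: today the terminal update depends on the exit code, whereas
// under the proposal it is the same for both exit codes.
inline void checkSequences()
{
  assert(updatesAfterKill(Rule::Current, 0) !=
         updatesAfterKill(Rule::Current, 1));
  assert(updatesAfterKill(Rule::Proposed, 0) ==
         updatesAfterKill(Rule::Proposed, 1));
}
```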
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166547#comment-16166547 ]

Alexander Rukletsov commented on MESOS-7975:
---

This is not 100% a bug. It is a philosophical question whether a task that terminates cleanly with a zero exit code should be considered killed: we asked the task to terminate, but we did not SIGKILL it. Moreover, we actually changed the behaviour of the docker executor to match the behaviour of the command executor. See MESOS-4279 and [this review|https://reviews.apache.org/r/48428/], especially [this|https://issues.apache.org/jira/browse/MESOS-4279?focusedCommentId=15249489=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15249489], [this|https://issues.apache.org/jira/browse/MESOS-4279?focusedCommentId=15096389=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15096389], and [this|https://issues.apache.org/jira/browse/MESOS-4279?focusedCommentId=15243232=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15243232] comments.
[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
[ https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166060#comment-16166060 ]

Qian Zhang commented on MESOS-7975:
---

RR:
https://reviews.apache.org/r/62326/
https://reviews.apache.org/r/62327/

> The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed
> -
>
>          Key: MESOS-7975
>          URL: https://issues.apache.org/jira/browse/MESOS-7975
>      Project: Mesos
>   Issue Type: Bug
>     Reporter: Anand Mazumdar
>     Assignee: Anand Mazumdar
>     Priority: Critical
>       Labels: mesosphere