[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.
[ https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-8411:
------------------------------
    Sprint: Mesosphere Sprint 72, Mesosphere Sprint 73, Mesosphere Sprint 74  (was: Mesosphere Sprint 72, Mesosphere Sprint 73)

> Killing a queued task can lead to the command executor never terminating.
> -------------------------------------------------------------------------
>
>                  Key: MESOS-8411
>                  URL: https://issues.apache.org/jira/browse/MESOS-8411
>              Project: Mesos
>           Issue Type: Bug
>           Components: agent
>     Affects Versions: 1.3.1, 1.4.1
>             Reporter: Benjamin Mahler
>             Assignee: Meng Zhu
>             Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it
> from queued tasks and shut down the executor if all of its initial tasks
> could not be delivered. However, there is a case (within {{Slave::___run}})
> where we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends its registration message. The agent tells the
> containerizer to update the resources before it sends the tasks to the
> executor.
> # A kill arrives, and we synchronously remove the task from queued tasks.
> # The containerizer finishes updating the resources, and in
> {{Slave::___run}} the killed task is ignored.
> # The command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that
> all executors will implement this correctly. It would be better to have a
> defensive policy that shuts down an executor if all of its initial batch
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to
> look at the running + terminated-but-unacked + completed tasks and, if all
> are empty, shut the executor down in the {{Slave::___run}} path. This
> requires us to check that the completed task cache size is set to at least
> 1, and it also assumes that completed tasks are not cleared based on time
> or during agent recovery.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
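The defensive check proposed above can be sketched as follows. This is a simplified stand-in, not the real Mesos code: the {{Executor}} struct and field names here only model the categories of per-executor task bookkeeping the ticket mentions (running, terminated-but-unacknowledged, and the bounded completed-task cache), which the actual agent tracks in its own {{Slave}}/{{Executor}} structures.

```cpp
#include <map>
#include <string>

// Simplified stand-in for the agent's per-executor task bookkeeping.
// The real agent tracks the same categories: launched (running) tasks,
// terminated-but-unacknowledged tasks, and a bounded cache of completed
// tasks. Values are placeholders for the real task structures.
struct Executor {
  std::map<std::string, int> launchedTasks;
  std::map<std::string, int> terminatedTasks;  // terminated but unacknowledged
  std::map<std::string, int> completedTasks;   // bounded cache; size >= 1 assumed
};

// The check proposed for the Slave::___run path: if every task from the
// executor's initial batch was killed before delivery -- i.e. nothing is
// running, terminated-but-unacked, or completed -- the executor has
// nothing left to do and can be shut down.
bool shouldShutdownExecutor(const Executor& executor) {
  return executor.launchedTasks.empty() &&
         executor.terminatedTasks.empty() &&
         executor.completedTasks.empty();
}
```

As the ticket notes, this only works if the completed-task cache holds at least one entry and is not cleared by time or by agent recovery; otherwise an executor that already ran a task to completion could look empty and be shut down spuriously.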
Meng Zhu updated MESOS-8411:
    Story Points: 5  (was: 3)
Benjamin Mahler updated MESOS-8411:
    Target Version/s: 1.4.2, 1.6.0, 1.5.1, 1.3.3  (was: 1.6.0)
Benjamin Mahler updated MESOS-8411:
    Affects Version/s: 1.3.1, 1.4.1  (was: 1.3.0)
Vinod Kone updated MESOS-8411:
    Sprint: Mesosphere Sprint 72, Mesosphere Sprint 73  (was: Mesosphere Sprint 72)
Meng Zhu updated MESOS-8411:
    Affects Version/s: 1.3.0
Meng Zhu updated MESOS-8411:
    Target Version/s: 1.6.0
Vinod Kone updated MESOS-8411:
    Priority: Critical  (was: Major)