[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.

2018-02-01 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8411:
--
Sprint: Mesosphere Sprint 72, Mesosphere Sprint 73, Mesosphere Sprint 74  
(was: Mesosphere Sprint 72, Mesosphere Sprint 73)

> Killing a queued task can lead to the command executor never terminating.
> -
>
> Key: MESOS-8411
> URL: https://issues.apache.org/jira/browse/MESOS-8411
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it 
> from queued tasks and shut down the executor if none of its initial tasks 
> could be delivered. However, there is a case (within {{Slave::___run}}) where 
> we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends registration message. Agent tells containerizer to 
> update the resources before it sends the tasks to the executor.
> # Kill arrives, and we synchronously remove the task from queued tasks.
> # Containerizer finishes updating the resources, and in {{Slave::___run}} the 
> killed task is ignored.
> # Command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that 
> all executors will implement this correctly. It would be better to have a 
> defensive policy that will shut down an executor if all of its initial batch 
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to 
> look at the running + terminated but unacked + completed tasks, and if empty, 
> shut the executor down in the {{Slave::___run}} path. This will require us to 
> check that the completed task cache size is set to at least 1, and this also 
> assumes that the completed tasks are not cleared based on time or during 
> agent recovery.
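
To make the proposed policy concrete, here is a minimal, self-contained sketch of the check. It is not the actual agent code: {{ExecutorState}}, {{shouldShutDownExecutor}}, and the task name {{task-1}} are hypothetical stand-ins for the agent's per-executor bookkeeping and for a check that would run in the {{Slave::___run}} path.

{code}
// Minimal, self-contained sketch (not Mesos source) of the defensive policy
// proposed above. The struct and function are hypothetical stand-ins for the
// agent's per-executor bookkeeping and for a check in the Slave::___run path.
#include <iostream>
#include <set>
#include <string>

// Stand-in for the task sets the agent tracks per executor.
struct ExecutorState {
  std::set<std::string> queuedTasks;      // Not yet delivered to the executor.
  std::set<std::string> launchedTasks;    // Delivered and running.
  std::set<std::string> terminatedTasks;  // Terminal but not yet acknowledged.
  std::set<std::string> completedTasks;   // Acknowledged terminal tasks.
};

// True if the executor has no work and never will: nothing running, nothing
// awaiting acknowledgement, and nothing completed, i.e. its entire initial
// batch of tasks was killed before delivery. The queued set is included for
// safety in this standalone model; in the ___run path anything still queued
// has just been delivered. As noted above, this assumes the completed-task
// cache keeps at least one entry and is not pruned by time or agent recovery.
bool shouldShutDownExecutor(const ExecutorState& executor) {
  return executor.queuedTasks.empty() &&
         executor.launchedTasks.empty() &&
         executor.terminatedTasks.empty() &&
         executor.completedTasks.empty();
}

int main() {
  // Model the race from the description: the command executor's only task is
  // killed while the containerizer update is still in flight, so it is
  // removed from the queue before ___run can deliver it.
  ExecutorState executor;
  executor.queuedTasks.insert("task-1");  // Initial (and only) task.
  executor.queuedTasks.erase("task-1");   // Kill arrives before delivery.

  // When ___run later runs, the killed task is skipped. With the check above,
  // the executor is shut down instead of being left running forever.
  std::cout << (shouldShutDownExecutor(executor)
                    ? "shut down executor"
                    : "leave executor running")
            << std::endl;
  return 0;
}
{code}

The check deliberately includes completed tasks, so an executor whose initial tasks already finished is not mistaken for one that never received any work; that is also why the completed-task cache must hold at least one entry and must not be pruned by time or across agent recovery.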



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.

2018-01-29 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu updated MESOS-8411:

Story Points: 5  (was: 3)

> Killing a queued task can lead to the command executor never terminating.
> -
>
> Key: MESOS-8411
> URL: https://issues.apache.org/jira/browse/MESOS-8411
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it 
> from queued tasks and shut down the executor if none of its initial tasks 
> could be delivered. However, there is a case (within {{Slave::___run}}) where 
> we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends registration message. Agent tells containerizer to 
> update the resources before it sends the tasks to the executor.
> # Kill arrives, and we synchronously remove the task from queued tasks.
> # Containerizer finishes updating the resources, and in {{Slave::___run}} the 
> killed task is ignored.
> # Command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that 
> all executors will implement this correctly. It would be better to have a 
> defensive policy that will shut down an executor if all of its initial batch 
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to 
> look at the running + terminated but unacked + completed tasks, and if empty, 
> shut the executor down in the {{Slave::___run}} path. This will require us to 
> check that the completed task cache size is set to at least 1, and this also 
> assumes that the completed tasks are not cleared based on time or during 
> agent recovery.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.

2018-01-22 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-8411:
---
Target Version/s: 1.4.2, 1.6.0, 1.5.1, 1.3.3  (was: 1.6.0)

> Killing a queued task can lead to the command executor never terminating.
> -
>
> Key: MESOS-8411
> URL: https://issues.apache.org/jira/browse/MESOS-8411
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it 
> from queued tasks and shut down the executor if none of its initial tasks 
> could be delivered. However, there is a case (within {{Slave::___run}}) where 
> we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends registration message. Agent tells containerizer to 
> update the resources before it sends the tasks to the executor.
> # Kill arrives, and we synchronously remove the task from queued tasks.
> # Containerizer finishes updating the resources, and in {{Slave::___run}} the 
> killed task is ignored.
> # Command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that 
> all executors will implement this correctly. It would be better to have a 
> defensive policy that will shut down an executor if all of its initial batch 
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to 
> look at the running + terminated but unacked + completed tasks, and if empty, 
> shut the executor down in the {{Slave::___run}} path. This will require us to 
> check that the completed task cache size is set to at least 1, and this also 
> assumes that the completed tasks are not cleared based on time or during 
> agent recovery.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.

2018-01-22 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-8411:
---
Affects Version/s: (was: 1.3.0)
   1.3.1
   1.4.1

> Killing a queued task can lead to the command executor never terminating.
> -
>
> Key: MESOS-8411
> URL: https://issues.apache.org/jira/browse/MESOS-8411
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.3.1, 1.4.1
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it 
> from queued tasks and shut down the executor if none of its initial tasks 
> could be delivered. However, there is a case (within {{Slave::___run}}) where 
> we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends registration message. Agent tells containerizer to 
> update the resources before it sends the tasks to the executor.
> # Kill arrives, and we synchronously remove the task from queued tasks.
> # Containerizer finishes updating the resources, and in {{Slave::___run}} the 
> killed task is ignored.
> # Command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that 
> all executors will implement this correctly. It would be better to have a 
> defensive policy that will shut down an executor if all of its initial batch 
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to 
> look at the running + terminated but unacked + completed tasks, and if empty, 
> shut the executor down in the {{Slave::___run}} path. This will require us to 
> check that the completed task cache size is set to at least 1, and this also 
> assumes that the completed tasks are not cleared based on time or during 
> agent recovery.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.

2018-01-20 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8411:
--
Sprint: Mesosphere Sprint 72, Mesosphere Sprint 73  (was: Mesosphere Sprint 
72)

> Killing a queued task can lead to the command executor never terminating.
> -
>
> Key: MESOS-8411
> URL: https://issues.apache.org/jira/browse/MESOS-8411
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.3.0
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it 
> from queued tasks and shut down the executor if none of its initial tasks 
> could be delivered. However, there is a case (within {{Slave::___run}}) where 
> we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends registration message. Agent tells containerizer to 
> update the resources before it sends the tasks to the executor.
> # Kill arrives, and we synchronously remove the task from queued tasks.
> # Containerizer finishes updating the resources, and in {{Slave::___run}} the 
> killed task is ignored.
> # Command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that 
> all executors will implement this correctly. It would be better to have a 
> defensive policy that will shut down an executor if all of its initial batch 
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to 
> look at the running + terminated but unacked + completed tasks, and if empty, 
> shut the executor down in the {{Slave::___run}} path. This will require us to 
> check that the completed task cache size is set to at least 1, and this also 
> assumes that the completed tasks are not cleared based on time or during 
> agent recovery.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.

2018-01-18 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu updated MESOS-8411:

Affects Version/s: 1.3.0

> Killing a queued task can lead to the command executor never terminating.
> -
>
> Key: MESOS-8411
> URL: https://issues.apache.org/jira/browse/MESOS-8411
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.3.0
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it 
> from queued tasks and shut down the executor if none of its initial tasks 
> could be delivered. However, there is a case (within {{Slave::___run}}) where 
> we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends registration message. Agent tells containerizer to 
> update the resources before it sends the tasks to the executor.
> # Kill arrives, and we synchronously remove the task from queued tasks.
> # Containerizer finishes updating the resources, and in {{Slave::___run}} the 
> killed task is ignored.
> # Command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that 
> all executors will implement this correctly. It would be better to have a 
> defensive policy that will shut down an executor if all of its initial batch 
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to 
> look at the running + terminated but unacked + completed tasks, and if empty, 
> shut the executor down in the {{Slave::___run}} path. This will require us to 
> check that the completed task cache size is set to at least 1, and this also 
> assumes that the completed tasks are not cleared based on time or during 
> agent recovery.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.

2018-01-17 Thread Meng Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Meng Zhu updated MESOS-8411:

Target Version/s: 1.6.0

> Killing a queued task can lead to the command executor never terminating.
> -
>
> Key: MESOS-8411
> URL: https://issues.apache.org/jira/browse/MESOS-8411
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it 
> from queued tasks and shut down the executor if none of its initial tasks 
> could be delivered. However, there is a case (within {{Slave::___run}}) where 
> we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends registration message. Agent tells containerizer to 
> update the resources before it sends the tasks to the executor.
> # Kill arrives, and we synchronously remove the task from queued tasks.
> # Containerizer finishes updating the resources, and in {{Slave::___run}} the 
> killed task is ignored.
> # Command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that 
> all executors will implement this correctly. It would be better to have a 
> defensive policy that will shut down an executor if all of its initial batch 
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to 
> look at the running + terminated but unacked + completed tasks, and if empty, 
> shut the executor down in the {{Slave::___run}} path. This will require us to 
> check that the completed task cache size is set to at least 1, and this also 
> assumes that the completed tasks are not cleared based on time or during 
> agent recovery.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8411) Killing a queued task can lead to the command executor never terminating.

2018-01-08 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-8411:
--
Priority: Critical  (was: Major)

> Killing a queued task can lead to the command executor never terminating.
> -
>
> Key: MESOS-8411
> URL: https://issues.apache.org/jira/browse/MESOS-8411
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Critical
>
> If a task is killed while the executor is re-registering, we will remove it 
> from queued tasks and shut down the executor if none of its initial tasks 
> could be delivered. However, there is a case (within {{Slave::___run}}) where 
> we leave the executor running. The race is:
> # Command-executor task launched.
> # Command executor sends registration message. Agent tells containerizer to 
> update the resources before it sends the tasks to the executor.
> # Kill arrives, and we synchronously remove the task from queued tasks.
> # Containerizer finishes updating the resources, and in {{Slave::___run}} the 
> killed task is ignored.
> # Command executor stays running!
> Executors could have a timeout to handle this case, but it's not clear that 
> all executors will implement this correctly. It would be better to have a 
> defensive policy that will shut down an executor if all of its initial batch 
> of tasks were killed prior to delivery.
> In order to implement this, one approach discussed with [~vinodkone] is to 
> look at the running + terminated but unacked + completed tasks, and if empty, 
> shut the executor down in the {{Slave::___run}} path. This will require us to 
> check that the completed task cache size is set to at least 1, and this also 
> assumes that the completed tasks are not cleared based on time or during 
> agent recovery.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)