[jira] [Assigned] (MESOS-8391) Mesos agent doesn't notice that a pod task exits or crashes after the agent restart

2018-01-10 Thread Andrei Budnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrei Budnik reassigned MESOS-8391:


Assignee: Andrei Budnik  (was: Gilbert Song)

> Mesos agent doesn't notice that a pod task exits or crashes after the agent 
> restart
> ---
>
> Key: MESOS-8391
> URL: https://issues.apache.org/jira/browse/MESOS-8391
> Project: Mesos
> Issue Type: Bug
> Components: agent, containerization, executor
> Affects Versions: 1.5.0
> Reporter: Ivan Chernetsky
> Assignee: Andrei Budnik
> Priority: Blocker
> Attachments: testing-log-2.tar.gz
>
>
> h4. (1) Agent doesn't detect that a pod task exits/crashes
> # Create a Marathon pod with two containers that each just run {{sleep 1}}.
> # Restart the Mesos agent on the node where the pod was launched.
> # Kill one of the pod tasks.
> *Expected result*: The Mesos agent detects that one of the tasks was killed 
> and forwards a {{TASK_FAILED}} status update to Marathon.
> *Actual result*: The Mesos agent does nothing, and the Mesos master still 
> considers both tasks to be running. Marathon takes no action because it 
> receives no status update from Mesos.
> h4. (2) After the agent restart, it detects that the task crashed, forwards 
> the correct status update, but the other task stays in {{TASK_KILLING}} state 
> forever
> # Perform steps in (1).
> # Restart the Mesos agent.
> *Expected result*: The Mesos agent detects that one of the tasks crashed, 
> forwards the corresponding status update, and kills the other task too.
> *Actual result*: The Mesos agent detects that one of the tasks crashed and 
> forwards the corresponding status update, but the other task stays in 
> {{TASK_KILLING}} state forever.
> Please note that after another agent restart, the other task finally gets 
> killed and the correct status updates are propagated all the way to Marathon.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (MESOS-8391) Mesos agent doesn't notice that a pod task exits or crashes after the agent restart

2018-01-04 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-8391:
---

Assignee: Gilbert Song

> Mesos agent doesn't notice that a pod task exits or crashes after the agent 
> restart
> ---
>
> Key: MESOS-8391
> URL: https://issues.apache.org/jira/browse/MESOS-8391
> Project: Mesos
> Issue Type: Bug
> Components: agent, containerization, executor
> Affects Versions: 1.5.0
> Reporter: Ivan Chernetsky
> Assignee: Gilbert Song
> Priority: Blocker
> Attachments: agent.log.gz


