[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-28 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7744:
--
Fix Version/s: 1.4.0  (was: 1.4.1)

> Mesos Agent Sends TASK_KILL status update to Master, and still launches task
> 
>
> Key: MESOS-7744
> URL: https://issues.apache.org/jira/browse/MESOS-7744
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Sargun Dhillon
>Assignee: Benjamin Mahler
>Priority: Critical
>  Labels: reliability
> Fix For: 1.1.3, 1.2.3, 1.3.2, 1.4.0
>
>
> We sometimes launch jobs, and cancel them in ~7 seconds, if we don't get a 
> TASK_STARTING back from the agent. Under certain conditions it can result in 
> Mesos losing track of the task. The chunk of the logs which is interesting is 
> here:
> {code}
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned 
> task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task 
> Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task 
> ‘Titus-7590548-worker-0-4476’ for executor ‘docker-executor’ of framework 
> TitusFramework at executor(1)@100.66.11.10:17707
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill 
> task Titus-7590548-worker-0-4476 of framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling 
> status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for 
> task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued 
> task ‘Titus-7590548-worker-0-4476’ to executor ‘docker-executor’ of framework 
> TitusFramework at executor(1)@100.66.11.10:17707
> {code}
> In our executor, we see that the launch message arrives after the master has 
> already gotten the kill update. We then send non-terminal state updates to 
> the agent, and yet it doesn't forward these to our framework. We're using a 
> custom executor which is based on the older mesos-go bindings. 
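The race described in the report can be modeled in a few lines: the agent queues the launch until the executor registers, a kill arriving in that window immediately produces TASK_KILLED, but the queued launch is not purged, so registration still delivers the task, and later non-terminal updates are dropped because the task is already recorded as terminal. The sketch below is an illustrative toy model of that sequence, not Mesos code; all class and method names are hypothetical.

```python
# Toy model of the MESOS-7744 race (illustrative only, not Mesos code).
class Agent:
    def __init__(self):
        self.queued = {}          # task_id -> task, waiting for executor registration
        self.terminated = set()   # task_ids already reported terminal
        self.updates = []         # status updates forwarded to the master

    def launch(self, task_id):
        # Executor not registered yet: task is queued, as in slave.cpp:1853.
        self.queued[task_id] = {"id": task_id}

    def kill(self, task_id):
        if task_id in self.queued:
            # The buggy behavior: report TASK_KILLED immediately,
            # but leave the task sitting in the launch queue.
            self.terminated.add(task_id)
            self.updates.append((task_id, "TASK_KILLED"))

    def executor_registered(self):
        # Queued tasks are flushed to the executor even if already killed,
        # as in slave.cpp:2005 ("Sending queued task ...").
        delivered = list(self.queued)
        self.queued.clear()
        return delivered

    def status_update(self, task_id, state):
        # Updates for a task already recorded as terminal are dropped, so the
        # framework never sees the executor's non-terminal updates.
        if task_id in self.terminated:
            return False
        self.updates.append((task_id, state))
        return True

agent = Agent()
agent.launch("Titus-7590548-worker-0-4476")
agent.kill("Titus-7590548-worker-0-4476")
print(agent.executor_registered())   # the killed task is still delivered
print(agent.status_update("Titus-7590548-worker-0-4476", "TASK_RUNNING"))
```

Under this model the executor receives a task the master already believes is killed, and its subsequent TASK_RUNNING is silently discarded, matching the reporter's observation.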



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7744:
---
Priority: Critical  (was: Blocker)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7744:
---
Target Version/s: 1.2.3, 1.3.2, 1.4.0, 1.5.0  (was: 1.2.3, 1.3.2, 1.5.0, 
1.4.1)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7744:
---
Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.0  (was: 1.1.3, 1.2.3, 1.3.2, 
1.4.0, 1.5.0)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7744:
---
Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.0, 1.5.0  (was: 1.2.3, 1.3.2, 
1.4.0, 1.5.0)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-16 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7744:
---
Priority: Blocker  (was: Critical)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-16 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7744:
---
Target Version/s: 1.2.3, 1.3.2, 1.5.0, 1.4.1  (was: 1.1.3, 1.2.3, 1.3.2, 
1.5.0)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-11 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya updated MESOS-7744:
--
Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.5.0  (was: 1.1.3, 1.2.3, 1.3.2, 
1.4.0)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-07 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7744:
---
Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.0  (was: 1.1.3, 1.2.3, 1.3.2)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-08-07 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7744:
---
Target Version/s: 1.1.3, 1.2.3, 1.3.2
Priority: Critical  (was: Minor)



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-07-20 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7744:
---
Labels: reliability  (was: )



[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task

2017-06-29 Thread Sargun Dhillon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sargun Dhillon updated MESOS-7744:
--
Description: 
We sometimes launch jobs, and cancel them in ~7 seconds, if we don't get a 
TASK_STARTING back from the agent. Under certain conditions it can result in 
Mesos losing track of the task. The chunk of the logs which is interesting is 
here:

{code}
Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned 
task Titus-7590548-worker-0-4476 for framework TitusFramework
Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task 
Titus-7590548-worker-0-4476 for framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task 
'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework 
TitusFramework at executor(1)@100.66.11.10:17707
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill 
task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling status 
update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task 
Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued 
task 'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework 
TitusFramework at executor(1)@100.66.11.10:17707
{code}

In our executor, we see that the launch message arrives after the master has 
already received the kill update. We then send non-terminal status updates to 
the agent, yet it does not forward them to our framework. We're using a custom 
executor based on the older mesos-go bindings.

  was:
We sometimes launch jobs, and cancel them in ~7 seconds, if we don't get a 
TASK_STARTING back from the agent. Under certain conditions it can result in 
Mesos losing track of the task. The chunk of the logs which is interesting is 
here:

{code}
Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned 
task Titus-7590548-worker-0-4476 for framework TitusFramework
Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:26.952251  5171 slave.cpp:1614] Launching task 
Titus-7590548-worker-0-4476 for framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:37.484611  5171 slave.cpp:1853] Queuing task 
'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework 
TitusFramework at executor(1)@100.66.11.10:17707
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:37.487876  5171 slave.cpp:2035] Asked to kill 
task Titus-7590548-worker-0-4476 of framework TitusFramework
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:37.488994  5171 slave.cpp:3211] Handling status 
update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task 
Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
mesos-slave[4290]: I0629 23:22:37.490603  5171 slave.cpp:2005] Sending queued 
task 'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework 
TitusFramework at executor(1)@100.66.11.10:17707
{code}

In our executor, we see that the launch message arrives after the master has 
already gotten the kill update. We then send non-terminal state updates to the 
agent, and yet it doesn't forward these to our framework. 
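The race described in the logs above can be modeled with a minimal sketch. This is hypothetical Python, not Mesos source: the `Agent` class, its method names, and the queue/flush behavior are all assumptions made to illustrate the reported bug, where a kill that arrives while a task is still queued for the executor emits TASK_KILLED but the task is nonetheless delivered once the executor registers.

```python
# Hypothetical model of the reported race (NOT Mesos source code).
class Agent:
    def __init__(self):
        self.queued = []    # tasks queued while the executor registers
        self.updates = []   # status updates forwarded to the master
        self.launched = []  # tasks actually delivered to the executor

    def queue_task(self, task):
        # slave.cpp "Queuing task ... for executor": the executor has not
        # registered yet, so the task waits in a queue.
        self.queued.append(task)

    def kill_task(self, task):
        # slave.cpp "Asked to kill task": the agent answers the kill with
        # TASK_KILLED immediately...
        self.updates.append((task, "TASK_KILLED"))
        # ...but (the bug) never removes the task from the launch queue.

    def executor_registered(self):
        # slave.cpp "Sending queued task ... to executor": every queued
        # task is delivered, including one already reported as killed.
        self.launched.extend(self.queued)
        self.queued.clear()

agent = Agent()
agent.queue_task("Titus-7590548-worker-0-4476")
agent.kill_task("Titus-7590548-worker-0-4476")
agent.executor_registered()

print(agent.updates)   # [('Titus-7590548-worker-0-4476', 'TASK_KILLED')]
print(agent.launched)  # ['Titus-7590548-worker-0-4476']
```

The final state matches the log sequence: the master has been told TASK_KILLED, yet the executor still receives (and runs) the task, after which the agent ignores its non-terminal updates.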


> Mesos Agent Sends TASK_KILL status update to Master, and still launches task
> 
>
> Key: MESOS-7744
> URL: https://issues.apache.org/jira/browse/MESOS-7744
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
>Reporter: Sargun Dhillon
>Priority: Minor
>
> We sometimes launch jobs, and cancel them in ~7 seconds, if we don't get a 
> TASK_STARTING back from the agent. Under certain conditions it can result in 
> Mesos losing track of the task. The chunk of the logs which is interesting is 
> here:
> {code}
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c 
> mesos-slave[4290]: I0629 23:22:26.951799  5171 slave.cpp:1495] Got assigned