[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
[ https://issues.apache.org/jira/browse/MESOS-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kapil Arya updated MESOS-7744:
------------------------------
    Fix Version/s:     (was: 1.4.1)
                       1.4.0

> Mesos Agent Sends TASK_KILL status update to Master, and still launches task
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-7744
>                 URL: https://issues.apache.org/jira/browse/MESOS-7744
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.0.1
>            Reporter: Sargun Dhillon
>            Assignee: Benjamin Mahler
>            Priority: Critical
>              Labels: reliability
>             Fix For: 1.1.3, 1.2.3, 1.3.2, 1.4.0
>
> We sometimes launch jobs and then cancel them after ~7 seconds if we don't get a TASK_STARTING back from the agent. Under certain conditions this can result in Mesos losing track of the task. The interesting chunk of the logs is here:
> {code}
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:26.951799 5171 slave.cpp:1495] Got assigned task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:26 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:26.952251 5171 slave.cpp:1614] Launching task Titus-7590548-worker-0-4476 for framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.484611 5171 slave.cpp:1853] Queuing task 'Titus-7590548-worker-0-4476' for executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.487876 5171 slave.cpp:2035] Asked to kill task Titus-7590548-worker-0-4476 of framework TitusFramework
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.488994 5171 slave.cpp:3211] Handling status update TASK_KILLED (UUID: 898215d6-a244-4dbe-bc9c-878a22d36ea4) for task Titus-7590548-worker-0-4476 of framework TitusFramework from @0.0.0.0:0
> Jun 29 23:22:37 titusagent-mainvpc-r3.8xlarge.2-i-04907efc9f1f8535c mesos-slave[4290]: I0629 23:22:37.490603 5171 slave.cpp:2005] Sending queued task 'Titus-7590548-worker-0-4476' to executor 'docker-executor' of framework TitusFramework at executor(1)@100.66.11.10:17707
> {code}
> In our executor, we see that the launch message arrives after the master has already received the kill update. We then send non-terminal status updates to the agent, yet it doesn't forward them to our framework. We're using a custom executor based on the older mesos-go bindings.
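For readers reproducing this, here is a minimal Go sketch of the framework-side behavior the report describes: kill the task if no TASK_STARTING arrives within ~7 seconds. The types, channel wiring, and names are illustrative assumptions only, not the Titus implementation or the mesos-go API.

{code}
package main

import (
	"fmt"
	"time"
)

// TaskStatus is a minimal stand-in for a Mesos task status update.
type TaskStatus struct {
	TaskID string
	State  string // e.g. "TASK_STARTING", "TASK_RUNNING", "TASK_KILLED"
}

// watchLaunch calls kill(taskID) if no TASK_STARTING for taskID arrives
// on updates within timeout. It returns once the task starts or is killed.
func watchLaunch(taskID string, updates <-chan TaskStatus,
	kill func(string), timeout time.Duration) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	for {
		select {
		case s := <-updates:
			if s.TaskID == taskID && s.State == "TASK_STARTING" {
				return // the task came up in time; nothing to do
			}
		case <-timer.C:
			// No TASK_STARTING within the window: cancel the launch.
			// If this kill races with the agent's queued launch (the bug
			// reported here), the agent can emit TASK_KILLED and still
			// hand the task to the executor afterwards.
			kill(taskID)
			return
		}
	}
}

func main() {
	updates := make(chan TaskStatus)
	kill := func(id string) { fmt.Println("killing", id) }
	// No update is ever delivered, so the watchdog fires after the timeout.
	watchLaunch("Titus-7590548-worker-0-4476", updates, kill, 7*time.Second)
}
{code}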
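The log ordering above (kill handled at slave.cpp:2035, TASK_KILLED emitted at slave.cpp:3211, then the queued launch at slave.cpp:2005) suggests one plausible reading: killing a still-queued task produces TASK_KILLED, but the task is not removed from the launch queue, so it is still flushed to the executor once the executor registers. A toy Go model of that hypothesis, not the actual slave.cpp logic:

{code}
package main

import "fmt"

// Agent models only the slice of agent state relevant to this race:
// tasks queued while their executor is still registering.
type Agent struct {
	queued map[string]bool
}

// KillTask handles a kill for a task that may still be queued. As observed
// in the logs, TASK_KILLED is reported, yet the launch still happens later,
// which is consistent with the task being left in the queue here.
func (a *Agent) KillTask(taskID string) {
	if a.queued[taskID] {
		fmt.Println("Handling status update TASK_KILLED for task", taskID)
		// Missing step (hypothesized): delete(a.queued, taskID)
	}
}

// ExecutorRegistered flushes queued tasks to the newly registered executor.
func (a *Agent) ExecutorRegistered() {
	for taskID := range a.queued {
		fmt.Println("Sending queued task", taskID, "to executor")
	}
}

func main() {
	a := &Agent{queued: map[string]bool{"Titus-7590548-worker-0-4476": true}}
	a.KillTask("Titus-7590548-worker-0-4476") // TASK_KILLED goes to the master...
	a.ExecutorRegistered()                    // ...but the task is launched anyway
}
{code}

Under this model the executor ends up running a task the master already considers killed, which would also explain why the executor's later non-terminal updates are not forwarded to the framework.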
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Benjamin Mahler updated MESOS-7744:
-----------------------------------
    Priority: Critical  (was: Blocker)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Benjamin Mahler updated MESOS-7744:
-----------------------------------
    Target Version/s: 1.2.3, 1.3.2, 1.4.0, 1.5.0  (was: 1.2.3, 1.3.2, 1.5.0, 1.4.1)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Benjamin Mahler updated MESOS-7744:
-----------------------------------
    Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.0  (was: 1.1.3, 1.2.3, 1.3.2, 1.4.0, 1.5.0)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Benjamin Mahler updated MESOS-7744:
-----------------------------------
    Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.0, 1.5.0  (was: 1.2.3, 1.3.2, 1.4.0, 1.5.0)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Benjamin Mahler updated MESOS-7744:
-----------------------------------
    Priority: Blocker  (was: Critical)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Alexander Rukletsov updated MESOS-7744:
---------------------------------------
    Target Version/s: 1.2.3, 1.3.2, 1.5.0, 1.4.1  (was: 1.1.3, 1.2.3, 1.3.2, 1.5.0)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Kapil Arya updated MESOS-7744:
------------------------------
    Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.5.0  (was: 1.1.3, 1.2.3, 1.3.2, 1.4.0)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Benjamin Mahler updated MESOS-7744:
-----------------------------------
    Target Version/s: 1.1.3, 1.2.3, 1.3.2, 1.4.0  (was: 1.1.3, 1.2.3, 1.3.2)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Benjamin Mahler updated MESOS-7744:
-----------------------------------
    Target Version/s: 1.1.3, 1.2.3, 1.3.2
            Priority: Critical  (was: Minor)
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Alexander Rukletsov updated MESOS-7744:
---------------------------------------
    Labels: reliability  (was: )
[jira] [Updated] (MESOS-7744) Mesos Agent Sends TASK_KILL status update to Master, and still launches task
Sargun Dhillon updated MESOS-7744:
----------------------------------
    Description: revised (the current text is quoted in full in the first message above)