[jira] [Commented] (MESOS-8337) Invalid state transition attempted when agent is lost.
[ https://issues.apache.org/jira/browse/MESOS-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324742#comment-16324742 ]

Yan Xu commented on MESOS-8337:
-------------------------------

{noformat:title=}
commit 35ac2f047abf2c0ea452b98a249c3dbb90d64282 (HEAD -> 1.5.x, apache/1.5.x)
Author: Jiang Yan Xu
Date:   Fri Jan 12 15:30:15 2018 -0800

    Updated CHANGELOG with MESOS-6406, MESOS-7215 and MESOS-8337.

    These are all changes we made around partition-awareness in 1.5.0 so
    they are grouped together.

commit d59109808443ab2987fd0204d94f9a4e3e84dd9b
Author: James Peach
Date:   Fri Jan 12 13:46:27 2018 -0800

    Prevented a crash when an agent with terminal tasks is partitioned.

    If an agent is lost, we try to remove all the tasks that might have
    been lost. If a task is already terminal but has unacknowledged status
    updates, it is expected that we track it in the unreachable tasks list,
    so we should remove the CHECK that prevents this.

    This patch also changes how unreachable tasks are presented in the
    HTTP endpoints, so that terminal but unacknowledged tasks are shown in
    the list of unreachable tasks and not completed tasks. This differs
    from 1.4.x, where they are shown as completed.

    Review: https://reviews.apache.org/r/64940/
{noformat}

> Invalid state transition attempted when agent is lost.
> ------------------------------------------------------
>
>                 Key: MESOS-8337
>                 URL: https://issues.apache.org/jira/browse/MESOS-8337
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> The change in MESOS-7215 can attempt to transition a task from {{FAILED}} to
> {{LOST}} when removing a lost agent. This ends up triggering a {{CHECK}} that
> was added in the same patch.
> {noformat}
> I1214 23:42:16.507931 22396 master.cpp:10155] Removing task
> mobius-mloop-1512774555_3661616380-xxx with resources disk(allocated: *):200;
> cpus(allocated: *):0.01; mem(allocated: *):200; ports(allocated:
> *):[31068-31068, 31069-31069, 31072-31072] of framework
> afcbfa05-7973-4ad3-8399-4153556a8fa9-3607 on agent
> daceae53-448b-4349-8503-9dd8132a6828-S4 at slave(1)@17.147.52.220:5
> (magent0006.xxx.com)
> F1214 23:42:16.507961 22396 master.hpp:2342] Check failed: task->state() ==
> TASK_UNREACHABLE || task->state() == TASK_LOST TASK_FAILED
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
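
For readers following the thread, the failure mode described in the report can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the Mesos master code: the enum is a simplified subset of the task states, and `Task`, `isTerminal`, `passesOldCheck` and `markAgentLost` are hypothetical names invented for the sketch. It shows that a task which is already terminal (here `FAILED`) keeps its state when the agent is lost, so a condition equivalent to the one printed in the fatal log line cannot hold for it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified subset of task states for this sketch (assumption: the real
// enum has more values).
enum class TaskState { RUNNING, FINISHED, FAILED, KILLED, LOST, UNREACHABLE };

// Hypothetical minimal task record.
struct Task {
  std::string id;
  TaskState state;
};

// States treated as terminal in this sketch.
bool isTerminal(TaskState s) {
  return s == TaskState::FINISHED || s == TaskState::FAILED ||
         s == TaskState::KILLED || s == TaskState::LOST;
}

// Mirrors the condition shown in the "Check failed" log line:
// state == TASK_UNREACHABLE || state == TASK_LOST.
bool passesOldCheck(TaskState s) {
  return s == TaskState::UNREACHABLE || s == TaskState::LOST;
}

// Sketch of agent-loss handling: non-terminal tasks become UNREACHABLE,
// terminal tasks (for example FAILED with an unacknowledged update) keep
// their state, and every task is returned for tracking as unreachable.
std::vector<Task> markAgentLost(std::vector<Task> tasks) {
  for (Task& task : tasks) {
    if (!isTerminal(task.state)) {
      task.state = TaskState::UNREACHABLE;
    }
    // Invariant of this sketch: each task is now terminal or UNREACHABLE.
    assert(isTerminal(task.state) || task.state == TaskState::UNREACHABLE);
  }
  return tasks;
}
```

Passing a `RUNNING` task and a `FAILED` task through `markAgentLost` leaves the second one in `FAILED`, for which `passesOldCheck` returns false. That is the situation the commit message addresses by removing the CHECK rather than forcing a state transition.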
[jira] [Commented] (MESOS-8337) Invalid state transition attempted when agent is lost.
[ https://issues.apache.org/jira/browse/MESOS-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302898#comment-16302898 ]

James Peach commented on MESOS-8337:
------------------------------------

[~jieyu] This is a blocker for 1.5. I have a wacky patch that needs some cleanup and analysis before I can post it.
[jira] [Commented] (MESOS-8337) Invalid state transition attempted when agent is lost.
[ https://issues.apache.org/jira/browse/MESOS-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302039#comment-16302039 ]

Jie Yu commented on MESOS-8337:
-------------------------------

[~jpe...@apache.org] Who is working on this issue? Is that a blocker for 1.5.0?