[jira] [Commented] (MESOS-8337) Invalid state transition attempted when agent is lost.

2018-01-12 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324742#comment-16324742
 ] 

Yan Xu commented on MESOS-8337:
---

{noformat:title=}
commit 35ac2f047abf2c0ea452b98a249c3dbb90d64282 (HEAD -> 1.5.x, apache/1.5.x)
Author: Jiang Yan Xu 
Date:   Fri Jan 12 15:30:15 2018 -0800

Updated CHANGELOG with MESOS-6406, MESOS-7215 and MESOS-8337.

These are all changes we made around partition-awareness in 1.5.0 so
they are grouped together.

commit d59109808443ab2987fd0204d94f9a4e3e84dd9b
Author: James Peach 
Date:   Fri Jan 12 13:46:27 2018 -0800

Prevented a crash when an agent with terminal tasks is partitioned.

If an agent is lost, we try to remove all the tasks that might have
been lost. If a task is already terminal but has unacknowledged status
updates, it is expected that we track it in the unreachable tasks list,
so we should remove the CHECK that prevents this. This patch also
changes how unreachable tasks are presented in the HTTP endpoints
so that terminal but unacknowledged tasks are shown in the list of
unreachable tasks and not completed tasks, which is different from
1.4.x where they are shown as completed.

Review: https://reviews.apache.org/r/64940/
{noformat}
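
For readers who have not looked at the patch, here is a rough sketch of the situation the second commit describes. It is not the Mesos code: {{TaskState}}, {{Task}}, {{passesOldCheck}} and {{trackUnreachable}} are hypothetical, simplified stand-ins. It assumes that a running task becomes {{TASK_UNREACHABLE}} and that an already-terminal task keeps its state when it is tracked as unreachable.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not the Mesos types.
enum class TaskState {
  TASK_RUNNING,
  TASK_FAILED,
  TASK_LOST,
  TASK_UNREACHABLE
};

struct Task {
  std::string id;
  TaskState state;
};

// The condition from the failed check in the log: only tasks already
// marked unreachable or lost were accepted.
bool passesOldCheck(const Task& task) {
  return task.state == TaskState::TASK_UNREACHABLE ||
         task.state == TaskState::TASK_LOST;
}

// Assumed model of the behavior after the fix: every task still tracked on
// the lost agent is recorded as unreachable. A running task is marked
// TASK_UNREACHABLE; a task in any other state (for example TASK_FAILED
// with an unacknowledged update) is recorded with its state unchanged.
std::vector<Task> trackUnreachable(const std::vector<Task>& tasksOnLostAgent) {
  std::vector<Task> unreachable;
  for (Task task : tasksOnLostAgent) {
    if (task.state == TaskState::TASK_RUNNING) {
      task.state = TaskState::TASK_UNREACHABLE;
    }
    unreachable.push_back(task);
  }
  return unreachable;
}
```

In this model a {{TASK_FAILED}} task on the lost agent ends up in the unreachable list while still failing {{passesOldCheck}}, which is the case the removed CHECK used to reject.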

> Invalid state transition attempted when agent is lost.
> --
>
> Key: MESOS-8337
> URL: https://issues.apache.org/jira/browse/MESOS-8337
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: James Peach
>Assignee: James Peach
>Priority: Blocker
> Fix For: 1.5.0
>
>
> The change in MESOS-7215 can attempt to transition a task from {{FAILED}} to 
> {{LOST}} when removing a lost agent. This ends up triggering a {{CHECK}} that 
> was added in the same patch.
> {noformat}
> I1214 23:42:16.507931 22396 master.cpp:10155] Removing task 
> mobius-mloop-1512774555_3661616380-xxx with resources disk(allocated: *):200; 
> cpus(allocated: *):0.01; mem(allocated: *):200; ports(allocated: 
> *):[31068-31068, 31069-31069, 31072-31072] of framework 
> afcbfa05-7973-4ad3-8399-4153556a8fa9-3607 on agent 
> daceae53-448b-4349-8503-9dd8132a6828-S4 at slave(1)@17.147.52.220:5 
> (magent0006.xxx.com)
> F1214 23:42:16.507961 22396 master.hpp:2342] Check failed: task->state() == 
> TASK_UNREACHABLE || task->state() == TASK_LOST TASK_FAILED
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-8337) Invalid state transition attempted when agent is lost.

2017-12-24 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302898#comment-16302898
 ] 

James Peach commented on MESOS-8337:


[~jieyu] This is a blocker for 1.5. I have a wacky patch that needs some 
cleanup and analysis before I can post it.






[jira] [Commented] (MESOS-8337) Invalid state transition attempted when agent is lost.

2017-12-22 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302039#comment-16302039
 ] 

Jie Yu commented on MESOS-8337:
---

[~jpe...@apache.org] Who is working on this issue? Is it a blocker for 1.5.0?



