[jira] [Commented] (MESOS-8750) Check failed: !slaves.registered.contains(task->slave_id)
[ https://issues.apache.org/jira/browse/MESOS-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769727#comment-16769727 ]

Vinod Kone commented on MESOS-8750:
-----------------------------------

[~megha.sharma] [~xujyan] Why was this not backported to older versions?

> Check failed: !slaves.registered.contains(task->slave_id)
> ---------------------------------------------------------
>
> Key: MESOS-8750
> URL: https://issues.apache.org/jira/browse/MESOS-8750
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 1.6.0
> Reporter: Megha Sharma
> Assignee: Megha Sharma
> Priority: Critical
> Fix For: 1.6.0
>
> It appears that in certain circumstances an unreachable task doesn't get
> cleaned up from {{framework.unreachableTasks}} when the respective agent
> re-registers, leading to this check failure later when the framework is
> being removed. When an agent goes unreachable, the master adds the tasks
> from this agent to {{framework.unreachableTasks}}, and when such an agent
> re-registers, the master removes the tasks that the agent specifies during
> re-registration from this data structure. However, there could be tasks
> that the agent doesn't know about, e.g. if the runTask message for them got
> dropped, and such tasks will not get removed from unreachableTasks.
> {noformat}
> F0310 13:30:58.856665 62740 master.cpp:9671] Check failed:
> !slaves.registered.contains(task->slave_id()) Unreachable task of
> framework 4f57975b-05dd-4118-8674-5b29a86c6a6c-0850 was found on registered
> agent 683c4a92-b5a0-490c-998a-6113fc86d37a-S1428
> {noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MESOS-8750) Check failed: !slaves.registered.contains(task->slave_id)
[ https://issues.apache.org/jira/browse/MESOS-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463379#comment-16463379 ]

Yan Xu commented on MESOS-8750:
-------------------------------

{code:title=}
commit 520b729857223aeade345cbdf61209ec4f395ad9
Author: Megha Sharma
Date:   Thu May 3 22:09:02 2018 -0700

    Remove unknown unreachable tasks when agent reregisters.

    A RunTaskMessage could get dropped for an agent while it's
    disconnected from the master, and when such an agent goes
    unreachable, the task from this dropped message gets added to the
    unreachable tasks. When the agent reregisters, the master sends
    status updates for the tasks that the agent reported when
    re-registering, and these tasks are also removed from the
    unreachableTasks on the framework. But since the agent doesn't know
    about the dropped task, it doesn't get removed from the
    unreachableTasks, leading to a check failure when this
    inconsistency is detected during framework removal.

    Review: https://reviews.apache.org/r/66644/
{code}

> Check failed: !slaves.registered.contains(task->slave_id)
> ---------------------------------------------------------
>
> Key: MESOS-8750
> URL: https://issues.apache.org/jira/browse/MESOS-8750
> Project: Mesos
> Issue Type: Task
> Components: master
> Affects Versions: 1.6.0
> Reporter: Megha Sharma
> Assignee: Megha Sharma
> Priority: Critical
>
> It appears that in certain circumstances an unreachable task doesn't get
> cleaned up from {{framework.unreachableTasks}} when the respective agent
> re-registers, leading to this check failure later when the framework is
> being removed. When an agent goes unreachable, the master adds the tasks
> from this agent to {{framework.unreachableTasks}}, and when such an agent
> re-registers, the master removes the tasks that the agent specifies during
> re-registration from this data structure. However, there could be tasks
> that the agent doesn't know about, e.g. if the runTask message for them got
> dropped, and such tasks will not get removed from unreachableTasks.
> {noformat}
> F0310 13:30:58.856665 62740 master.cpp:9671] Check failed:
> !slaves.registered.contains(task->slave_id()) Unreachable task of
> framework 4f57975b-05dd-4118-8674-5b29a86c6a6c-0850 was found on registered
> agent 683c4a92-b5a0-490c-998a-6113fc86d37a-S1428
> {noformat}
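
The bookkeeping problem described in the commit message above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions and is not the Mesos master code: the types `Task`, `Framework` and `Master`, and the functions `markUnreachable`, `reregisterReportedOnly`, `reregisterDropUnknown`, `invariantHolds` and `runScenario` are all hypothetical names invented for illustration. Only the idea (tasks the agent does not report on re-registration must also leave `unreachableTasks`) comes from the thread.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the master's bookkeeping (not the Mesos types).
struct Task {
  std::string id;
  std::string agentId;
};

struct Framework {
  // Tasks whose agent was marked unreachable, keyed by task ID.
  std::map<std::string, Task> unreachableTasks;
};

struct Master {
  std::set<std::string> registeredAgents;
  Framework framework;

  // Agent goes unreachable: every task the master knows about on that
  // agent is recorded as unreachable, including tasks whose launch
  // message never reached the agent.
  void markUnreachable(const std::string& agentId,
                       const std::vector<Task>& tasks) {
    registeredAgents.erase(agentId);
    for (const Task& task : tasks) {
      framework.unreachableTasks[task.id] = task;
    }
  }

  // Behaviour described in the report: only the tasks the agent reports
  // are removed from unreachableTasks.
  void reregisterReportedOnly(const std::string& agentId,
                              const std::vector<std::string>& reported) {
    registeredAgents.insert(agentId);
    for (const std::string& taskId : reported) {
      framework.unreachableTasks.erase(taskId);
    }
  }

  // Behaviour described by the commit summary: additionally drop any
  // unreachable task of this agent that the agent did not report.
  void reregisterDropUnknown(const std::string& agentId,
                             const std::vector<std::string>& reported) {
    reregisterReportedOnly(agentId, reported);
    auto it = framework.unreachableTasks.begin();
    while (it != framework.unreachableTasks.end()) {
      if (it->second.agentId == agentId) {
        it = framework.unreachableTasks.erase(it);
      } else {
        ++it;
      }
    }
  }

  // The condition the failing CHECK expresses: no unreachable task may
  // belong to a registered agent.
  bool invariantHolds() const {
    for (const auto& entry : framework.unreachableTasks) {
      if (registeredAgents.count(entry.second.agentId) > 0) {
        return false;
      }
    }
    return true;
  }
};

// Replays the scenario: task "t2" was never delivered to agent "S1".
// Returns whether the invariant holds after the agent re-registers.
bool runScenario(bool dropUnknownTasks) {
  Master master;
  master.registeredAgents.insert("S1");
  master.markUnreachable("S1", {{"t1", "S1"}, {"t2", "S1"}});
  assert(master.invariantHolds());  // Agent is unregistered at this point.

  const std::vector<std::string> reported = {"t1"};  // Agent only knows t1.
  if (dropUnknownTasks) {
    master.reregisterDropUnknown("S1", reported);
  } else {
    master.reregisterReportedOnly("S1", reported);
  }
  return master.invariantHolds();
}
```

In this model, `runScenario(false)` leaves `t2` in `unreachableTasks` while `S1` is registered again, which is the inconsistency the CHECK detects at framework removal; `runScenario(true)` clears it.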
[jira] [Commented] (MESOS-8750) Check failed: !slaves.registered.contains(task->slave_id)
[ https://issues.apache.org/jira/browse/MESOS-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448618#comment-16448618 ]

ASF GitHub Bot commented on MESOS-8750:
---------------------------------------

Github user m9a closed the pull request at:

    https://github.com/apache/mesos/pull/279

> Check failed: !slaves.registered.contains(task->slave_id)
> ---------------------------------------------------------
>
> Key: MESOS-8750
> URL: https://issues.apache.org/jira/browse/MESOS-8750
> Project: Mesos
> Issue Type: Task
> Components: master
> Affects Versions: 1.6.0
> Reporter: Megha Sharma
> Assignee: Megha Sharma
> Priority: Critical
>
> It appears that in certain circumstances an unreachable task doesn't get
> cleaned up from {{framework.unreachableTasks}} when the respective agent
> re-registers, leading to this check failure later when the framework is
> being removed. When an agent goes unreachable, the master adds the tasks
> from this agent to {{framework.unreachableTasks}}, and when such an agent
> re-registers, the master removes the tasks that the agent specifies during
> re-registration from this data structure. However, there could be tasks
> that the agent doesn't know about, e.g. if the runTask message for them got
> dropped, and such tasks will not get removed from unreachableTasks.
> {noformat}
> F0310 13:30:58.856665 62740 master.cpp:9671] Check failed:
> !slaves.registered.contains(task->slave_id()) Unreachable task of
> framework 4f57975b-05dd-4118-8674-5b29a86c6a6c-0850 was found on registered
> agent 683c4a92-b5a0-490c-998a-6113fc86d37a-S1428
> {noformat}
[jira] [Commented] (MESOS-8750) Check failed: !slaves.registered.contains(task->slave_id)
[ https://issues.apache.org/jira/browse/MESOS-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424306#comment-16424306 ]

ASF GitHub Bot commented on MESOS-8750:
---------------------------------------

Github user m9a commented on the issue:

    https://github.com/apache/mesos/pull/279

    The JIRA for this PR: https://issues.apache.org/jira/browse/MESOS-8750
    Since @xujyan is shepherding it, I intended to set him as the reviewer, but it doesn't look like I can change those fields on the PR.

> Check failed: !slaves.registered.contains(task->slave_id)
> ---------------------------------------------------------
>
> Key: MESOS-8750
> URL: https://issues.apache.org/jira/browse/MESOS-8750
> Project: Mesos
> Issue Type: Task
> Components: master
> Reporter: Megha Sharma
> Assignee: Megha Sharma
> Priority: Major
>
> It appears that in certain circumstances an unreachable task doesn't get
> cleaned up from framework.unreachableTasks when the respective agent
> re-registers, leading to this check failure later when the framework is
> being removed. When an agent goes unreachable, the master adds the tasks
> from this agent to framework.unreachableTasks, and when such an agent
> re-registers, the master removes the tasks that the agent specifies during
> re-registration from this data structure. However, there could be tasks
> that the agent doesn't know about, e.g. if the runTask message for them got
> dropped, and such tasks will not get removed from unreachableTasks.
>
> F0112 21:50:39.272985 44038 master.cpp:9617] Check failed: !slaves.registered.contains(task->slave_id())
> *** Check failure stack trace: ***
> @ 0x7fb7260692bd (unknown)
> @ 0x7fb72606b04d (unknown)
> @ 0x7fb726068e42 (unknown)
> @ 0x7fb72606ba29 (unknown)
> @ 0x7fb7251f5226 (unknown)
> @ 0x7fb725120081 (unknown)
> @ 0x7fb72519ca37 (unknown)
> @ 0x7fb725fbb2fe (unknown)
> @ 0x7fb724f75de9 (unknown)
> @ 0x7fb725fb4fc2 (unknown)
> @ 0x7fb725fc4a17 (unknown)
> @ 0x7fb725fca276 (unknown)
> @ 0x7fb72352d470 (unknown)
> @ 0x7fb723784aa1 start_thread
> @ 0x7fb722f47bcd clone
> @ (nil) (unknown)
> Aborted