[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288018#comment-16288018 ]

Yan Xu commented on MESOS-6406:
---

{noformat:title=}
commit 5e5a8102c3281db25a37157dac123b0ca546e030 (HEAD -> master, apache/master)
Author: Megha Sharma
Date:   Tue Dec 12 08:21:19 2017 -0800

    Send status updates when an unreachable agent re-registers.

    Master will send task status updates to frameworks upon agent
    re-registration if the agent:
    - has previously been removed by the master for being unreachable or
    - is unknown to the master due to the garbage collection of the
      unreachable and gone agents in the registry and the master's state.

    Review: https://reviews.apache.org/r/64098/

commit 34503f8b429e3459a7a132ca8cf02acdec3c7881
Author: Megha Sharma
Date:   Tue Dec 12 08:21:14 2017 -0800

    Added a new reason to task status.

    Added new reason `REASON_AGENT_REREGISTERED` (`REASON_SLAVE_REREGISTERED`
    in v0) to task status. The new reason will be used when master starts
    to send status update during the re-registration of an unreachable or
    unknown agent.

    Review: https://reviews.apache.org/r/64250/
{noformat}

> Send latest status for partition-aware tasks when agent reregisters
> ---
>
>                 Key: MESOS-6406
>                 URL: https://issues.apache.org/jira/browse/MESOS-6406
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>            Assignee: Megha Sharma
>              Labels: mesosphere
>
> When an agent reregisters, we should notify frameworks about the current
> status of any partition-aware tasks that were/are running on the agent --
> i.e., report the current state of the task at the agent to the framework.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
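The rule described in the first commit message can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual Mesos master code (which is C++). The names `AgentHistory`, `StatusUpdate`, `should_send_updates` and `updates_for_reregistration` are hypothetical; only the reason name `REASON_AGENT_REREGISTERED` comes from the commits above.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List


class AgentHistory(Enum):
    """What the master remembers about a re-registering agent (hypothetical)."""
    REGISTERED = auto()   # known to the master, never marked unreachable
    UNREACHABLE = auto()  # previously removed by the master for being unreachable
    UNKNOWN = auto()      # entry garbage-collected from registry and master state


# Reason name taken from the commit message; modelled as a plain string here.
REASON_AGENT_REREGISTERED = "REASON_AGENT_REREGISTERED"


@dataclass(frozen=True)
class StatusUpdate:
    task_id: str
    state: str
    reason: str


def should_send_updates(history: AgentHistory) -> bool:
    """Send updates only for agents that were unreachable or are unknown."""
    return history in (AgentHistory.UNREACHABLE, AgentHistory.UNKNOWN)


def updates_for_reregistration(
    history: AgentHistory, tasks: Dict[str, str]
) -> List[StatusUpdate]:
    """Build one update per task reported by the agent (task id -> state)."""
    if not should_send_updates(history):
        return []
    return [
        StatusUpdate(task_id, state, REASON_AGENT_REREGISTERED)
        for task_id, state in tasks.items()
    ]
```

For an agent the master still considers registered, the sketch produces no updates; for an unreachable or unknown agent, it produces one update per reported task, each carrying the new reason.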
[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271862#comment-16271862 ]

Yan Xu commented on MESOS-6406:
---

[~ipronin] No, not if the agent's entry was GCed. The master does know all the "registered" agents. I guess to support this the master can choose to send status updates for agents that are either 1) unreachable or 2) totally unknown. Would this work?

I am mainly not sure it's a good idea to send status updates for all non-completed (pending, running, terminated but unacked) tasks during master failover, which is a time when the master is very loaded.
[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271799#comment-16271799 ]

Ilya Pronin commented on MESOS-6406:
---

What if the agent becomes unreachable, then master failover happens and then the agent re-registers? Let's pretend that the agent's entry was GCed from the registry. In this case the framework will not know that the task came back, right?
[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271772#comment-16271772 ]

Yan Xu commented on MESOS-6406:
---

So I think we can probably improve on the approach stated in the JIRA: when the master fails over, perhaps we don't need to send status updates for tasks on agents that haven't been unreachable?

For unreachable agents we have informed the frameworks about these tasks via {{TASK_UNREACHABLE}}, so upon reregistration we need to inform frameworks that these tasks are back.

For other agents, if the state of a task has changed during master failover, the agent is going to send new status updates with retries, so we don't need to worry about the schedulers not getting updates; if the state hasn't changed, the scheduler is already aware of the latest state of the task, so the master doesn't need to send one either.

/cc [~megha.sharma] [~ipronin] [~vinodkone]
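The case analysis in this comment can be written down as a small decision function. This is only an illustrative Python sketch of the reasoning, not Mesos code; `Notifier` and `who_notifies` are hypothetical names, and the two boolean inputs are simplifications of what the master and agent actually track.

```python
from enum import Enum


class Notifier(Enum):
    """Who, if anyone, must tell the scheduler about the task (hypothetical)."""
    MASTER = "master"
    AGENT = "agent"
    NOBODY = "nobody"


def who_notifies(agent_was_unreachable: bool,
                 state_changed_during_failover: bool) -> Notifier:
    # Frameworks were told TASK_UNREACHABLE, so the master has to report
    # that the tasks are back when the agent reregisters.
    if agent_was_unreachable:
        return Notifier.MASTER
    # The agent retries its own status updates until they are acknowledged.
    if state_changed_during_failover:
        return Notifier.AGENT
    # The scheduler already holds the latest state of the task.
    return Notifier.NOBODY
```

Only the first branch requires the master to generate updates itself, which is the point of limiting the work done during master failover.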
[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters
[ https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092266#comment-16092266 ]

Yan Xu commented on MESOS-6406:
---

The master should probably send updates about non-partition-aware framework tasks as well, especially in light of MESOS-7215, for which we are going to stop killing tasks in all cases.

> Send latest status for partition-aware tasks when agent reregisters
> ---
>
>                 Key: MESOS-6406
>                 URL: https://issues.apache.org/jira/browse/MESOS-6406
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>              Labels: mesosphere
>
> When an agent reregisters, we should notify frameworks about the current
> status of any partition-aware tasks that were/are running on the agent --
> i.e., report the current state of the task at the agent to the framework.