[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-12-12 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288018#comment-16288018
 ] 

Yan Xu commented on MESOS-6406:
---

{noformat:title=}
commit 5e5a8102c3281db25a37157dac123b0ca546e030 (HEAD -> master, apache/master)
Author: Megha Sharma 
Date:   Tue Dec 12 08:21:19 2017 -0800

Send status updates when an unreachable agent re-registers.

Master will send task status updates to frameworks upon agent
re-registration if the agent:
- has previously been removed by the master for being unreachable or
- is unknown to the master due to the garbage collection of the
  unreachable and gone agents in the registry and the master's state.

Review: https://reviews.apache.org/r/64098/

commit 34503f8b429e3459a7a132ca8cf02acdec3c7881
Author: Megha Sharma 
Date:   Tue Dec 12 08:21:14 2017 -0800

Added a new reason to task status.

Added new reason `REASON_AGENT_REREGISTERED`
(`REASON_SLAVE_REREGISTERED` in v0) to task status.

The new reason will be used when master starts to send status update
during the re-registration of an unreachable or unknown agent.

Review: https://reviews.apache.org/r/64250/
{noformat}
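
To illustrate the two commits above, here is a minimal Python sketch of the behavior they describe. It is not the Mesos implementation: `AgentStanding`, `TaskStatus` and `updates_on_reregistration` are hypothetical names, and only the reason string `REASON_AGENT_REREGISTERED` comes from the commit message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class AgentStanding(Enum):
    """How the master viewed the agent just before it re-registered (hypothetical)."""
    REGISTERED = "registered"
    UNREACHABLE = "unreachable"  # previously removed for being unreachable
    UNKNOWN = "unknown"          # e.g. entry garbage-collected from the registry


@dataclass(frozen=True)
class TaskStatus:
    task_id: str
    state: str   # e.g. "TASK_RUNNING", as reported by the agent
    reason: str


def updates_on_reregistration(standing: AgentStanding,
                              agent_tasks: Dict[str, str]) -> List[TaskStatus]:
    """Return the status updates the master would forward to frameworks."""
    if standing is AgentStanding.REGISTERED:
        # The agent never left from the master's point of view: nothing to send.
        return []
    # Unreachable or unknown agent: report every task the agent still knows about.
    return [TaskStatus(task_id, state, "REASON_AGENT_REREGISTERED")
            for task_id, state in sorted(agent_tasks.items())]
```

Calling `updates_on_reregistration(AgentStanding.UNREACHABLE, {"t1": "TASK_RUNNING"})` yields one update for `t1`, while the same call with `AgentStanding.REGISTERED` yields an empty list.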

> Send latest status for partition-aware tasks when agent reregisters
> ---
>
> Key: MESOS-6406
> URL: https://issues.apache.org/jira/browse/MESOS-6406
> Project: Mesos
> Issue Type: Bug
> Reporter: Neil Conway
> Assignee: Megha Sharma
> Labels: mesosphere
>
> When an agent reregisters, we should notify frameworks about the current 
> status of any partition-aware tasks that were/are running on the agent -- 
> i.e., report the current state of the task at the agent to the framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-11-29 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271862#comment-16271862
 ] 

Yan Xu commented on MESOS-6406:
---

[~ipronin] Not if the agent's entry was GCed. The master does know all the 
"registered" agents. I guess to support this the master can choose to send 
status updates for agents that are either 1) unreachable or 2) totally unknown. 
Would this work?

My main concern is whether it's a good idea to send status updates for all 
non-completed (pending, running, terminated but unacked) tasks during master 
failover, which is a time when the master is heavily loaded.




[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-11-29 Thread Ilya Pronin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271799#comment-16271799
 ] 

Ilya Pronin commented on MESOS-6406:
---

What if the agent becomes unreachable, then master failover happens and then 
the agent re-registers? Let's pretend that the agent's entry was GCd from the 
registry. In this case the framework will not know that the task came back, 
right?
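
The scenario can be sketched with a toy model of the registry. This is an illustration only: the `Registry` class, the count-based garbage collection cap and the `classify` helper are assumptions made for the example, not Mesos code. It shows how an agent that was marked unreachable can look completely unknown to a master that recovers its state from the registry after a failover.

```python
from collections import OrderedDict


class Registry:
    """Toy persistent registry that survives master failover (hypothetical model)."""

    def __init__(self, max_unreachable: int):
        self.registered = set()
        self.unreachable = OrderedDict()  # agent_id -> True, oldest entry first
        self.max_unreachable = max_unreachable

    def mark_unreachable(self, agent_id: str) -> None:
        self.registered.discard(agent_id)
        self.unreachable[agent_id] = True
        # Garbage-collect the oldest unreachable entries beyond the cap.
        while len(self.unreachable) > self.max_unreachable:
            self.unreachable.popitem(last=False)


def classify(registry: Registry, agent_id: str) -> str:
    """What a freshly failed-over master can infer from the registry alone."""
    if agent_id in registry.registered:
        return "registered"
    if agent_id in registry.unreachable:
        return "unreachable"
    return "unknown"


registry = Registry(max_unreachable=1)
registry.registered.update({"agent-1", "agent-2"})
registry.mark_unreachable("agent-1")
registry.mark_unreachable("agent-2")  # pushes agent-1 out of the unreachable list
```

After these steps, `classify(registry, "agent-1")` returns `"unknown"` even though the framework was earlier told that its tasks were unreachable, which is the case the question is about.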



[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-11-29 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271772#comment-16271772
 ] 

Yan Xu commented on MESOS-6406:
---

So I think we can probably improve on the approach stated in the JIRA: when the 
master fails over, perhaps we don't need to send status updates for tasks on 
agents that haven't been unreachable? 

For unreachable agents we have informed the frameworks about these tasks via 
{{TASK_UNREACHABLE}} so upon reregistration we need to inform frameworks that 
these tasks are back.

For other agents, if the state of a task has changed during master failover, 
the agent is going to send new status updates with retries so we don't need to 
worry about the schedulers not getting updates; if the state hasn't changed, 
the scheduler is already aware of the latest state of the task so the master 
doesn't need to send one either.
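
As a rough illustration of why retries cover the "state changed during failover" case, here is a small Python sketch. The names `deliver_with_retries` and `flaky_master` are hypothetical and the backoff between attempts is left out; it only shows that an update which keeps being resent gets through once the master is back.

```python
from typing import Callable, Optional


def deliver_with_retries(send: Callable[[str], bool],
                         update: str,
                         max_attempts: int) -> Optional[int]:
    """Resend `update` until `send` reports an acknowledgement.

    Returns the attempt number that was acknowledged, or None if
    `max_attempts` was exhausted. Backoff between attempts is omitted.
    """
    for attempt in range(1, max_attempts + 1):
        if send(update):
            return attempt
    return None


def flaky_master(drop_first: int) -> Callable[[str], bool]:
    """Fake transport that drops the first `drop_first` sends (master failing over)."""
    seen = {"count": 0}

    def send(update: str) -> bool:
        seen["count"] += 1
        return seen["count"] > drop_first

    return send
```

With a transport that drops the first two sends, `deliver_with_retries(flaky_master(2), "TASK_FINISHED", 5)` is acknowledged on the third attempt.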

/cc [~megha.sharma] [~ipronin] [~vinodkone]



[jira] [Commented] (MESOS-6406) Send latest status for partition-aware tasks when agent reregisters

2017-07-18 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092266#comment-16092266
 ] 

Yan Xu commented on MESOS-6406:
---

The master should probably send updates about non-partition-aware framework 
tasks as well, especially in light of MESOS-7215, for which we are going to stop 
killing tasks in all cases.

> Send latest status for partition-aware tasks when agent reregisters
> ---
>
> Key: MESOS-6406
> URL: https://issues.apache.org/jira/browse/MESOS-6406
> Project: Mesos
> Issue Type: Bug
> Reporter: Neil Conway
> Assignee: Neil Conway
> Labels: mesosphere
>
> When an agent reregisters, we should notify frameworks about the current 
> status of any partition-aware tasks that were/are running on the agent -- 
> i.e., report the current state of the task at the agent to the framework.


