[jira] [Updated] (MESOS-8430) Race between operation status updates and agent update

2018-01-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-8430:

Affects Version/s: 1.5.0

> Race between operation status updates and agent update
> --
>
> Key: MESOS-8430
> URL: https://issues.apache.org/jira/browse/MESOS-8430
> Project: Mesos
> Issue Type: Bug
> Components: agent
> Affects Versions: 1.5.0
> Reporter: Benjamin Bannier
>
> Currently, there is a possible race between operation status updates 
> triggered by a status update manager in the agent and updates to the agent's 
> resources.
> Consider a master failover where an agent has a resource provider with an 
> operation that is not yet terminal. Now let the operation succeed and become 
> terminal in the agent, but have the master fail over before it processes the 
> update. After the failover, the new master would learn about the resource 
> provider resources via an {{UpdateSlaveMessage}}. Simultaneously, a status 
> update manager in the agent could inform the master about the unacknowledged, 
> successful operation. If the operation status update arrives at the master 
> before the {{UpdateSlaveMessage}}, the operation status update handler could 
> attempt to apply the operation to resources not yet known to the master. This 
> would likely trigger a {{CHECK}} failure in a contains check in the master.
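
The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not Mesos code: ToyMaster, CheckFailure, replay, and the resource names "raw-disk" and "volume" are all hypothetical, and it assumes for simplicity that the UpdateSlaveMessage reports the resources the operation consumes. The exception stands in for a failed {{CHECK}}.

```python
# Toy model of the race; every name here is hypothetical, not Mesos code.
from dataclasses import dataclass, field


class CheckFailure(Exception):
    """Stands in for a failed CHECK, which would abort a real master."""


@dataclass
class ToyMaster:
    # Resources the freshly elected master has learned about, per agent.
    known: dict = field(default_factory=dict)

    def on_update_slave_message(self, agent, resources):
        # The agent (re)announces its resource provider resources.
        self.known[agent] = set(resources)

    def on_operation_status_update(self, agent, consumed, produced):
        # Applying a successful operation converts `consumed` into `produced`,
        # which requires the master to already know about `consumed`.
        total = self.known.get(agent, set())
        if not consumed <= total:
            raise CheckFailure(f"{sorted(consumed)} not contained in {sorted(total)}")
        self.known[agent] = (total - consumed) | produced


def replay(order):
    """Deliver the two post-failover messages in the given order."""
    master = ToyMaster()
    handlers = {
        "update_slave": lambda: master.on_update_slave_message(
            "agent-1", {"raw-disk"}),
        "status_update": lambda: master.on_operation_status_update(
            "agent-1", {"raw-disk"}, {"volume"}),
    }
    try:
        for message in order:
            handlers[message]()
    except CheckFailure:
        return "CHECK failure"
    return sorted(master.known["agent-1"])
```

In this model, replay(["update_slave", "status_update"]) applies the operation cleanly, while replay(["status_update", "update_slave"]) hits the simulated check failure because the master has not yet learned about the agent's resources.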



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8430) Race between operation status updates and agent update

2018-01-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-8430:

Issue Type: Bug  (was: Task)






[jira] [Updated] (MESOS-8430) Race between operation status updates and agent update

2018-01-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-8430:

Description: 
Currently, there is a possible race between operation status updates 
triggered by a status update manager in the agent and updates to the agent's 
resources.

Consider a master failover where an agent has a resource provider with an 
operation that is not yet terminal. Now let the operation succeed and become 
terminal in the agent, but have the master fail over before it processes the 
update. After the failover, the new master would learn about the resource 
provider resources via an {{UpdateSlaveMessage}}. Simultaneously, a status 
update manager in the agent could inform the master about the unacknowledged, 
successful operation. If the operation status update arrives at the master 
before the {{UpdateSlaveMessage}}, the operation status update handler could 
attempt to apply the operation to resources not yet known to the master. This 
would likely trigger a {{CHECK}} failure in a contains check in the master.

  was:
Currently, there is a possible race between operation status updates 
triggered by a status update manager in the agent and updates to the agent's 
resources.

Consider a master failover where an agent has a resource provider with an 
operation that is not yet terminal. Now let the operation succeed and become 
terminal in the agent, but have the master fail over before it processes the 
update. After the failover, the new master would learn about the resource 
provider resources via an `UpdateSlaveMessage`. Simultaneously, a status update 
manager in the agent could inform the master about the unacknowledged, 
successful operation. If the operation status update arrives at the master 
before the `UpdateSlaveMessage`, the operation status update handler could 
attempt to apply the operation to resources not yet known to the master. This 
would likely trigger a `CHECK` failure in a contains check.


> Race between operation status updates and agent update
> --
>
> Key: MESOS-8430
> URL: https://issues.apache.org/jira/browse/MESOS-8430
> Project: Mesos
> Issue Type: Task
> Components: agent
> Reporter: Benjamin Bannier


