[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-07-11 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5576:
--
Fix Version/s: 0.28.3

> Masters may drop the first message they send between masters after a network 
> partition
> --
>
> Key: MESOS-5576
> URL: https://issues.apache.org/jira/browse/MESOS-5576
> Project: Mesos
>  Issue Type: Improvement
>  Components: leader election, master, replicated log
>Affects Versions: 0.28.2
> Environment: Observed in an OpenStack environment where each master 
> lives on a separate VM.
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
> Fix For: 0.28.3, 1.0.0
>
>
> We observed the following situation in a cluster of five masters:
> || Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
> | 0 | Follower | Follower | Follower | Follower | Leader |
> | 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
> | 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
> | 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
> | 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
> | 5 | Leader | Follower | Follower | Follower | Still down |
> | 6 | Leader | Follower | Follower | Follower | Comes back up |
> | 7 | Leader | Follower | Follower | Follower | Follower |
> | 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
> | 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
> | 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
> | 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
> | 12 | Still down | Leader | Follower | Follower | Follower |
> Master 2 sends a series of messages to the recently-restarted Master 5.  The 
> first message is dropped, but subsequent messages are not.
> This appears to be due to a stale link between the masters.  Before leader 
> election, the replicated log actors create a network watcher, which adds 
> links to masters that join the ZK group:
> https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159
> This link (Master 2 -> 5) does not appear to break when Master 5 goes down, 
> perhaps due to how the network partition was induced (in the hypervisor 
> layer, rather than in the VM itself).
> When Master 2 tries to send a {{PromiseRequest}} to Master 5, we do not 
> observe the [expected log 
> message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494].
> Instead, we see a log line in Master 2:
> {code}
> process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is 
> not connected
> {code}
> The broken link is removed by the libprocess {{socket_manager}}, and the 
> subsequent {{WriteRequest}} from Master 2 to Master 5 succeeds via a new 
> socket.
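The drop-then-recover behavior described above can be modeled with a short, self-contained sketch. This is a toy model, not libprocess code: plain Python objects stand in for TCP sockets and master processes, and the names {{FakeSocket}}, {{LinkManager}}, and {{Peer}} are hypothetical. It shows a cached link bound to a peer incarnation that has since died failing on its first use; the manager discards the stale socket without retrying (mirroring how {{socket_manager}} removes the broken link), so only the *next* message goes out on a fresh socket.

```python
class Peer:
    """One incarnation of a master process."""
    def __init__(self):
        self.alive = True
        self.inbox = []


class FakeSocket:
    """Toy stand-in for a TCP connection, bound to one peer incarnation.
    Writing to a dead incarnation raises, like a write on a half-open
    connection whose remote end is gone."""
    def __init__(self, incarnation):
        self.incarnation = incarnation

    def send(self, msg):
        if not self.incarnation.alive:
            raise ConnectionResetError("Transport endpoint is not connected")
        self.incarnation.inbox.append(msg)


class LinkManager:
    """Caches one socket per address. On a send failure the stale socket
    is discarded and the message is dropped (not retried), so only the
    next send opens a fresh socket -- the behavior reported above."""
    def __init__(self):
        self.links = {}

    def send(self, addr, current_incarnation, msg):
        sock = self.links.get(addr)
        if sock is None:
            sock = FakeSocket(current_incarnation)
            self.links[addr] = sock
        try:
            sock.send(msg)
            return True
        except ConnectionResetError:
            del self.links[addr]  # broken link removed; message dropped
            return False


# Replaying the timeline: link to Master 5, partition it, restart it.
lm = LinkManager()
master5 = Peer()
assert lm.send("master5", master5, "hello")               # link established
master5.alive = False                                     # partitioned / downed
master5 = Peer()                                          # comes back up
assert not lm.send("master5", master5, "PromiseRequest")  # first msg dropped
assert lm.send("master5", master5, "WriteRequest")        # fresh socket
assert master5.inbox == ["WriteRequest"]
```

In the real system the dropped first send surfaces only as the "Failed to shutdown socket" log line quoted above; the model just makes the drop-then-recover sequence explicit.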



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-30 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5576:
---
Shepherd: Benjamin Mahler



[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5576:
-
Sprint:   (was: Mesosphere Sprint 37)



[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-23 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5576:
-
Sprint: Mesosphere Sprint 38



[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-17 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5576:
-
Issue Type: Improvement  (was: Bug)

Changing type from {{Bug}} to {{Improvement}} because the masters will still 
recover *eventually* in this case.  Bad sockets are cleaned out when the 
masters abort due to {{--registry_fetch_timeout}}.



[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-10 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5576:
-
  Sprint: Mesosphere Sprint 37
Story Points: 5



[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-08 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5576:
-
Description: (edited: "Master 1 sends a series of messages to the recently-restarted Master 5" corrected to "Master 2 sends a series of messages ..."; the description is otherwise unchanged)

[jira] [Updated] (MESOS-5576) Masters may drop the first message they send between masters after a network partition

2016-06-08 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5576:
-
Description: (edited: corrected "Master 1" to "Master 2" in the paragraphs about the stale link, the {{PromiseRequest}}, the observed log line, and the subsequent {{WriteRequest}}; the description is otherwise unchanged)