Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread Patrik Nordwall
I'm pretty sure that is not the case. If you create a minimized
example/reproducer I will take a look. I prefer that you base it on the
SimpleClusterApp example:
scala:
https://github.com/akka/akka-samples/blob/master/akka-sample-cluster-scala/src/main/scala/sample/cluster/simple/SimpleClusterApp.scala
or java:
https://github.com/akka/akka-samples/blob/master/akka-sample-cluster-java/src/main/java/sample/cluster/simple/SimpleClusterApp.java


On Tue, Apr 4, 2017 at 1:48 PM, Unmesh Joshi  wrote:

> It looks like the logic that identifies a new incarnation by ip/port and
> downs the previous incarnation of the actor system runs only on seed nodes,
> not on other members. So if I crash a seed node and restart it, the
> membership list will contain the new incarnation of the seed, but the old
> incarnation won't be downed.
>
> On Tuesday, 4 April 2017 16:09:49 UTC+5:30, Patrik Nordwall wrote:
>>
>> No, there is no majority decision here. Perhaps you don't join the right
>> node. You should have both in seed-nodes and in same order. It should be
>> clear from info level logging what is going on.
>> On Tue, 4 Apr 2017 at 12:26, Unmesh Joshi wrote:
>>
>>> Curiously, my observation is that if instead of two, I have four node
>>> cluster and crash/restart a node with same host/port, I do not get this
>>> warning. I get this only on two node cluster, which made me think that
>>> there is majority needed to mark the new incarnation as 'seen' and then
>>> down the previous incarnation.
>>>
>>>
>>> On Tuesday, 4 April 2017 15:17:45 UTC+5:30, Patrik Nordwall wrote:
>>>
>>>> One way of downing is to join again with same host:port. That will
>>>> trigger downing of previous incarnation and when removal is done the new
>>>> incarnation can join by trying to join again. The seed-nodes joining will
>>>> retry the joining automatically.
>>>>
>>>> However, sooner or later you will need a real downing strategy.
>>>>
>>>> /Patrik
>>>>
>>>> On Tue, 4 Apr 2017 at 10:31, 'Michal Borowiecki' via Akka User List <
>>>> akka...@googlegroups.com> wrote:

>>> Indeed. This is the relevant bit of docs I believe (
> http://doc.akka.io/docs/akka/2.4.17/common/cluster.html#Membership):
>
> The node identifier internally also contains a UID that uniquely
> identifies this actor system instance at that hostname:port. Akka
> uses the UID to be able to reliably trigger remote death watch. This means
> that the same actor system can never join a cluster again once it's been
> removed from that cluster. To re-join an actor system with the same
> hostname:port to a cluster you have to stop the actor system and
> start a new one with the same hostname:port which will then receive a
> different UID.
>
> After re-starting the node it will get a new UID and will be
> considered a new member.
>
> The unreachable member (the previous incarnation of your node) needs
> to be downed first for new members to be admitted by the leader.
>
> Cheers,
>
> Michal
>

> On 04/04/17 08:58, Viktor Klang wrote:
>
 No, it needs to be Downed.
>
>
> On Tue, Apr 4, 2017 at 9:50 AM, Unmesh Joshi 
> wrote:
>
>> Hi,
>>
>> If I restart the crashed node on same host and port, it should be
>> reachable now and consensus should be reached isnt it?
>>
>> Thanks,
>> Unmesh
>>
>> On Tuesday, 4 April 2017 13:09:22 UTC+5:30, Michal Borowiecki wrote:
>>>
>>> Hi Unmesh,
>>>
>>> AFAIK, the crashed node has to be downed (whether manually or
>>> automatically) for the cluster to reach convergence.
>>>
>>> Only once there are no unreachable nodes observed by any member can
>>> the leader resume it's duties and allow the new member (your re-started
>>> instance) to join.
>>>
>>> For testing & dev, you can use auto-downing. For production you need
>>> to choose a more resilient approach I'm afraid, as out of the box
>>> auto-downing doesn't provide a way to address the split-brain-problem 
>>> which
>>> most likely would bite you in a real life environment sooner or later.
>>>
>>> Cheers,
>>>
>>> Michal
>>>
>>> On 04/04/17 08:31, Unmesh Joshi wrote:
>>>
>>> Is it possibly because in a two node cluster, there can never be
>>> majority ( > 50%) nodes agreeing on membership to mark a node as 'seen'?
>>>
>>> On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote:

 Hi,

 I have a two node cluster in a cluster. If I crash one of the nodes
 (*10.131.22.26:3552 ), *and bring it up
 again, I start getting following messages from other nodes.  Now that 
 the
 node is reachable and there are only two nodes in the cluster, why 
 should
 it give following message 

Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread Unmesh Joshi
It looks like the logic that identifies a new incarnation by ip/port and
downs the previous incarnation of the actor system runs only on seed nodes,
not on other members. So if I crash a seed node and restart it, the
membership list will contain the new incarnation of the seed, but the old
incarnation won't be downed.

On Tuesday, 4 April 2017 16:09:49 UTC+5:30, Patrik Nordwall wrote:
>
> No, there is no majority decision here. Perhaps you don't join the right 
> node. You should have both in seed-nodes and in same order. It should be 
> clear from info level logging what is going on.
> On Tue, 4 Apr 2017 at 12:26, Unmesh Joshi wrote:
>
>> Curiously, my observation is that if instead of two, I have four node 
>> cluster and crash/restart a node with same host/port, I do not get this 
>> warning. I get this only on two node cluster, which made me think that 
>> there is majority needed to mark the new incarnation as 'seen' and then 
>> down the previous incarnation. 
>>
>>
>> On Tuesday, 4 April 2017 15:17:45 UTC+5:30, Patrik Nordwall wrote:
>>
>>> One way of downing is to join again with same host:port. That will 
>>> trigger downing of previous incarnation and when removal is done the new 
>>> incarnation can join by trying to join again. The seed-nodes joining will 
>>> retry the joining automatically.
>>>
>>> However, sooner or later you will need a real downing strategy.
>>>
>>> /Patrik
>>>
>>> On Tue, 4 Apr 2017 at 10:31, 'Michal Borowiecki' via Akka User List <
>>> akka...@googlegroups.com> wrote:
>>>
>> Indeed. This is the relevant bit of docs I believe (
 http://doc.akka.io/docs/akka/2.4.17/common/cluster.html#Membership):

 The node identifier internally also contains a UID that uniquely
 identifies this actor system instance at that hostname:port. Akka uses
 the UID to be able to reliably trigger remote death watch. This means that
 the same actor system can never join a cluster again once it's been
 removed from that cluster. To re-join an actor system with the same
 hostname:port to a cluster you have to stop the actor system and start
 a new one with the same hostname:port which will then receive a
 different UID.

 After re-starting the node it will get a new UID and will be considered 
 a new member.

 The unreachable member (the previous incarnation of your node) needs to 
 be downed first for new members to be admitted by the leader.

 Cheers,

 Michal

>>>
 On 04/04/17 08:58, Viktor Klang wrote:

>>> No, it needs to be Downed.


 On Tue, Apr 4, 2017 at 9:50 AM, Unmesh Joshi  
 wrote:

> Hi, 
>
> If I restart the crashed node on same host and port, it should be 
> reachable now and consensus should be reached isnt it?
>
> Thanks,
> Unmesh
>
> On Tuesday, 4 April 2017 13:09:22 UTC+5:30, Michal Borowiecki wrote: 
>>
>> Hi Unmesh,
>>
>> AFAIK, the crashed node has to be downed (whether manually or 
>> automatically) for the cluster to reach convergence.
>>
>> Only once there are no unreachable nodes observed by any member can 
>> the leader resume it's duties and allow the new member (your re-started 
>> instance) to join.
>>
>> For testing & dev, you can use auto-downing. For production you need 
>> to choose a more resilient approach I'm afraid, as out of the box 
>> auto-downing doesn't provide a way to address the split-brain-problem 
>> which 
>> most likely would bite you in a real life environment sooner or later.
>>
>> Cheers,
>>
>> Michal
>>
>> On 04/04/17 08:31, Unmesh Joshi wrote:
>>
>> Is it possibly because in a two node cluster, there can never be 
>> majority ( > 50%) nodes agreeing on membership to mark a node as 'seen'? 
>>
>> On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote: 
>>>
>>> Hi,
>>>
>>> I have a two-node cluster. If I crash one of the nodes
>>> (10.131.22.26:3552) and bring it up again, I start getting the
>>> following messages from the other node. Now that the node is
>>> reachable and there are only two nodes in the cluster, why should it
>>> give the following message with seen=false for 10.131.22.26:3552?
>>> For members to be seen, is there any other configuration that needs
>>> to be tuned?
>>>
>>>
>>> [INFO] [04/04/2017 12:38:49.623] 
>>> [csw-cluster-akka.actor.default-dispatcher-14] 
>>> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
>>> csw-cluster@10.131.22.26:41574] - Leader can currently not perform 
>>> its duties, reachability status: [akka.tcp://
>>> csw-cluster@10.131.22.26:41574 -> akka.tcp://
>>> csw-cluster@10.131.22.26:3552: 

Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread Patrik Nordwall
No, there is no majority decision here. Perhaps you are not joining the right
node. You should list both nodes in seed-nodes, in the same order on every
node. It should be clear from info-level logging what is going on.
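
As a rough illustration of the seed-nodes advice, an application.conf along
these lines could be used on both nodes. This is only a sketch: the system
name csw-cluster and the address 10.131.22.26:3552 come from the log output
in this thread, while the second seed port 3553 is a made-up placeholder,
since the other node appears to be running on a dynamic port (41574).

```hocon
akka {
  actor.provider = "akka.cluster.ClusterActorRefProvider"
  remote.netty.tcp {
    hostname = "10.131.22.26"
    port = 3552        # 3553 on the second node (hypothetical port)
  }
  cluster {
    # Same list, in the same order, on every node.
    seed-nodes = [
      "akka.tcp://csw-cluster@10.131.22.26:3552",
      "akka.tcp://csw-cluster@10.131.22.26:3553"
    ]
  }
}
```

With fixed ports on both nodes, a restarted node comes back on the same
host:port and can rejoin through the seed-node process.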
On Tue, 4 Apr 2017 at 12:26, Unmesh Joshi wrote:

> Curiously, my observation is that if instead of two, I have four node
> cluster and crash/restart a node with same host/port, I do not get this
> warning. I get this only on two node cluster, which made me think that
> there is majority needed to mark the new incarnation as 'seen' and then
> down the previous incarnation.
>
>
> On Tuesday, 4 April 2017 15:17:45 UTC+5:30, Patrik Nordwall wrote:
>
> One way of downing is to join again with same host:port. That will trigger
> downing of previous incarnation and when removal is done the new
> incarnation can join by trying to join again. The seed-nodes joining will
> retry the joining automatically.
>
> However, sooner or later you will need a real downing strategy.
>
> /Patrik
>
> On Tue, 4 Apr 2017 at 10:31, 'Michal Borowiecki' via Akka User List <
> akka...@googlegroups.com> wrote:
>
> Indeed. This is the relevant bit of docs I believe (
> http://doc.akka.io/docs/akka/2.4.17/common/cluster.html#Membership):
>
> The node identifier internally also contains a UID that uniquely
> identifies this actor system instance at that hostname:port. Akka uses
> the UID to be able to reliably trigger remote death watch. This means that
> the same actor system can never join a cluster again once it's been removed
> from that cluster. To re-join an actor system with the same hostname:port to
> a cluster you have to stop the actor system and start a new one with the
> same hostname:port which will then receive a different UID.
>
> After re-starting the node it will get a new UID and will be considered a
> new member.
>
> The unreachable member (the previous incarnation of your node) needs to be
> downed first for new members to be admitted by the leader.
>
> Cheers,
>
> Michal
>
>
> On 04/04/17 08:58, Viktor Klang wrote:
>
> No, it needs to be Downed.
>
>
> On Tue, Apr 4, 2017 at 9:50 AM, Unmesh Joshi  wrote:
>
> Hi,
>
> If I restart the crashed node on same host and port, it should be
> reachable now and consensus should be reached isnt it?
>
> Thanks,
> Unmesh
>
> On Tuesday, 4 April 2017 13:09:22 UTC+5:30, Michal Borowiecki wrote:
>
> Hi Unmesh,
>
> AFAIK, the crashed node has to be downed (whether manually or
> automatically) for the cluster to reach convergence.
>
> Only once there are no unreachable nodes observed by any member can the
> leader resume it's duties and allow the new member (your re-started
> instance) to join.
>
> For testing & dev, you can use auto-downing. For production you need to
> choose a more resilient approach I'm afraid, as out of the box auto-downing
> doesn't provide a way to address the split-brain-problem which most likely
> would bite you in a real life environment sooner or later.
>
> Cheers,
>
> Michal
>
> On 04/04/17 08:31, Unmesh Joshi wrote:
>
> Is it possibly because in a two node cluster, there can never be majority
> ( > 50%) nodes agreeing on membership to mark a node as 'seen'?
>
> On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote:
>
> Hi,
>
> I have a two-node cluster. If I crash one of the nodes
> (10.131.22.26:3552) and bring it up again, I start getting the
> following messages from the other node. Now that the node is reachable and
> there are only two nodes in the cluster, why should it give the following
> message with seen=false for 10.131.22.26:3552?
> For members to be seen, is there any other configuration that needs to be
> tuned?
>
>
> [INFO] [04/04/2017 12:38:49.623]
> [csw-cluster-akka.actor.default-dispatcher-14]
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its
> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574
> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable]
> (1)], member status: [akka.tcp://csw-cluster@10.131.22.26:3552 Up
> seen=false, akka.tcp://csw-cluster@10.131.22.26:41574 Up seen=true]
> [INFO] [04/04/2017 12:39:49.634]
> [csw-cluster-akka.actor.default-dispatcher-2]
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its
> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574
> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable]
> (1)], member status:* [akka.tcp://csw-cluster@10.131.22.26:3552
>  Up seen=false*, akka.tcp://
> csw-cluster@10.131.22.26:41574 Up seen=true]
> [INFO] [04/04/2017 12:40:49.632]
> [csw-cluster-akka.actor.default-dispatcher-17]
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> 

Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread Unmesh Joshi
Curiously, my observation is that if I have a four-node cluster instead of
two and crash/restart a node with the same host/port, I do not get this
warning. I get it only on a two-node cluster, which made me think that
a majority is needed to mark the new incarnation as 'seen' and then
down the previous incarnation.

On Tuesday, 4 April 2017 15:17:45 UTC+5:30, Patrik Nordwall wrote:
>
> One way of downing is to join again with same host:port. That will trigger 
> downing of previous incarnation and when removal is done the new 
> incarnation can join by trying to join again. The seed-nodes joining will 
> retry the joining automatically.
>
> However, sooner or later you will need a real downing strategy.
>
> /Patrik
> On Tue, 4 Apr 2017 at 10:31, 'Michal Borowiecki' via Akka User List <
> akka...@googlegroups.com> wrote:
>
>> Indeed. This is the relevant bit of docs I believe (
>> http://doc.akka.io/docs/akka/2.4.17/common/cluster.html#Membership):
>>
>> The node identifier internally also contains a UID that uniquely 
>> identifies this actor system instance at that hostname:port. Akka uses 
>> the UID to be able to reliably trigger remote death watch. This means that 
>> the same actor system can never join a cluster again once it's been removed 
>> from that cluster. To re-join an actor system with the same hostname:port
>> to a cluster you have to stop the actor system and start a new one with
>> the same hostname:port which will then receive a different UID.
>>
>> After re-starting the node it will get a new UID and will be considered a 
>> new member.
>>
>> The unreachable member (the previous incarnation of your node) needs to 
>> be downed first for new members to be admitted by the leader.
>>
>> Cheers,
>>
>> Michal
>>
>> On 04/04/17 08:58, Viktor Klang wrote:
>>
>> No, it needs to be Downed.
>>
>> On Tue, Apr 4, 2017 at 9:50 AM, Unmesh Joshi wrote:
>>
>>> Hi, 
>>>
>>> If I restart the crashed node on same host and port, it should be 
>>> reachable now and consensus should be reached isnt it?
>>>
>>> Thanks,
>>> Unmesh
>>>
>>> On Tuesday, 4 April 2017 13:09:22 UTC+5:30, Michal Borowiecki wrote: 

 Hi Unmesh,

 AFAIK, the crashed node has to be downed (whether manually or 
 automatically) for the cluster to reach convergence.

 Only once there are no unreachable nodes observed by any member can the 
 leader resume it's duties and allow the new member (your re-started 
 instance) to join.

 For testing & dev, you can use auto-downing. For production you need to 
 choose a more resilient approach I'm afraid, as out of the box 
 auto-downing 
 doesn't provide a way to address the split-brain-problem which most likely 
 would bite you in a real life environment sooner or later.

 Cheers,

 Michal

 On 04/04/17 08:31, Unmesh Joshi wrote:

 Is it possibly because in a two node cluster, there can never be 
 majority ( > 50%) nodes agreeing on membership to mark a node as 'seen'? 

 On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote: 
>
> Hi,
>
> I have a two-node cluster. If I crash one of the nodes
> (10.131.22.26:3552) and bring it up again, I start getting the
> following messages from the other node. Now that the node is reachable and
> there are only two nodes in the cluster, why should it give the following
> message with seen=false for 10.131.22.26:3552?
> For members to be seen, is there any other configuration that needs to
> be tuned?
>
>
> [INFO] [04/04/2017 12:38:49.623] 
> [csw-cluster-akka.actor.default-dispatcher-14] 
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform 
> its duties, reachability status: [akka.tcp://
> csw-cluster@10.131.22.26:41574 -> akka.tcp://
> csw-cluster@10.131.22.26:3552: Unreachable [Unreachable] (1)], member 
> status: [akka.tcp://csw-cluster@10.131.22.26:3552 Up seen=false, 
> akka.tcp://csw-cluster@10.131.22.26:41574 Up seen=true]
> [INFO] [04/04/2017 12:39:49.634] 
> [csw-cluster-akka.actor.default-dispatcher-2] 
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform 
> its duties, reachability status: [akka.tcp://
> csw-cluster@10.131.22.26:41574 -> akka.tcp://
> csw-cluster@10.131.22.26:3552: Unreachable [Unreachable] (1)], member 
> status:* [akka.tcp://csw-cluster@10.131.22.26:3552 
>  Up seen=false*, akka.tcp://
> csw-cluster@10.131.22.26:41574 Up seen=true]
> [INFO] [04/04/2017 12:40:49.632] 
> [csw-cluster-akka.actor.default-dispatcher-17] 
> [akka.cluster.Cluster(akka://csw-cluster)] 

Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread Patrik Nordwall
One way of downing is to join again with the same host:port. That will trigger
downing of the previous incarnation, and once its removal is done the new
incarnation can join by trying again. Joining via seed-nodes retries
automatically.
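
The automatic retry of the seed-node join is governed by settings in the
akka.cluster section. A sketch, assuming the Akka 2.4 setting names; the
values shown are believed to be the 2.4 defaults, so check the reference.conf
of your version before relying on them.

```hocon
akka.cluster {
  # How long to wait for one of the seed nodes to reply to the initial join request.
  seed-node-timeout = 5s
  # If a join request fails, it is retried after this period.
  retry-unsuccessful-join-after = 10s
}
```

Lowering these values only shortens the time between attempts; the join still
cannot complete until the previous incarnation has been removed.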

However, sooner or later you will need a real downing strategy.

/Patrik
On Tue, 4 Apr 2017 at 10:31, 'Michal Borowiecki' via Akka User List <
akka-user@googlegroups.com> wrote:

> Indeed. This is the relevant bit of docs I believe (
> http://doc.akka.io/docs/akka/2.4.17/common/cluster.html#Membership):
>
> The node identifier internally also contains a UID that uniquely
> identifies this actor system instance at that hostname:port. Akka uses
> the UID to be able to reliably trigger remote death watch. This means that
> the same actor system can never join a cluster again once it's been removed
> from that cluster. To re-join an actor system with the same hostname:port to
> a cluster you have to stop the actor system and start a new one with the
> same hostname:port which will then receive a different UID.
>
> After re-starting the node it will get a new UID and will be considered a
> new member.
>
> The unreachable member (the previous incarnation of your node) needs to be
> downed first for new members to be admitted by the leader.
>
> Cheers,
>
> Michal
>
> On 04/04/17 08:58, Viktor Klang wrote:
>
> No, it needs to be Downed.
>
> On Tue, Apr 4, 2017 at 9:50 AM, Unmesh Joshi 
> wrote:
>
> Hi,
>
> If I restart the crashed node on same host and port, it should be
> reachable now and consensus should be reached isnt it?
>
> Thanks,
> Unmesh
>
> On Tuesday, 4 April 2017 13:09:22 UTC+5:30, Michal Borowiecki wrote:
>
> Hi Unmesh,
>
> AFAIK, the crashed node has to be downed (whether manually or
> automatically) for the cluster to reach convergence.
>
> Only once there are no unreachable nodes observed by any member can the
> leader resume it's duties and allow the new member (your re-started
> instance) to join.
>
> For testing & dev, you can use auto-downing. For production you need to
> choose a more resilient approach I'm afraid, as out of the box auto-downing
> doesn't provide a way to address the split-brain-problem which most likely
> would bite you in a real life environment sooner or later.
>
> Cheers,
>
> Michal
>
> On 04/04/17 08:31, Unmesh Joshi wrote:
>
> Is it possibly because in a two node cluster, there can never be majority
> ( > 50%) nodes agreeing on membership to mark a node as 'seen'?
>
> On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote:
>
> Hi,
>
> I have a two-node cluster. If I crash one of the nodes
> (10.131.22.26:3552) and bring it up again, I start getting the
> following messages from the other node. Now that the node is reachable and
> there are only two nodes in the cluster, why should it give the following
> message with seen=false for 10.131.22.26:3552?
> For members to be seen, is there any other configuration that needs to be
> tuned?
>
>
> [INFO] [04/04/2017 12:38:49.623]
> [csw-cluster-akka.actor.default-dispatcher-14]
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its
> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574
> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable]
> (1)], member status: [akka.tcp://csw-cluster@10.131.22.26:3552 Up
> seen=false, akka.tcp://csw-cluster@10.131.22.26:41574 Up seen=true]
> [INFO] [04/04/2017 12:39:49.634]
> [csw-cluster-akka.actor.default-dispatcher-2]
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its
> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574
> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable]
> (1)], member status:* [akka.tcp://csw-cluster@10.131.22.26:3552
>  Up seen=false*, akka.tcp://
> csw-cluster@10.131.22.26:41574 Up seen=true]
> [INFO] [04/04/2017 12:40:49.632]
> [csw-cluster-akka.actor.default-dispatcher-17]
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its
> duties, reachability status: [akka.t
>
>
>
> Thanks,
> Unmesh
>

Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread 'Michal Borowiecki' via Akka User List
Indeed. This is the relevant bit of docs I believe 
(http://doc.akka.io/docs/akka/2.4.17/common/cluster.html#Membership):


The node identifier internally also contains a UID that uniquely
identifies this actor system instance at that hostname:port. Akka uses
the UID to be able to reliably trigger remote death watch. This means
that the same actor system can never join a cluster again once it's
been removed from that cluster. To re-join an actor system with the
same hostname:port to a cluster you have to stop the actor system and
start a new one with the same hostname:port which will then receive a
different UID.

After re-starting the node it will get a new UID and will be considered
a new member.


The unreachable member (the previous incarnation of your node) needs to 
be downed first for new members to be admitted by the leader.
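
For a dev or test setup, the downing can be automated with the auto-down
setting. A minimal sketch, assuming Akka 2.4; the 10s value is arbitrary, and
this is not a production strategy because auto-downing does not address
split brain.

```hocon
akka.cluster {
  # Dev/test only: the leader automatically marks unreachable members
  # as down after this period. The default is off.
  auto-down-unreachable-after = 10s
}
```

Once the old incarnation has been downed and removed, the restarted node with
its new UID can be admitted by the leader.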


Cheers,

Michal


On 04/04/17 08:58, Viktor Klang wrote:

No, it needs to be Downed.

On Tue, Apr 4, 2017 at 9:50 AM, Unmesh Joshi wrote:


Hi,

If I restart the crashed node on the same host and port, it should be
reachable again and consensus should be reached, shouldn't it?

Thanks,
Unmesh

On Tuesday, 4 April 2017 13:09:22 UTC+5:30, Michal Borowiecki wrote:

Hi Unmesh,

AFAIK, the crashed node has to be downed (whether manually or
automatically) for the cluster to reach convergence.

Only once there are no unreachable nodes observed by any
member can the leader resume its duties and allow the new
member (your re-started instance) to join.

For testing & dev, you can use auto-downing. For production
you need to choose a more resilient approach I'm afraid, as
out of the box auto-downing doesn't provide a way to address
the split-brain-problem which most likely would bite you in a
real life environment sooner or later.

Cheers,

Michal


On 04/04/17 08:31, Unmesh Joshi wrote:

Is it possibly because in a two node cluster, there can never
be majority ( > 50%) nodes agreeing on membership to mark a
node as 'seen'?

On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote:

Hi,

I have a two-node cluster. If I crash one of the nodes
(10.131.22.26:3552) and bring it up again, I
start getting the following messages from the other node. Now
that the node is reachable and there are only two nodes
in the cluster, why should it give the following message with
seen=false for 10.131.22.26:3552?
For members to be seen, is there any other configuration
that needs to be tuned?


[INFO] [04/04/2017 12:38:49.623]
[csw-cluster-akka.actor.default-dispatcher-14]
[akka.cluster.Cluster(akka://csw-cluster)] Cluster Node
[akka.tcp://csw-cluster@10.131.22.26:41574] - Leader can currently not
perform its duties, reachability status:
[akka.tcp://csw-cluster@10.131.22.26:41574 ->
akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable] (1)],
member status: [akka.tcp://csw-cluster@10.131.22.26:3552 Up seen=false,
akka.tcp://csw-cluster@10.131.22.26:41574 Up seen=true]
[INFO] [04/04/2017 12:39:49.634]
[csw-cluster-akka.actor.default-dispatcher-2]
[akka.cluster.Cluster(akka://csw-cluster)] Cluster Node
[akka.tcp://csw-cluster@10.131.22.26:41574] - Leader can currently not
perform its duties, reachability status:
[akka.tcp://csw-cluster@10.131.22.26:41574 ->
akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable] (1)],
member status: [akka.tcp://csw-cluster@10.131.22.26:3552 Up seen=false,
akka.tcp://csw-cluster@10.131.22.26:41574 Up seen=true]
[INFO] [04/04/2017 12:40:49.632]
[csw-cluster-akka.actor.default-dispatcher-17]
[akka.cluster.Cluster(akka://csw-cluster)] Cluster
Node [akka.tcp://csw-cluster@10.131.22.26:41574

Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread Viktor Klang
No, it needs to be Downed.

On Tue, Apr 4, 2017 at 9:50 AM, Unmesh Joshi  wrote:

> Hi,
>
> If I restart the crashed node on the same host and port, it should be
> reachable again and consensus should be reached, shouldn't it?
>
> Thanks,
> Unmesh
>
> On Tuesday, 4 April 2017 13:09:22 UTC+5:30, Michal Borowiecki wrote:
>>
>> Hi Unmesh,
>>
>> AFAIK, the crashed node has to be downed (whether manually or
>> automatically) for the cluster to reach convergence.
>>
>> Only once there are no unreachable nodes observed by any member can the
>> leader resume its duties and allow the new member (your re-started
>> instance) to join.
>>
>> For testing & dev, you can use auto-downing. For production you need to
>> choose a more resilient approach I'm afraid, as out of the box auto-downing
>> doesn't provide a way to address the split-brain-problem which most likely
>> would bite you in a real life environment sooner or later.
>>
>> Cheers,
>>
>> Michal
>>
>> On 04/04/17 08:31, Unmesh Joshi wrote:
>>
>> Is it possibly because in a two node cluster, there can never be majority
>> ( > 50%) nodes agreeing on membership to mark a node as 'seen'?
>>
>> On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote:
>>>
>>> Hi,
>>>
>>> I have a two-node cluster. If I crash one of the nodes
>>> (10.131.22.26:3552) and bring it up again, I start getting the
>>> following messages from the other node. Now that the node is reachable and
>>> there are only two nodes in the cluster, why should it give the following
>>> message with seen=false for 10.131.22.26:3552?
>>> For members to be seen, is there any other configuration that needs to
>>> be tuned?
>>>
>>>
>>> [INFO] [04/04/2017 12:38:49.623] 
>>> [csw-cluster-akka.actor.default-dispatcher-14]
>>> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
>>> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its
>>> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574
>>> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable]
>>> (1)], member status: [akka.tcp://csw-cluster@10.131.22.26:3552 Up
>>> seen=false, akka.tcp://csw-cluster@10.131.22.26:41574 Up seen=true]
>>> [INFO] [04/04/2017 12:39:49.634] 
>>> [csw-cluster-akka.actor.default-dispatcher-2]
>>> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
>>> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its
>>> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574
>>> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable]
>>> (1)], member status:* [akka.tcp://csw-cluster@10.131.22.26:3552
>>>  Up seen=false*, akka.tcp://
>>> csw-cluster@10.131.22.26:41574 Up seen=true]
>>> [INFO] [04/04/2017 12:40:49.632] 
>>> [csw-cluster-akka.actor.default-dispatcher-17]
>>> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
>>> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its
>>> duties, reachability status: [akka.t
>>>
>>>
>>>
>>> Thanks,
>>> Unmesh
>>>
>> --
>> >> Read the docs: http://akka.io/docs/
>> >> Check the FAQ: http://doc.akka.io/docs/akka/c
>> urrent/additional/faq.html
>> >> Search the archives: https://groups.google.com/group/akka-user
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Akka User List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to akka-user+...@googlegroups.com.
>> To post to this group, send email to akka...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/akka-user.
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>> --
>>  Michal Borowiecki
>> Senior Software Engineer L4
>> T: +44 208 742 1600 <+44%2020%208742%201600>
>>
>>
>> +44 203 249 8448 <+44%2020%203249%208448>
>>
>>
>>
>> E: michal.b...@openbet.com
>> W: www.openbet.com
>> OpenBet Ltd
>>
>> Chiswick Park Building 9
>>
>> 566 Chiswick High Rd
>>
>> London
>>
>> W4 5XT
>>
>> UK
>> 
>> This message is confidential and intended only for the addressee. If you
>> have received this message in error, please immediately notify the
>> ...@openbet.com and delete it from your system as well as any copies.
>> The content of e-mails as well as traffic data may be monitored by OpenBet
>> for employment and security purposes. To protect the environment please do
>> not print this e-mail unless necessary. OpenBet Ltd. Registered Office:
>> Chiswick Park Building 9, 566 Chiswick High Road, London, W4 5XT, United
>> Kingdom. A company registered in England and Wales. Registered no. 3134634.
>> VAT no. GB927523612
>>

Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread Unmesh Joshi
Hi,

If I restart the crashed node on the same host and port, it should be reachable
now and consensus should be reached, shouldn't it?

Thanks,
Unmesh

On Tuesday, 4 April 2017 13:09:22 UTC+5:30, Michal Borowiecki wrote:
>
> Hi Unmesh,
>
> AFAIK, the crashed node has to be downed (whether manually or 
> automatically) for the cluster to reach convergence.
>
> Only once there are no unreachable nodes observed by any member can the
> leader resume its duties and allow the new member (your restarted
> instance) to join.
>
> For testing and dev, you can use auto-downing. For production, I'm afraid you
> need to choose a more resilient approach, as out-of-the-box auto-downing
> doesn't provide a way to address the split-brain problem, which would most
> likely bite you in a real-life environment sooner or later.
>
> Cheers,
>
> Michal
>
> On 04/04/17 08:31, Unmesh Joshi wrote:
>
> Is it possibly because in a two node cluster, there can never be majority 
> ( > 50%) nodes agreeing on membership to mark a node as 'seen'? 
>
> On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote: 
>>
>> Hi,
>>
>> I have a two-node cluster. If I crash one of the nodes
>> (*10.131.22.26:3552*) and bring it up again, I start getting the
>> following messages from the other node. Now that the node is reachable and
>> there are only two nodes in the cluster, why should it give the following
>> message with seen=false for *10.131.22.26:3552*?
>> For members to be seen, is there any other configuration that needs to be
>> tuned?
>>
>>
>> [INFO] [04/04/2017 12:38:49.623] 
>> [csw-cluster-akka.actor.default-dispatcher-14] 
>> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
>> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its 
>> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574 
>> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable] 
>> (1)], member status: [akka.tcp://csw-cluster@10.131.22.26:3552 Up 
>> seen=false, akka.tcp://csw-cluster@10.131.22.26:41574 Up seen=true]
>> [INFO] [04/04/2017 12:39:49.634] 
>> [csw-cluster-akka.actor.default-dispatcher-2] 
>> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
>> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its 
>> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574 
>> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable] 
>> (1)], member status:* [akka.tcp://csw-cluster@10.131.22.26:3552 
>>  Up seen=false*, akka.tcp://
>> csw-cluster@10.131.22.26:41574 Up seen=true]
>> [INFO] [04/04/2017 12:40:49.632] 
>> [csw-cluster-akka.actor.default-dispatcher-17] 
>> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
>> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its 
>> duties, reachability status: [akka.t
>>
>>
>>
>> Thanks,
>> Unmesh 
>>

Re: [akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread 'Michal Borowiecki' via Akka User List

Hi Unmesh,

AFAIK, the crashed node has to be downed (whether manually or 
automatically) for the cluster to reach convergence.


Only once there are no unreachable nodes observed by any member can the
leader resume its duties and allow the new member (your restarted
instance) to join.
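
To make that rule concrete, here is a toy Python model of the check. It is only a sketch and not Akka's code: the `Member` class, the `leader_can_act` function and the simplification that a Down member no longer blocks the leader are assumptions made for illustration. The two members mirror the addresses and flags from the log in this thread.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Member:
    address: str
    status: str      # e.g. "Up" or "Down"
    reachable: bool  # False if some member observes it as unreachable
    seen: bool       # True if it has seen the latest gossip


def leader_can_act(members):
    # Assumption of this toy model: an unreachable member blocks the leader
    # unless it has already been marked Down.
    no_blocking_unreachable = all(
        m.reachable or m.status == "Down" for m in members
    )
    # Members that are Up must all have seen the latest gossip.
    all_up_seen = all(m.seen for m in members if m.status == "Up")
    return no_blocking_unreachable and all_up_seen


# State reported in the log: the old incarnation on port 3552 is still
# listed as Up, but it is unreachable and has seen=false.
before = [
    Member("10.131.22.26:3552", "Up", reachable=False, seen=False),
    Member("10.131.22.26:41574", "Up", reachable=True, seen=True),
]

# After the old incarnation is downed, nothing blocks the leader any more.
after_down = [replace(before[0], status="Down"), before[1]]

print(leader_can_act(before))      # False
print(leader_can_act(after_down))  # True
```

In this model the number of nodes plays no role: a single unreachable member that has not been downed is enough to stall the leader.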


For testing and dev, you can use auto-downing. For production, I'm afraid you
need to choose a more resilient approach, as out-of-the-box auto-downing
doesn't provide a way to address the split-brain problem, which would most
likely bite you in a real-life environment sooner or later.
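
The split-brain risk can be illustrated with a small Python simulation. Again this is a hypothetical sketch rather than Akka code: the node names `a` to `d` and the `auto_down` helper are invented, and auto-downing is modelled simply as "drop every member I cannot reach".

```python
def auto_down(view, unreachable):
    """Return the membership a node keeps after downing all unreachable members."""
    return view - unreachable


cluster = {"a", "b", "c", "d"}

# A network partition splits the cluster into two sides that cannot
# reach each other, although every node is still running.
side_1 = {"a", "b"}
side_2 = {"c", "d"}

# Each side independently downs the members it sees as unreachable.
view_1 = auto_down(cluster, cluster - side_1)
view_2 = auto_down(cluster, cluster - side_2)

print(sorted(view_1))  # ['a', 'b']
print(sorted(view_2))  # ['c', 'd']
```

Both sides end up with a non-empty membership that excludes the other side, so each continues as its own cluster. A production downing strategy has to make sure that at most one side survives.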


Cheers,

Michal


On 04/04/17 08:31, Unmesh Joshi wrote:
> Is it possibly because in a two node cluster, there can never be
> majority ( > 50%) nodes agreeing on membership to mark a node as 'seen'?




[akka-user] Re: Understanding 'Leader can currently not perform its duties' message

2017-04-04 Thread Unmesh Joshi
Is it possibly because in a two-node cluster there can never be a majority
(> 50%) of nodes agreeing on membership to mark a node as 'seen'?

On Tuesday, 4 April 2017 12:46:17 UTC+5:30, Unmesh Joshi wrote:
>
> Hi,
>
> I have a two-node cluster. If I crash one of the nodes
> (*10.131.22.26:3552*) and bring it up again, I start getting the
> following messages from the other node. Now that the node is reachable and
> there are only two nodes in the cluster, why should it give the following
> message with seen=false for *10.131.22.26:3552*?
> For members to be seen, is there any other configuration that needs to be
> tuned?
>
>
> [INFO] [04/04/2017 12:38:49.623] 
> [csw-cluster-akka.actor.default-dispatcher-14] 
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its 
> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574 
> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable] 
> (1)], member status: [akka.tcp://csw-cluster@10.131.22.26:3552 Up 
> seen=false, akka.tcp://csw-cluster@10.131.22.26:41574 Up seen=true]
> [INFO] [04/04/2017 12:39:49.634] 
> [csw-cluster-akka.actor.default-dispatcher-2] 
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its 
> duties, reachability status: [akka.tcp://csw-cluster@10.131.22.26:41574 
> -> akka.tcp://csw-cluster@10.131.22.26:3552: Unreachable [Unreachable] 
> (1)], member status:* [akka.tcp://csw-cluster@10.131.22.26:3552 
>  Up seen=false*, akka.tcp://
> csw-cluster@10.131.22.26:41574 Up seen=true]
> [INFO] [04/04/2017 12:40:49.632] 
> [csw-cluster-akka.actor.default-dispatcher-17] 
> [akka.cluster.Cluster(akka://csw-cluster)] Cluster Node [akka.tcp://
> csw-cluster@10.131.22.26:41574] - Leader can currently not perform its 
> duties, reachability status: [akka.t
>
>
>
> Thanks,
> Unmesh 
>
