Re: [akka-user] Re: Node quarantined

2016-04-29 Thread Benjamin Black
This is the latest version of Akka that supports Java 7. 

On Friday, April 29, 2016 at 3:18:55 PM UTC-4, Patrik Nordwall wrote:
>
> There can be several reasons, but a good start is to use latest Akka 
> version.
> tors 28 apr. 2016 kl. 21:13 skrev Guido Medina:
>
>> Hi Ben,
>>
>> In my experience Netty 3 doesn't get much love and issues are rarely 
>> fixed. As I mentioned before, I'm running my own Netty 3.10.6 built 
>> internally. Also, 3.10.0 is not even a good version; if you want, force 
>> your version to 3.10.5.Final until they release 3.10.6.Final, which has 
>> some nice fixes.
>>
>> Or you could get my branch, set the version to whatever suits you, and 
>> build your own Netty.
>>
>> My branch: https://github.com/guidomedina/netty/commits/3.10-SFS
>>
>> has the following milestone: 
>> https://github.com/netty/netty/issues?q=milestone%3A3.10.6.Final+is%3Aclosed
>>
>> plus some minor fixes I added myself. Of note, there is a race condition 
>> fixed in 3.10.6, and I saw another between 3.10.0 and 3.10.5 which might 
>> be causing the issue you are experiencing.
>>
>> HTH,
>>
>> Guido.
>>

-- 
>>  Read the docs: http://akka.io/docs/
>>  Check the FAQ: 
>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>  Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.


Re: [akka-user] Re: Node quarantined

2016-04-28 Thread Benjamin Black
14:00:10.356 WARN  [geyser-akka.remote.default-remote-dispatcher-9] 
Remoting - Association to [akka.tcp://geyser@172.16.125.13:7000] having UID 
[-1471771858] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:10.385 WARN  [geyser-akka.remote.default-remote-dispatcher-10] 
a.r.EndpointWriter - AssociationError 
[akka.tcp://geyser@172.16.119.46:7000] -> 
[akka.tcp://geyser@172.16.125.13:7000]: Error [Invalid address: 
akka.tcp://geyser@172.16.125.13:7000] [
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://geyser@172.16.125.13:7000
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The 
remote system has a UID that has been quarantined. Association aborted.
]
14:00:10.386 INFO  [geyser-akka.remote.default-remote-dispatcher-27] 
Remoting - Quarantined address [akka.tcp://geyser@172.16.125.13:7000] is 
still unreachable or has not been restarted. Keeping it quarantined.


IP: 139
13:59:57.544 INFO  [geyser-akka.actor.default-dispatcher-187] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Up)|Size:4)
13:59:58.359 INFO  [geyser-akka.actor.default-dispatcher-178] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Up)|Size:3)
14:00:11.358 INFO  [geyser-akka.actor.default-dispatcher-32] 
AngelOfTheAbyss - Member removed (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Removed)|Size:3)
14:00:11.359 INFO  [geyser-akka.actor.default-dispatcher-32] 
AngelOfTheAbyss - Member removed (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Removed)|Size:3)
14:00:11.361 WARN  [geyser-akka.remote.default-remote-dispatcher-27] 
Remoting - Association to [akka.tcp://geyser@172.16.119.42:7000] having UID 
[-477546934] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:11.361 WARN  [geyser-akka.remote.default-remote-dispatcher-27] 
Remoting - Association to [akka.tcp://geyser@172.16.125.13:7000] having UID 
[-1471771858] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.

Is there anything abnormal in the logs?

Regards,
Ben


On Wednesday, March 23, 2016 at 9:33:02 AM UTC-4, Benjamin Black wrote:
>
> I look forward to trying out the new version. I'm not totally sure it is 
> the same issue: I'm seeing this happen on a cluster where no node is being 
> restarted. I shall continue to investigate what has changed on my side, 
> because I wasn't seeing this before I upgraded other libraries.
>
> On Wednesday, March 23, 2016 at 2:08:10 AM UTC-4, Patrik Nordwall wrote:
>>
>> We have fixed the issue that is noticed as 
>> "Error encountered while processing system message acknowledgement 
>> buffer: [-1 {}] ack: ACK[6, {}]"
>>
>> https://github.com/akka/akka/pull/20093
>>
>> It will be released in 2.4.3 and 2.3.15, probably by end of next week.
>>
>> /Patrik
>> tis 22 mars 2016 kl. 23:39 skrev Guido Medina <oxy...@gmail.com>:
>>
>>> Yeah, sorry, I thought it was related to a rolling restart.
>>>
>>> As for Netty, I'm using a *non-published yet* Netty with the following 
>>> fixes:
>>>
>>> https://github.com/netty/netty/issues?q=milestone%3A3.10.6.Final+is%3Aclosed
>>>
>>> You can just get it from Git and:
>>>
>>> $ git checkout 3.10
>>> $ mvn versions:set -DnewVersion=3.10.6.Final -DgenerateBackupPoms=false
>>> $ mvn clean install
>>>
>>> And see if your problem goes away,
>>>
>>> Guido.
>>>
>>> On Tuesday, March 22, 2016 at 10:27:26 PM UTC, Benjamin Black wrote:
>>>>
>>>> Hi Guido, yes I'm aware of the leaving-cluster conversation, as I 
>>>> started it :-) This is a separate issue. I am observing this behavior whilst 
>>>> the cluster seems stable, with no nodes being added or removed. I suspect that 
>>>> this issue was first observed when I upgraded a different library that 
>>>> brought in a new version of the Netty library.
>>>>
>>>> On Tuesday, March 22, 2016 at 6:23:14 PM UTC-4, Guido Medina wrote:
>>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> You have nodes with predefined ports, one thing I have which 
>>>>> eliminates that problem for these nodes is that
>>>>> only my seed node(s) have the port set, the rest will just get a 
>>

Re: [akka-user] Re: Node quarantined

2016-03-23 Thread Benjamin Black
I look forward to trying out the new version. I'm not totally sure it is the 
same issue: I'm seeing this happen on a cluster where no node is being 
restarted. I shall continue to investigate what has changed on my side, 
because I wasn't seeing this before I upgraded other libraries.

On Wednesday, March 23, 2016 at 2:08:10 AM UTC-4, Patrik Nordwall wrote:
>
> We have fixed the issue that is noticed as 
> "Error encountered while processing system message acknowledgement buffer: 
> [-1 {}] ack: ACK[6, {}]"
>
> https://github.com/akka/akka/pull/20093
>
> It will be released in 2.4.3 and 2.3.15, probably by end of next week.
>
> /Patrik
> tis 22 mars 2016 kl. 23:39 skrev Guido Medina <oxy...@gmail.com>:
>
>> Yeah, sorry, I thought it was related to a rolling restart.
>>
>> As for Netty, I'm using a *non-published yet* Netty with the following 
>> fixes:
>>
>> https://github.com/netty/netty/issues?q=milestone%3A3.10.6.Final+is%3Aclosed
>>
>> You can just get it from Git and:
>>
>> $ git checkout 3.10
>> $ mvn versions:set -DnewVersion=3.10.6.Final -DgenerateBackupPoms=false
>> $ mvn clean install
>>
>> And see if your problem goes away,
>>
>> Guido.
>>
>> On Tuesday, March 22, 2016 at 10:27:26 PM UTC, Benjamin Black wrote:
>>>
>>> Hi Guido, yes I'm aware of the leaving-cluster conversation, as I started 
>>> it :-) This is a separate issue. I am observing this behavior whilst the 
>>> cluster seems stable, with no nodes being added or removed. I suspect that this 
>>> issue was first observed when I upgraded a different library that brought 
>>> in a new version of the Netty library.
>>>
>>> On Tuesday, March 22, 2016 at 6:23:14 PM UTC-4, Guido Medina wrote:
>>>>
>>>> Hi Benjamin,
>>>>
>>>> You have nodes with predefined ports. One thing that eliminates that 
>>>> problem for me is that only my seed node(s) have the port set; the rest 
>>>> just get a dynamic, available port, so each node gets a different port 
>>>> when you do a rolling restart.
>>>>
>>>> I suspect you are doing a rolling restart, right? If so, you need to 
>>>> wait for the node with that address to completely leave the cluster (I'm 
>>>> also doing that): basically, you terminate your system when you receive 
>>>> the *MemberRemoved* message for the *_self_* address.
>>>>
>>>> I think I saw a discussion related to quarantined nodes re-joining with 
>>>> the same address; not sure if it was here or an actual GitHub ticket.
>>>>
>>>> HTH,
>>>>
>>>> Guido.
>>>>



[akka-user] Re: Node quarantined

2016-03-22 Thread Benjamin Black
Hi Guido, yes I'm aware of the leaving-cluster conversation, as I started it 
:-) This is a separate issue. I am observing this behavior whilst the cluster 
seems stable, with no nodes being added or removed. I suspect that this issue 
was first observed when I upgraded a different library that brought in a new 
version of the Netty library.

On Tuesday, March 22, 2016 at 6:23:14 PM UTC-4, Guido Medina wrote:
>
> Hi Benjamin,
>
> You have nodes with predefined ports. One thing that eliminates that 
> problem for me is that only my seed node(s) have the port set; the rest 
> just get a dynamic, available port, so each node gets a different port when 
> you do a rolling restart.
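The pattern described above is usually expressed in configuration by giving non-seed nodes port 0, so the OS assigns a fresh port on each restart. A sketch; the system name and addresses are placeholders taken from this thread:

```hocon
akka {
  remote.netty.tcp {
    hostname = "172.16.119.46"  # this node's own address (placeholder)
    port = 0                    # 0 = bind to a random free port at startup
  }
  cluster {
    # Only the seed nodes keep fixed, well-known ports:
    seed-nodes = [
      "akka.tcp://geyser@172.16.125.13:7000",
      "akka.tcp://geyser@172.16.119.42:7000"
    ]
  }
}
```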
>
> I suspect you are doing a rolling restart, right? If so, you need to wait 
> for the node with that address to completely leave the cluster (I'm also 
> doing that): basically, you terminate your system when you receive the 
> *MemberRemoved* message for the *_self_* address.
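The decision rule above (terminate only on MemberRemoved for the self address) can be modeled without Akka on the classpath. The types below are simplified stand-ins for Akka's cluster events, not its real API:

```java
// Simplified stand-ins for Akka's cluster events (assumptions, not the real API).
class GracefulShutdown {
    enum EventType { MEMBER_UP, MEMBER_EXITED, MEMBER_REMOVED }

    static final class ClusterEvent {
        final EventType type;
        final String memberAddress;
        ClusterEvent(EventType type, String memberAddress) {
            this.type = type;
            this.memberAddress = memberAddress;
        }
    }

    // Terminate only when the cluster has confirmed *this* node's removal;
    // Exiting/Unreachable notifications are treated as informational only.
    static boolean shouldTerminate(String selfAddress, ClusterEvent event) {
        return event.type == EventType.MEMBER_REMOVED
                && event.memberAddress.equals(selfAddress);
    }
}
```

In the real system the same predicate would live inside an actor subscribed to cluster events, calling system.terminate() when it returns true.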
>
> I think I saw a discussion related to quarantined nodes re-joining with 
> the same address; not sure if it was here or an actual GitHub ticket.
>
> HTH,
>
> Guido.
>



[akka-user] Re: Node quarantined

2016-03-22 Thread Benjamin Black
I see the same issue with 2.3.14.

On Tuesday, March 22, 2016 at 2:00:15 PM UTC-4, Guido Medina wrote:
>
> To eliminate noise, please update to 2.3.14, which has some cluster fixes 
> since 2.3.11; there are also several fixes in Scala 2.11.8 (not related).
>
> I don't know if it will help; I just have the habit of keeping my libs up to date.
>
> HTH,
>
> Guido.
>
> On Tuesday, March 22, 2016 at 5:34:23 PM UTC, Benjamin Black wrote:
>>
>> Hello,
>>
>> I'm trying to understand the cause of nodes being quarantined and 
>> possible ways to fix it. I'm using Akka 2.3.11. On the quarantined 
>> node I see this logging:
>>
>> 12:45:44.204 ERROR [geyser-akka.remote.default-remote-dispatcher-6] 
>> a.r.EndpointWriter - AssociationError [akka.tcp://
>> geyser@172.16.120.174:7000] <- [akka.tcp://geyser@172.17.100.105:7000]: 
>> Error [Invalid address: akka.tcp://geyser@172.17.100.105:7000] [
>> akka.remote.InvalidAssociation: Invalid address: akka.tcp://
>> geyser@172.17.100.105:7000
>> Caused by: akka.remote.transport.Transport$InvalidAssociationException: 
>> The remote system has quarantined this system. No further associations to 
>> the remote system are possible until this system is restarted.
>> ]
>> 12:45:44.205 WARN  [geyser-akka.remote.default-remote-dispatcher-25] 
>> Remoting - Tried to associate with unreachable remote address [akka.tcp://
>> geyser@172.17.100.105:7000]. Address is now gated for 5000 ms, all 
>> messages to this address will be delivered to dead letters. Reason: [The 
>> remote system has quarantined this system. No further associations to the 
>> remote system are possible until this system is restarted.]
>>
>> And on the node that cause the box to be quarantined I see this logging:
>>
>> 12:45:44.194 WARN  [geyser-akka.remote.default-remote-dispatcher-6] 
>> Remoting - Association to [akka.tcp://geyser@172.16.120.174:7000] having 
>> UID [-450748474] is irrecoverably failed. UID is now quarantined and all 
>> messages to this UID will be delivered to dead letters. Remote actorsystem 
>> must be restarted to recover from this situation.
>> 12:45:44.202 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
>> a.r.EndpointWriter - AssociationError [akka.tcp://
>> geyser@172.17.100.105:7000] -> [akka.tcp://geyser@172.16.120.174:7000]: 
>> Error [Invalid address: akka.tcp://geyser@172.16.120.174:7000] [
>> akka.remote.InvalidAssociation: Invalid address: akka.tcp://
>> geyser@172.16.120.174:7000
>> Caused by: akka.remote.transport.Transport$InvalidAssociationException: 
>> The remote system has a UID that has been quarantined. Association aborted.
>> ]
>> 12:45:44.203 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
>> Remoting - Tried to associate with unreachable remote address [akka.tcp://
>> geyser@172.16.120.174:7000]. Address is now gated for 5000 ms, all 
>> messages to this address will be delivered to dead letters. Reason: [The 
>> remote system has a UID that has been quarantined. Association aborted.]
>> 12:45:44.221 ERROR [geyser-akka.remote.default-remote-dispatcher-7] 
>> Remoting - Association to [akka.tcp://geyser@172.16.120.174:7000] with 
>> UID [-450748474] irrecoverably failed. Quarantining address.
>> java.lang.IllegalStateException: Error encountered while processing 
>> system message acknowledgement buffer: [-1 {}] ack: ACK[6, {}]
>> at 
>> akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:288)
>>  
>> ~[geyser.jar:1.1.17-SNAPSHOT]
>> at akka.actor.Actor$class.aroundReceive(Actor.scala:467) 
>> ~[geyser.jar:1.1.17-SNAPSHOT]
>> Caused by: java.lang.IllegalArgumentException: Highest SEQ so far was -1 
>> but cumulative ACK is 6
>> at 
>> akka.remote.AckedSendBuffer.acknowledge(AckedDelivery.scala:103) 
>> ~[geyser.jar:1.1.17-SNAPSHOT]
>> at 
>> akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:284)
>>  
>> ~[geyser.jar:1.1.17-SNAPSHOT]
>> ... 11 common frames omitted
>> 12:45:44.221 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
>> Remoting - Association to [akka.tcp://geyser@172.16.120.174:7000] having 
>> UID [-450748474] is irrecoverably failed. UID is now quarantined and all 
>> messages to this UID will be delivered to dead letters. Remote actorsystem 
>> must be restarted to recover from this situation.
>>
>> Quite a bit of data can be passed between the nodes ~200 Mb/sec and maybe 
>> the system is hitting a capacity issue although I don't see any issue with 
>>

[akka-user] Re: Clarification on unreachable nodes in cluster

2016-03-19 Thread Benjamin Black
Hi Guido,

I think in your case you are shutting down before the node has communicated 
to the leader that it wants to leave. I wait for the MemberExited message 
before shutting down the node. Maybe I should wait for MemberRemoved? Either 
way, the ultimate aim is to avoid the unreachable logic kicking in and having 
to wait x seconds (I use 10 seconds) for the node to be auto-downed by the 
leader. The reason I don't want to wait is that, according to the docs, the 
leader wouldn't be able to add nodes whilst any node in the cluster is 
considered unreachable, which is a problem if I'm doing a rolling restart of 
all the nodes.

Regards,
Ben

On Thursday, March 17, 2016 at 4:51:30 PM UTC-4, Guido Medina wrote:
>
> As for cluster.leave(cluster.selfAddress) my micro-services use the 
> following to leave:
>
> Runtime.getRuntime().addShutdownHook(new Thread() {
>   @Override
>   public void run() {
> final Cluster cluster = Cluster.get(system);
> cluster.leave(cluster.selfAddress());
> system.terminate();
> Configurator.shutdown((LoggerContext) LogManager.getContext());
>   }
> });
>
> But honestly I have never seen that work; the other nodes just report it 
> as unreachable until it times out and is completely removed. Maybe the 
> shutdown happens so fast that the leave is useless in my case.
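One plausible reason leave() appears to do nothing here is that system.terminate() (and JVM exit) races ahead of the Leave message propagating. A common workaround is to block the shutdown hook until removal is confirmed or a timeout expires. A dependency-free sketch, where onSelfRemoved() stands in for a MemberRemoved(selfAddress) cluster-event callback (an assumption, not a real Akka hook):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LeaveThenTerminate {
    private final CountDownLatch removed = new CountDownLatch(1);

    // Wire this to a cluster listener that fires on MemberRemoved(selfAddress).
    void onSelfRemoved() {
        removed.countDown();
    }

    // Call after cluster.leave(selfAddress): block until the cluster confirms
    // removal, or give up after the timeout and shut down anyway.
    boolean awaitRemoval(long timeout, TimeUnit unit) {
        try {
            return removed.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The shutdown hook then becomes: leave, awaitRemoval(10, TimeUnit.SECONDS), terminate.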
>
> HTH,
>
> Guido.
>
> On Thursday, March 17, 2016 at 8:33:29 PM UTC, Guido Medina wrote:
>>
>> Hi Benjamin,
>>
>> I also rely on cluster events, and AFAIK you can expect (and trust) 
>> *MemberUp* and *MemberRemoved*; these IMHO are the only two consistent 
>> states you can trust. In other words, I register some actors only when 
>> their nodes reach *MemberUp* and unregister only when their nodes reach 
>> *MemberRemoved*. Any other state in between I treat as information only.
>>
>> So far I haven't had any issue with my mini-shard implementation relying 
>> on only these 2 statuses; the drawback is that it may have to wait longer 
>> to react.
>>
>> HTH,
>>
>> Guido.
>>
>> On Thursday, March 17, 2016 at 6:07:48 PM UTC, Benjamin Black wrote:
>>>
>>> Hello,
>>>
>>> I'm adding logic to our service so that when a node is being restarted 
>>> it gracefully leaves the cluster using cluster.leave(cluster
>>> .selfAddress). In the cluster specification doc it states:
>>>
>>> If a node is unreachable then gossip convergence is not possible and 
>>> therefore any leader actions are also not possible (for instance, 
>>> allowing a node to become a part of the cluster). To be able to move 
>>> forward the state of the unreachable nodes must be changed. It must 
>>> become reachable again or be marked as down.
>>>
>>> Is this totally true? If a node is unreachable and is in the 
>>> leaving/exiting/removed state, will this stop the leader from adding a new 
>>> node? I ask because I have an actor that subscribes to cluster events and I 
>>> can see a node is being added whilst another node is considered unreachable 
>>> and in the exiting status:
>>>
>>> 14:02:46.843 INFO  Exited member Member(address = akka.tcp://
>>> geyser@172.16.120.160:7000, status = Exiting)
>>> 14:02:51.842 INFO  Unreachable member Member(address = akka.tcp://
>>> geyser@172.16.120.160:7000, status = Exiting)
>>> 14:02:53.843 INFO  Removing member Member(address = akka.tcp://
>>> geyser@172.16.120.160:7000, status = Removed)
>>> 14:02:57.843 INFO  Exited member Member(address = akka.tcp://
>>> geyser@172.16.119.46:7000, status = Exiting)
>>> 14:03:02.760 INFO  Unreachable member Member(address = akka.tcp://
>>> geyser@172.16.119.46:7000, status = Exiting)
>>> 14:03:04.843 INFO  Adding member Member(address = akka.tcp://
>>> geyser@172.16.120.160:7000, status = Up)
>>>
>>> Thanks,
>>> Ben
>>>
>>>



[akka-user] Clarification on unreachable nodes in cluster

2016-03-19 Thread Benjamin Black
Hello,

I'm adding logic to our service so that when a node is being restarted it 
gracefully leaves the cluster using cluster.leave(cluster.selfAddress). In 
the cluster specification doc it states:

If a node is unreachable then gossip convergence is not possible and 
therefore any leader actions are also not possible (for instance, allowing 
a node to become a part of the cluster). To be able to move forward the 
state of the unreachable nodes must be changed. It must become reachable 
again or be marked as down.

Is this totally true? If a node is unreachable and is in the 
leaving/exiting/removed state, will this stop the leader from adding a new 
node? I ask because I have an actor that subscribes to cluster events and I 
can see a node is being added whilst another node is considered unreachable 
and in the exiting status:

14:02:46.843 INFO  Exited member Member(address = 
akka.tcp://geyser@172.16.120.160:7000, status = Exiting)
14:02:51.842 INFO  Unreachable member Member(address = 
akka.tcp://geyser@172.16.120.160:7000, status = Exiting)
14:02:53.843 INFO  Removing member Member(address = 
akka.tcp://geyser@172.16.120.160:7000, status = Removed)
14:02:57.843 INFO  Exited member Member(address = 
akka.tcp://geyser@172.16.119.46:7000, status = Exiting)
14:03:02.760 INFO  Unreachable member Member(address = 
akka.tcp://geyser@172.16.119.46:7000, status = Exiting)
14:03:04.843 INFO  Adding member Member(address = 
akka.tcp://geyser@172.16.120.160:7000, status = Up)

Thanks,
Ben



[akka-user] Lost actor communication

2015-07-06 Thread Benjamin Black
I am running a 24-node cluster that is roughly split into two roles: 
frontend and backend. There is a streamer actor on the frontend node talking 
to a tracker actor on the backend node. There can be many streamer actors on 
several frontend nodes talking to one tracker actor. It seems that at some 
point the streamer actor on a frontend node stops being able to communicate 
with the tracker actor, i.e. communication from the backend node to the 
frontend node has been lost, but the backend node can still receive messages 
from the frontend. I say this because the streamer was able to send a poison 
pill to the tracker, which successfully killed the actor, but the streamer 
wasn't informed about the termination.

I see no indication that a node has fallen from the cluster or is having 
problems communicating (I have logging set to INFO). Is there anything I 
can do to get a better idea of what is happening?
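For more visibility, Akka's remoting and cluster logging can be turned up. These settings existed in the 2.x reference.conf, but verify the names against your version before relying on them:

```hocon
akka {
  loglevel = "DEBUG"
  remote {
    log-remote-lifecycle-events = on  # association, gating, quarantine events
    # Very noisy; enable only while reproducing the problem:
    log-sent-messages = on
    log-received-messages = on
  }
  cluster.log-info = on  # cluster membership changes at info level
}
```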

Thanks,
Ben



Re: [akka-user] Akka cluster: node identity crisis

2014-07-02 Thread Benjamin Black
I upgraded to Akka 2.3.4 (Scala 2.10), but I'm still seeing the same issue. 
When I log Cluster(system).selfUniqueAddress I get something like 
UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,630715883), with the 
last number changing every time I restart.

The gossip message error always has the same number -1482656725. For 
example,

18:12:27.472 INFO  [streaming-akka.actor.default-dispatcher-15] 
Cluster(akka://streaming) - Cluster Node 
[akka.tcp://streaming@172.17.102.128:7000] - Ignoring received gossip 
intended for someone else, from [akka.tcp://streaming@172.17.110.143:7000] 
to [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]



On Tuesday, July 1, 2014 2:33:52 PM UTC-4, Patrik Nordwall wrote:

 Please use latest version, i.e. 2.3.4

 There you find Cluster(system).selfUniqueAddress that includes the uid.

 I will look into this in more detail tomorrow.

 /Patrik

 1 jul 2014 kl. 19:57 skrev Benjamin Black benbl...@gmail.com:

 I have a cluster of 15 nodes/boxes. I start the nodes roughly at the same 
 time. One of the nodes is behaving oddly and continually logging "Ignoring 
 received gossip intended for someone else". However, the node does seem to 
 work for a while before being dropped from the cluster. Basically this one 
 node seems to think it is someone else, whilst also behaving as itself. The 
 code and config are exactly the same on all 15 nodes so I don't understand 
 why I'm getting this issue on only one node. Maybe this is a hardware issue?

 Some logging:

 11:27:45.412 INFO  [main] Remoting - Starting remoting
 11:27:45.638 INFO  [main] Remoting - Remoting started; listening on 
 addresses :[akka.tcp://streaming@172.17.102.128:7000]
 11:27:45.660 INFO  [main] Cluster(akka://streaming) - Cluster Node 
 [akka.tcp://streaming@172.17.102.128:7000] - Starting up...
 11:27:45.714 INFO  [main] Cluster(akka://streaming) - Cluster Node 
 [akka.tcp://streaming@172.17.102.128:7000] - Registered cluster JMX MBean 
 [akka:type=Cluster]
 11:27:45.715 INFO  [main] Cluster(akka://streaming) - Cluster Node 
 [akka.tcp://streaming@172.17.102.128:7000] - Started up successfully
 11:27:45.830 INFO  [streaming-akka.actor.default-dispatcher-3] 
 a.a.LocalActorRef - Message 
 [akka.cluster.InternalClusterAction$InitJoinAck] from Actor[akka.tcp://
 streaming@172.17.100.98:7000/system/cluster/core/daemon#1997515880] to 
 Actor[akka://streaming/system/cluster/core/daemon/joinSeedNodeProcess-1#1132911]
  
 was not delivered. [1] dead letters encountered. This logging can be turned 
 off or adjusted with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 11:27:45.872 INFO  [streaming-akka.actor.default-dispatcher-5] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://
 streaming@172.17.102.128:7000] - Welcome from [akka.tcp://
 streaming@172.17.102.125:7000]
 11:27:45.911 INFO  [streaming-akka.actor.default-dispatcher-2] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://
 streaming@172.17.102.128:7000] - Ignoring received gossip intended for 
 someone else, from [akka.tcp://streaming@172.17.102.68:7000] to 
 [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]
 11:27:45.943 INFO  [streaming-akka.actor.default-dispatcher-16] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://
 streaming@172.17.102.128:7000] - Ignoring received gossip intended for 
 someone else, from [akka.tcp://streaming@172.17.102.70:7000] to 
 [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]
 11:27:46.122 INFO  [streaming-akka.actor.default-dispatcher-16] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://
 streaming@172.17.102.128:7000] - Ignoring received gossip intended for 
 someone else, from [akka.tcp://streaming@172.17.102.69:7000] to 
 [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]

 Config:

 akka {
   cluster {
     seed-nodes = [
       "akka.tcp://streaming@172.17.102.125:7000",
       "akka.tcp://streaming@172.17.100.98:7000"
     ]
   }
   remote.netty.tcp.hostname = "172.17.102.128"
 }

 I thought it was weird that the unique address in the gossip messages 
 referred to a negative number. I added 
 log.info(s"my unique ID: ${AddressUidExtension(actorSystem).addressUid}") 
 to the confused node (I hope this is the correct code) and it gave me the 
 answer 1549799231, whilst continuing to give -1482656725 in the gossip 
 messages. I'm guessing the problem is that the gossip messages have a 
 corrupted address, which is why the confused node believes these messages 
 are not for itself. I'm using Akka 2.3.2.



Re: [akka-user] Akka cluster: node identity crisis

2014-07-02 Thread Benjamin Black
Problem resolved. I was running two clusters (dev and prod) and somehow (the 
mystery remains) the dev cluster was interacting with this one box in the 
prod cluster. My advice to anyone else who sees this issue is to check the IP 
addresses of the "from" nodes.
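For anyone auditing this, the sender can be pulled straight out of the gossip warnings and compared against the set of hosts that belong in the cluster. A small sketch; the log format is assumed from the lines quoted in this thread:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GossipSenderCheck {
    // Matches the sender in lines like:
    //   ... from [akka.tcp://streaming@172.17.110.143:7000] to [...]
    private static final Pattern FROM =
        Pattern.compile("from \\[akka\\.tcp://[^@\\]]+@([^:\\]]+):\\d+\\]");

    // Returns the sending node's host/IP, or null if the line doesn't match.
    static String senderHost(String logLine) {
        Matcher m = FROM.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }
}
```

Feeding each "Ignoring received gossip" line through senderHost and flagging hosts outside the expected subnet would have exposed the dev-cluster traffic quickly.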

On Wednesday, July 2, 2014 4:31:32 AM UTC-4, Patrik Nordwall wrote:




 On Wed, Jul 2, 2014 at 12:21 AM, Benjamin Black benbl...@gmail.com wrote:

 I upgraded to Akka 2.3.4 (Scala 2.10), but I'm still seeing the same issue. 
 When I log Cluster(system).selfUniqueAddress I get something like 
 UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,630715883), with the 
 last number changing every time I restart.


 Then it would be great if you could describe how to reproduce the problem. 
 Preferably using a minimal sample, such as the SimpleClusterListener in the 
 cluster sample: 
 https://typesafe.com/activator/template/akka-sample-cluster-scala
  


 The gossip message error always has the same number -1482656725. For 
 example,

 18:12:27.472 INFO  [streaming-akka.actor.default-dispatcher-15] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://
 streaming@172.17.102.128:7000] - Ignoring received gossip intended for 
 someone else, from [akka.tcp://streaming@172.17.110.143:7000] to 
 [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]


 It is not an error to see a few of these while joining (especially when 
 joining several nodes at the same time), but if this logging continues 
 (never ends) something is wrong.

 A negative uid should not be an issue. It is just a random integer that is 
 generated when the ActorSystem is started. The reason for using the uid is 
 to be able to differentiate a new actor system from an old one on the same 
 host:port when it is restarted.

 Regards,
 Patrik
  



 On Tuesday, July 1, 2014 2:33:52 PM UTC-4, Patrik Nordwall wrote:

 Please use the latest version, i.e. 2.3.4.
 
 There you will find Cluster(system).selfUniqueAddress, which includes the uid.

 I will look into this in more detail tomorrow.

 /Patrik

 On 1 Jul 2014, at 19:57, Benjamin Black benbl...@gmail.com wrote:

 I have a cluster of 15 nodes/boxes. I start the nodes at roughly the same 
 time. One of the nodes is behaving oddly and continually logging "Ignoring 
 received gossip intended for someone else". However, the node does seem to 
 work for a while before being dropped from the cluster. Basically this one 
 node seems to think it is someone else, whilst also behaving as itself. The 
 code and config are exactly the same on all 15 nodes, so I don't understand 
 why I'm getting this issue on only one node. Maybe this is a hardware issue?

 Some logging:

 11:27:45.412 INFO  [main] Remoting - Starting remoting
 11:27:45.638 INFO  [main] Remoting - Remoting started; listening on 
 addresses :[akka.tcp://streaming@172.17.102.128:7000]
 11:27:45.660 INFO  [main] Cluster(akka://streaming) - Cluster Node 
 [akka.tcp://streaming@172.17.102.128:7000] - Starting up...
 11:27:45.714 INFO  [main] Cluster(akka://streaming) - Cluster Node 
 [akka.tcp://streaming@172.17.102.128:7000] - Registered cluster JMX 
 MBean [akka:type=Cluster]
 11:27:45.715 INFO  [main] Cluster(akka://streaming) - Cluster Node 
 [akka.tcp://streaming@172.17.102.128:7000] - Started up successfully
 11:27:45.830 INFO  [streaming-akka.actor.default-dispatcher-3] 
 a.a.LocalActorRef - Message 
 [akka.cluster.InternalClusterAction$InitJoinAck] 
 from Actor[akka.tcp://streaming@172.17.100.98:7000/system/
 cluster/core/daemon#1997515880] to Actor[akka://streaming/system/
 cluster/core/daemon/joinSeedNodeProcess-1#1132911] was not delivered. 
 [1] dead letters encountered. This logging can be turned off or adjusted 
 with configuration settings 'akka.log-dead-letters' and 
 'akka.log-dead-letters-during-shutdown'.
 11:27:45.872 INFO  [streaming-akka.actor.default-dispatcher-5] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://streaming@172.17.
 102.128:7000] - Welcome from [akka.tcp://streaming@172.17.102.125:7000]
 11:27:45.911 INFO  [streaming-akka.actor.default-dispatcher-2] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://streaming@172.17.
 102.128:7000] - Ignoring received gossip intended for someone else, 
 from [akka.tcp://streaming@172.17.102.68:7000] to 
 [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]
 11:27:45.943 INFO  [streaming-akka.actor.default-dispatcher-16] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://streaming@172.17.
 102.128:7000] - Ignoring received gossip intended for someone else, 
 from [akka.tcp://streaming@172.17.102.70:7000] to 
 [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]
 11:27:46.122 INFO  [streaming-akka.actor.default-dispatcher-16] 
 Cluster(akka://streaming) - Cluster Node [akka.tcp://streaming@172.17.
 102.128:7000] - Ignoring received gossip intended for someone else, 
 from [akka.tcp://streaming@172.17.102.69:7000] to 
 [UniqueAddress(akka.tcp
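Patrik's explanation of the uid can be illustrated with a plain-Scala sketch (the case class and function here are hypothetical stand-ins, not Akka's actual implementation):

```scala
import scala.util.Random

// Each ActorSystem start draws a fresh random Int uid, so a restarted system
// on the same host:port is distinguishable from its previous incarnation.
// Random.nextInt() is negative about half the time, so a uid like
// -1482656725 is perfectly normal.
final case class SketchUniqueAddress(hostPort: String, uid: Int)

def newIncarnation(hostPort: String): SketchUniqueAddress =
  SketchUniqueAddress(hostPort, Random.nextInt())

val before = newIncarnation("akka.tcp://streaming@172.17.102.128:7000")
val after  = newIncarnation("akka.tcp://streaming@172.17.102.128:7000") // simulated restart

// Same transport address, (almost certainly) different uid: gossip addressed
// to the old incarnation is ignored by the new one.
println(before.hostPort == after.hostPort) // true
println(before.uid == after.uid)           // false, barring a 1-in-2^32 collision
```

Under this model, a node that keeps seeing gossip addressed to a different uid at its own address points at either a restart the senders have not yet observed or, as it turned out in this thread, messages arriving from an entirely different cluster.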

[akka-user] Akka cluster: node identity crisis

2014-07-01 Thread Benjamin Black
I have a cluster of 15 nodes/boxes. I start the nodes at roughly the same 
time. One of the nodes is behaving oddly and continually logging "Ignoring 
received gossip intended for someone else". However, the node does seem to 
work for a while before being dropped from the cluster. Basically this one 
node seems to think it is someone else, whilst also behaving as itself. The 
code and config are exactly the same on all 15 nodes, so I don't understand 
why I'm getting this issue on only one node. Maybe this is a hardware issue?

Some logging:

11:27:45.412 INFO  [main] Remoting - Starting remoting
11:27:45.638 INFO  [main] Remoting - Remoting started; listening on 
addresses :[akka.tcp://streaming@172.17.102.128:7000]
11:27:45.660 INFO  [main] Cluster(akka://streaming) - Cluster Node 
[akka.tcp://streaming@172.17.102.128:7000] - Starting up...
11:27:45.714 INFO  [main] Cluster(akka://streaming) - Cluster Node 
[akka.tcp://streaming@172.17.102.128:7000] - Registered cluster JMX MBean 
[akka:type=Cluster]
11:27:45.715 INFO  [main] Cluster(akka://streaming) - Cluster Node 
[akka.tcp://streaming@172.17.102.128:7000] - Started up successfully
11:27:45.830 INFO  [streaming-akka.actor.default-dispatcher-3] 
a.a.LocalActorRef - Message 
[akka.cluster.InternalClusterAction$InitJoinAck] from 
Actor[akka.tcp://streaming@172.17.100.98:7000/system/cluster/core/daemon#1997515880]
 
to 
Actor[akka://streaming/system/cluster/core/daemon/joinSeedNodeProcess-1#1132911]
 
was not delivered. [1] dead letters encountered. This logging can be turned 
off or adjusted with configuration settings 'akka.log-dead-letters' and 
'akka.log-dead-letters-during-shutdown'.
11:27:45.872 INFO  [streaming-akka.actor.default-dispatcher-5] 
Cluster(akka://streaming) - Cluster Node 
[akka.tcp://streaming@172.17.102.128:7000] - Welcome from 
[akka.tcp://streaming@172.17.102.125:7000]
11:27:45.911 INFO  [streaming-akka.actor.default-dispatcher-2] 
Cluster(akka://streaming) - Cluster Node 
[akka.tcp://streaming@172.17.102.128:7000] - Ignoring received gossip 
intended for someone else, from [akka.tcp://streaming@172.17.102.68:7000] 
to [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]
11:27:45.943 INFO  [streaming-akka.actor.default-dispatcher-16] 
Cluster(akka://streaming) - Cluster Node 
[akka.tcp://streaming@172.17.102.128:7000] - Ignoring received gossip 
intended for someone else, from [akka.tcp://streaming@172.17.102.70:7000] 
to [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]
11:27:46.122 INFO  [streaming-akka.actor.default-dispatcher-16] 
Cluster(akka://streaming) - Cluster Node 
[akka.tcp://streaming@172.17.102.128:7000] - Ignoring received gossip 
intended for someone else, from [akka.tcp://streaming@172.17.102.69:7000] 
to [UniqueAddress(akka.tcp://streaming@172.17.102.128:7000,-1482656725)]

Config:

akka {
  cluster {
    seed-nodes = [
      "akka.tcp://streaming@172.17.102.125:7000",
      "akka.tcp://streaming@172.17.100.98:7000"
    ]
  }
  remote.netty.tcp.hostname = "172.17.102.128"
}

I thought it was weird that the unique address in the gossip messages 
referred to a negative number. I added 
log.info(s"my unique ID: ${AddressUidExtension(actorSystem).addressUid}") 
to the confused node (I hope this is the correct code) and it gave me the 
answer 1549799231, whilst it continued to give -1482656725 in the gossip 
messages. I'm guessing the problem is that the gossip messages have a 
corrupted address, which is why the confused node believes these messages 
are not for itself. I'm using Akka 2.3.2.


-- 
  Read the docs: http://akka.io/docs/
  Check the FAQ: 
 http://doc.akka.io/docs/akka/current/additional/faq.html
  Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups 
"Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.


[akka-user] Akka actors not getting balanced thread time

2014-04-16 Thread Benjamin Black
I'm creating an application that streams data read from Kafka over HTTP. A 
client can create multiple connections, with the data being evenly balanced 
between the connections. I'm using Spray 1.3.1 to handle the HTTP streaming 
and Akka 2.3.0. Each client connection creates a streamer actor that gets 
data from a reader actor that is unique to the client. For example, if a 
client connects four times, there will be four streamer actors, with the 
streamer actors all requesting data from one reader actor.

What I'm witnessing is the following behavior (all connections to the same 
process, using default dispatcher):

T0: 1st client connection, 1st streamer and reader created, streamer 
requests 1400 msgs per second from the reader
T1: 2nd client connection, 2nd streamer created, 1st streamer requesting 
600 msgs per second, 2nd streamer requesting 1200 msgs per second
T2: 1st client connection killed, 1st streamer killed, 2nd streamer 
requesting 1700 msgs per second
T3: 3rd client connection, 3rd streamer created, 2nd & 3rd streamer each 
requesting 1000 per second (this is the behavior I want!)

Basically it would seem that the 1st streamer is not getting the same 
thread time as later streamers. Is this a crazy thought? Is there anything 
I can check to make sure the Akka system is set up correctly? If people think 
this could be an Akka bug then I can try to put together a small code 
example that demonstrates this behavior. Thanks.
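One dispatcher knob that relates to fairness (a hypothetical tuning sketch, not a confirmed fix for this report): the default dispatcher lets each actor process up to `throughput` messages before yielding its thread, so an already-busy streamer can hold a thread longer than newly started peers. Lowering the value trades raw throughput for fairness:

```
# application.conf -- assumed tuning sketch
akka.actor.default-dispatcher {
  # The default is 5: an actor may process 5 messages before yielding.
  # throughput = 1 forces a thread switch after every message, which
  # evens out thread time across actors at some throughput cost.
  throughput = 1
}
```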



Re: [akka-user] Akka actors not getting balanced thread time

2014-04-16 Thread Benjamin Black
I'm working on reducing my code into something I can share. My system is 
actually a bit more complicated than I've explained: I'm using remote 
actors, with other actors that coordinate everything and other actors that 
track offsets. I suppose I'm trying to understand how Akka allocates thread 
time to actors. The first streamer actor takes time to get up to full 
streaming speed, so maybe this affects Akka's scheduling?

What kind of JVM monitoring are you referring to? I've used YourKit, but I 
didn't notice anything unusual.

On Wednesday, April 16, 2014 3:08:30 PM UTC-4, √ wrote:

 Hi Benjamin,

 your question is hypothetical, and without the code and config etc. it's 
 impossible to give a qualified answer.
 What does your JVM monitoring tell you?


 On Wed, Apr 16, 2014 at 7:47 PM, Benjamin Black benbl...@gmail.com wrote:

 I'm creating an application that streams data read from Kafka over HTTP. A 
 client can create multiple connections, with the data being evenly balanced 
 between the connections. I'm using Spray 1.3.1 to handle the HTTP streaming 
 and Akka 2.3.0. Each client connection creates a streamer actor that gets 
 data from a reader actor that is unique to the client. For example, if a 
 client connects four times, there will be four streamer actors, with the 
 streamer actors all requesting data from one reader actor.

 What I'm witnessing is the following behavior (all connections to the 
 same process, using default dispatcher):

 T0: 1st client connection, 1st streamer and reader created, streamer 
 requests 1400 msgs per second from the reader
 T1: 2nd client connection, 2nd streamer created, 1st streamer requesting 
 600 msgs per second, 2nd streamer requesting 1200 msgs per second
 T2: 1st client connection killed, 1st streamer killed, 2nd streamer 
 requesting 1700 msgs per second
 T3: 3rd client connection, 3rd streamer created, 2nd & 3rd streamer each 
 requesting 1000 per second (this is the behavior I want!)

 Basically it would seem that the 1st streamer is not getting the same 
 thread time as later streamers. Is this a crazy thought? Is there anything 
 I can check to make sure the Akka system is set up correctly? If people think 
 this could be an Akka bug then I can try to put together a small code 
 example that demonstrates this behavior. Thanks.
  




 -- 
 Cheers,
 √
  
