Re: [akka-user] Connection state in Artery

2017-11-13 Thread Merlijn Boogerd
I will, thanks for your feedback, much appreciated! :)

Re: [akka-user] Connection state in Artery

2017-11-13 Thread Patrik Nordwall
On Mon, Nov 13, 2017 at 4:43 PM, Merlijn Boogerd <merlijn.boog...@trivento.nl> wrote:

> Hi Patrik,
>
> Thanks for your fast response!
>
> > In general we recommend against using Akka Remoting without Akka
> Cluster, but you might have good reasons for not using Akka Cluster?
>
> My reason to not use Cluster is because its complete view of the cluster
> members poses a scalability issue beyond a certain size. There's a paper
> where it is
> claimed that Akka Cluster scales to 2400 nodes, but my ambition is to scale
> quite a bit beyond that. Not sure what your experiences are, but my guess
> is that Cluster would not be comfortable with millions of members. My use
> case doesn't require everything that Cluster provides either, so I trade
> some of its convenience for greater scalability.
>
> > I have created issue https://github.com/akka/akka/issues/23967
>
> Just for the sake of clarity, if this issue were to be resolved, you don't
> see any obvious problems in Artery/Aeron that make it a bad fit for my use
> case (using few connections at a time, but many over time)?
>

The limitation of Aeron is around the number of simultaneous streams, as far
as I understand. I assume that Aeron will clean up closed publications
(channels), but that is probably something you should test early on.
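
For what it's worth, at the level of the plain Aeron Java client (below
Artery, which manages these resources internally) a publication is an
explicit resource that can be closed when it is no longer needed. A minimal
sketch, with an arbitrary channel and stream id, of the kind of cleanup an
idle peer would eventually need:

```scala
import io.aeron.Aeron

object AeronPublicationLifecycle extends App {
  // Connect to a running Aeron media driver (started separately).
  val aeron = Aeron.connect(new Aeron.Context())

  // One outbound stream towards a peer; channel and stream id are arbitrary
  // example values.
  val publication =
    aeron.addPublication("aeron:udp?endpoint=localhost:40123", 10)

  // ... offer messages while the peer is actively used ...

  // Closing releases the stream's resources in the media driver, which is
  // roughly the cleanup an unused association would have to trigger.
  publication.close()
  aeron.close()
}
```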

FYI, we are (slowly) working on Artery with TCP as an alternative to Aeron.
That could be a fallback plan if Aeron doesn't fit your needs.

Re: [akka-user] Connection state in Artery

2017-11-13 Thread Merlijn Boogerd
Hi Patrik,

Thanks for your fast response! 

> In general we recommend against using Akka Remoting without Akka Cluster,
> but you might have good reasons for not using Akka Cluster?

My reason for not using Cluster is that its complete view of the cluster
members poses a scalability issue beyond a certain size. There's a paper
claiming that Akka Cluster scales to 2400 nodes, but my ambition is to scale
quite a bit beyond that. Not sure what your experiences are, but my guess
is that Cluster would not be comfortable with millions of members. My use
case doesn't require everything that Cluster provides either, so I trade
some of its convenience for greater scalability.

> I have created issue https://github.com/akka/akka/issues/23967

Just for the sake of clarity, if this issue were to be resolved, you don't 
see any obvious problems in Artery/Aeron that make it a bad fit for my use 
case (using few connections at a time, but many over time)?

Kind regards,

Merlijn

Re: [akka-user] Connection state in Artery

2017-11-12 Thread Patrik Nordwall
Hi Merlijn,



On Sun, Nov 12, 2017 at 5:02 PM, Merlijn Boogerd <merlijn.boog...@trivento.nl> wrote:

> Hi fellow hakkers,
>
> I have two questions regarding the Artery module (I am not considering the
> previous remoting as it will eventually get deprecated). I implemented a
> peer sampling service (HyParView) and am in the process of implementing a
> clustering service (Vicinity), both directly on top of Artery. Although it
> seems to work, I have some worries after delving deeper into Artery and
> Aeron.
>
> *Connection control*:
> In my use case, a node may have contacted 1000s of other nodes over time
> in its lifetime, while only actively using a handful (<10) in a single
> minute. The services I am implementing are (supposed to be) lightweight,
> but I see potential reasons why performance might deteriorate over time.
>
> - Aeron claims that its number of streams shouldn't be high (never over a
> thousand, but
> ideally much lower). It is not clear to me what the costs are for Aeron if
> a 'connection' is not used (the linked documentation might even refer only
> to the Publisher/Subscribers directly connected to the MediaDriver, I'm too
> much of a noob to understand the docs)
> - Artery registers an association for each contacted remote (perhaps more
> state even, I may have missed stuff). Users don't get to 'close the
> connection' (I can see reasons why), but Artery does not seem to come with
> a mechanism to clean unused connections either.
>

That is a missing piece that should be fixed. The reason we haven't done it
yet is partly an oversight and partly because it already works with Akka
Cluster, since those outbound streams are stopped when a member is removed
from the cluster (or quarantined for other reasons).

I have created issue https://github.com/akka/akka/issues/23967


>
> Can you guys make an educated guess for performance drop in my use case?
> And if it is significant, what would you advise as a counter-measure? I
> could see unused-association-garbage collection as a useful addition to
> Remoting, I would be happy to help out if useful.
>
> *Quarantining*
> When remote watch fails for some remote actor system, that actor system
> gets quarantined. In my case, that is a bit radical, as I don't necessarily
> have control over either of those ActorSystems. Without the ability to
> reboot either ActorSystem, these systems would continue treating each other
> as 'down' even though the partition may have long passed. I could
> instantiate a failure detector explicitly instead of using context.watch,
> in a way that quarantining is not a consequence of failure-detection.
> However, it feels like I am missing something with such a simple solution.
> Why is quarantining as persistent as it is, if skipping it has no downside?
> What would you guys advise for the case where restarting actor-systems is
> not an option yet you would like to use failure-detection?
>

Failure detection is much more robust when using Akka Cluster, since the
Terminated message for watch is not triggered until the member is removed
from the cluster, rather than as soon as the failure detector indicates that
there might be a problem. In general we recommend against using Akka
Remoting without Akka Cluster, but you might have good reasons for not
using Akka Cluster?

The reason system messages are so important is that once watch is used and
Terminated has been triggered, we don't want the actors (and ActorSystem) to
come back to life again, i.e. no zombies.

If these semantics of watch don't match what you need, then I suggest that
you use your own heartbeating and failure detection.
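
As a rough illustration, application-level heartbeating could look something
like the sketch below (names and timing values are placeholders, and this is
a plain timeout-based detector rather than Akka's phi-accrual one):

```scala
import akka.actor.{Actor, ActorRef, Cancellable}
import scala.concurrent.duration._

// Illustrative message protocol for application-level heartbeating.
case object Heartbeat
case object HeartbeatAck
case object CheckLiveness
final case class PeerUnreachable(peer: ActorRef)

// Sends periodic heartbeats to `peer` and tells `listener` when no ack has
// been seen within `acceptablePause`. Nothing is quarantined: the peer is
// simply reported unreachable, and stops being reported once acks resume.
class HeartbeatMonitor(peer: ActorRef, listener: ActorRef,
                       interval: FiniteDuration = 1.second,
                       acceptablePause: FiniteDuration = 5.seconds)
    extends Actor {
  import context.dispatcher

  private var lastAck: Long = System.nanoTime()
  private val ticks: Seq[Cancellable] = Seq(
    context.system.scheduler.schedule(interval, interval, peer, Heartbeat),
    context.system.scheduler.schedule(interval, interval, self, CheckLiveness))

  override def postStop(): Unit = ticks.foreach(_.cancel())

  def receive: Receive = {
    case HeartbeatAck =>
      lastAck = System.nanoTime()
    case CheckLiveness =>
      if ((System.nanoTime() - lastAck).nanos > acceptablePause)
        listener ! PeerUnreachable(peer)
  }
}

// The remote side only has to echo heartbeats back.
class HeartbeatResponder extends Actor {
  def receive: Receive = { case Heartbeat => sender() ! HeartbeatAck }
}
```

Whether and when to start talking to the peer again is then an application
decision rather than a quarantine.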

/Patrik

--

Patrik Nordwall
Akka Tech Lead
Lightbend - Reactive apps on the JVM
Twitter: @patriknw

[akka-user] Connection state in Artery

2017-11-12 Thread Merlijn Boogerd
Hi fellow hakkers,

I have two questions regarding the Artery module (I am not considering the
previous remoting, as it will eventually be deprecated). I implemented a
peer-sampling service (HyParView) and am in the process of implementing a
clustering service (Vicinity), both directly on top of Artery. Although it
seems to work, I have some worries after delving deeper into Artery and
Aeron.
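
For concreteness, each node is plain remoting with Artery enabled; a minimal
sketch of the setup I mean (host and port are placeholders, and the config
keys are as in the Akka 2.5 docs as far as I can tell):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object OverlayNode extends App {
  // Placeholder host/port; every node in the overlay gets its own address.
  val config = ConfigFactory.parseString(
    """
    akka.actor.provider = remote
    akka.remote.artery {
      enabled = on
      canonical.hostname = "127.0.0.1"
      canonical.port = 25520
    }
    """)

  val system = ActorSystem("overlay", config.withFallback(ConfigFactory.load()))
}
```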

*Connection control*:
In my use case, a node may have contacted thousands of other nodes over its
lifetime, while only actively using a handful (<10) in any single minute.
The services I am implementing are (supposed to be) lightweight, but I see
potential reasons why performance might deteriorate over time.

- Aeron claims that its number of streams shouldn't be high (never over a
thousand, but ideally much lower). It is not clear to me what the costs are
for Aeron if a 'connection' is not used (the linked documentation might even
refer only to the Publishers/Subscribers directly connected to the
MediaDriver; I'm too much of a noob to understand the docs).
- Artery registers an association for each contacted remote (perhaps more
state as well; I may have missed something). Users don't get to 'close the
connection' (I can see reasons why), but Artery does not seem to come with a
mechanism to clean up unused connections either (see the sketch below this
list).
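
To make the second point concrete: the only user-facing way to reach a peer
is by address/path, e.g. via an actor selection; every distinct remote node
contacted this way gets an association, and I don't see a call to tear one
down again. A sketch (the service name and message are made up):

```scala
import akka.actor.ActorSystem

// Hypothetical helper for contacting peers from the sampled partial view.
// Every distinct remote node addressed like this makes Artery set up an
// association, and nothing in the user-facing API closes it again once the
// peer is dropped from the view.
object PeerContact {
  // With Artery the scheme is "akka" (not "akka.tcp" as in classic remoting).
  def peerPath(systemName: String, host: String, port: Int): String =
    s"akka://$systemName@$host:$port/user/peer-service" // "peer-service" is made up

  def greet(system: ActorSystem, peers: Seq[(String, String, Int)]): Unit =
    peers.foreach { case (name, host, port) =>
      system.actorSelection(peerPath(name, host, port)) ! "join-request" // illustrative
    }
}
```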

Can you guys make an educated guess at the performance drop in my use case?
And if it is significant, what would you advise as a counter-measure? I
could see garbage collection of unused associations as a useful addition to
Remoting, and I would be happy to help out if that is useful.

*Quarantining*
When remote watch fails for some remote actor system, that actor system gets
quarantined. In my case, that is a bit radical, as I don't necessarily have
control over either of those ActorSystems. Without the ability to reboot
either ActorSystem, these systems would continue treating each other as
'down' even though the partition may have long since healed. I could
instantiate a failure detector explicitly instead of using context.watch, so
that quarantining is not a consequence of failure detection. However, it
feels like I am missing something with such a simple solution: why is
quarantining as persistent as it is, if skipping it has no downside? What
would you guys advise for the case where restarting actor systems is not an
option, yet you would like to use failure detection?
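
Concretely, the watch pattern I mean is just plain remote death watch, as in
the sketch below (the peer ActorRef is resolved elsewhere); a failed watch is
what ultimately leads to the quarantine:

```scala
import akka.actor.{Actor, ActorRef, Terminated}

// Watches a single remote peer. If the remote-watch failure detector gives
// up on the peer's system, Terminated is delivered and that system ends up
// quarantined, which is the behaviour described above.
class PeerWatcher(peer: ActorRef) extends Actor {
  override def preStart(): Unit = context.watch(peer)

  def receive: Receive = {
    case Terminated(`peer`) =>
      // The peer is now considered permanently gone by this system.
      context.stop(self)
  }
}
```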

Thanks in advance for your insights!

Kind regards,

Merlijn
