Re: [controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Muthukumaran K
Hi Michael,

Quarantine is the state in which Akka system-level messages can no longer be 
exchanged across the nodes. These include, but are not limited to, heartbeats, 
remote deathwatch, node state updates, etc.

This article gives a fair idea of how it works:
https://livingston.io/understanding-akkas-quarantine-state/

Some pointers on what could cause this are discussed here:
https://groups.google.com/forum/#!searchin/akka-user/quarantine|sort:date/akka-user/6cmA1RzE4-s/IaHxhxLhEgAJ

We have seen this self-termination in the past, both during long stop-the-world 
GC pauses and during *deliberate* (for testing purposes) interface down/up on 
port 2550.

I haven't tested this behavior on master yet.
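
To make the GC-pause case concrete, here is a minimal sketch of how a long pause can tip a peer into quarantine. It is a toy model with a fixed deadline, not Akka's real failure detection, which is more elaborate. The class name, the method names and the 10-second threshold used in the note below are all assumptions made for illustration.

```java
/**
 * Toy model only: a fixed-deadline heartbeat check.
 * Every name here is hypothetical; this is not Akka code.
 */
final class HeartbeatMonitor {
    private final long acceptablePauseMillis; // assumed threshold, not an Akka default
    private long lastHeartbeatMillis;
    private boolean quarantined;

    HeartbeatMonitor(long acceptablePauseMillis, long nowMillis) {
        this.acceptablePauseMillis = acceptablePauseMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Records a heartbeat from the peer; ignored once the peer is quarantined. */
    void onHeartbeat(long nowMillis) {
        if (!quarantined) {
            lastHeartbeatMillis = nowMillis;
        }
    }

    /** Periodic check: returns true once the peer has been silent for too long. */
    boolean isQuarantined(long nowMillis) {
        if (nowMillis - lastHeartbeatMillis > acceptablePauseMillis) {
            quarantined = true;
        }
        return quarantined;
    }
}
```

With a 10-second threshold, a peer that stays silent for 15 seconds (for example because of a stop-the-world GC) is flagged, and heartbeats that arrive afterwards do not clear the flag. That stickiness is the point of the sketch: once quarantined, the node stays quarantined until it is restarted.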

Regards
Muthu




From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Michael 
Vorburger
Sent: Thursday, July 05, 2018 11:12 PM
To: Tom Pantelis 
Cc: Sridhar Gaddam ; Kitt, Stephen ; 
controller-dev 
Subject: Re: [controller-dev] ODL abrupt restart - System.exit() via 
QuarantinedMonitorActorPropsFactory ?

On Thu, Jul 5, 2018 at 7:39 PM, Tom Pantelis <tompante...@gmail.com> wrote:
On Thu, Jul 5, 2018 at 1:35 PM, Michael Vorburger <vorbur...@redhat.com> wrote:
Tom, or Robert, or anyone else having hit this themselves,

would you be able to remind us what in clustering can cause an ODL abrupt 
restart - System.exit() via bundleContext.getBundle(0).stop(); from 
https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/akka/osgi/impl/QuarantinedMonitorActorPropsFactory.java
 ?

I do vaguely recall an "inconsistent cluster" leading to this - could you clarify 
exactly what situation leads to that? Loss of leader? Loss of majority?

asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...

That happens when akka quarantines a node - it can no longer rejoin the 
majority cluster unless the actor system is restarted, hence we restart the 
whole JVM.

and what can cause Akka to have to quarantine a node?

___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


Re: [controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Ajay Lele
On Thu, Jul 5, 2018 at 10:45 AM, Tom Pantelis  wrote:

>
>
> On Thu, Jul 5, 2018 at 1:42 PM, Michael Vorburger 
> wrote:
>
>> On Thu, Jul 5, 2018 at 7:39 PM, Tom Pantelis 
>> wrote:
>>
>>> On Thu, Jul 5, 2018 at 1:35 PM, Michael Vorburger 
>>> wrote:
>>>
>>>> Tom, or Robert, or anyone else having hit this themselves,
>>>>
>>>> would you be able to remind us what in clustering can cause an ODL
>>>> abrupt restart - System.exit() via bundleContext.getBundle(0).stop();
>>>> from https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/akka/osgi/impl/QuarantinedMonitorActorPropsFactory.java ?
>>>>
>>>> I do vaguely recall an "inconsistent cluster" leading to this - could you clarify
>>>> exactly what situation leads to that? Loss of leader? Loss of majority?
>>>>
>>>> asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...
>>>>
>>>
>>> That happens when akka quarantines a node - it can no longer rejoin the
>>> majority cluster unless the actor system is restarted, hence we restart the
>>> whole JVM.
>>>
>>
>> and what can cause Akka to have to quarantine a node?
>>
>
>
> An unrecoverable failure state - see
> https://livingston.io/understanding-akkas-quarantine-state/ for more detail.
>

The most common cause is a node being isolated from the rest of the cluster
for a considerable amount of time.




Re: [controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Tom Pantelis
On Thu, Jul 5, 2018 at 1:42 PM, Michael Vorburger 
wrote:

> On Thu, Jul 5, 2018 at 7:39 PM, Tom Pantelis 
> wrote:
>
>> On Thu, Jul 5, 2018 at 1:35 PM, Michael Vorburger 
>> wrote:
>>
>>> Tom, or Robert, or anyone else having hit this themselves,
>>>
>>> would you be able to remind us what in clustering can cause an ODL
>>> abrupt restart - System.exit() via bundleContext.getBundle(0).stop();
>>> from https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/akka/osgi/impl/QuarantinedMonitorActorPropsFactory.java ?
>>>
>>> I do vaguely recall an "inconsistent cluster" leading to this - could you clarify
>>> exactly what situation leads to that? Loss of leader? Loss of majority?
>>>
>>> asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...
>>>
>>
>> That happens when akka quarantines a node - it can no longer rejoin the
>> majority cluster unless the actor system is restarted, hence we restart the
>> whole JVM.
>>
>
> and what can cause Akka to have to quarantine a node?
>


An unrecoverable failure state - see
https://livingston.io/understanding-akkas-quarantine-state/ for more
detail.


Re: [controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Michael Vorburger
On Thu, Jul 5, 2018 at 7:39 PM, Tom Pantelis  wrote:

> On Thu, Jul 5, 2018 at 1:35 PM, Michael Vorburger 
> wrote:
>
>> Tom, or Robert, or anyone else having hit this themselves,
>>
>> would you be able to remind us what in clustering can cause an ODL abrupt
>> restart - System.exit() via bundleContext.getBundle(0).stop(); from
>> https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/akka/osgi/impl/QuarantinedMonitorActorPropsFactory.java ?
>>
>> I do vaguely recall an "inconsistent cluster" leading to this - could you clarify
>> exactly what situation leads to that? Loss of leader? Loss of majority?
>>
>> asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...
>>
>
> That happens when akka quarantines a node - it can no longer rejoin the
> majority cluster unless the actor system is restarted, hence we restart the
> whole JVM.
>

and what can cause Akka to have to quarantine a node?


Re: [controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Tom Pantelis
On Thu, Jul 5, 2018 at 1:35 PM, Michael Vorburger 
wrote:

> Tom, or Robert, or anyone else having hit this themselves,
>
> would you be able to remind us what in clustering can cause an ODL abrupt
> restart - System.exit() via bundleContext.getBundle(0).stop(); from
> https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/akka/osgi/impl/QuarantinedMonitorActorPropsFactory.java ?
>
> I do vaguely recall an "inconsistent cluster" leading to this - could you clarify
> exactly what situation leads to that? Loss of leader? Loss of majority?
>
> asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...
>

That happens when akka quarantines a node - it can no longer rejoin the
majority cluster unless the actor system is restarted, hence we restart the
whole JVM.
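
As a rough illustration of that flow, here is a minimal sketch of a monitor that reacts to a quarantine notification by running a restart callback exactly once. It is plain Java with no Akka or OSGi dependency, and it is not the actual QuarantinedMonitorActorPropsFactory code. QuarantinedNotice and QuarantineRestartTrigger are hypothetical names, and the Runnable stands in for whatever ends up calling bundleContext.getBundle(0).stop().

```java
/** Hypothetical stand-in for a "this system was quarantined" event; not an Akka class. */
final class QuarantinedNotice {
    final String remoteAddress; // address of the peer that quarantined us (illustrative)

    QuarantinedNotice(String remoteAddress) {
        this.remoteAddress = remoteAddress;
    }
}

/**
 * Runs a restart callback the first time a quarantine notice arrives.
 * Assumes single-threaded message delivery, as inside an actor.
 */
final class QuarantineRestartTrigger {
    private final Runnable restartCallback;
    private boolean triggered;

    QuarantineRestartTrigger(Runnable restartCallback) {
        this.restartCallback = restartCallback;
    }

    /** Returns true only for the call that actually triggered the restart. */
    boolean onMessage(Object message) {
        if (message instanceof QuarantinedNotice && !triggered) {
            triggered = true;
            restartCallback.run(); // in this thread's case, the step that stops bundle 0
            return true;
        }
        return false;
    }
}
```

The guard flag matters because further quarantine notices may arrive while the shutdown is already in progress, and the restart should be requested only once.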


>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>


[controller-dev] Understanding CDS

2018-07-05 Thread Robert Varga
Hello Josh, everyone,

when trying to understand what CDS does and how it does it, there are
concepts and technologies that must be understood -- all relating to
distributed systems and state management theory.

Specific topics:
- Actor systems, with Akka being an implementation
- Akka Clustering
- Akka Persistence
- The RAFT algorithm (and distributed consensus in general, like 3PC)
- Multiversion Concurrency Control (as a solution to the problem of
concurrency control)

All of these are things that cannot be explained in minutes, and all have a
bearing on the architecture of CDS as well as on the trade-offs made in its
design and implementation.

If we try to have a conversation about the CDS without sharing this
common knowledge, that conversation will be utterly inefficient with
frequent and long digressions into those topics -- which is something I
(and I suspect Tom) can ill afford.

Regards,
Robert





[controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Michael Vorburger
Tom, or Robert, or anyone else having hit this themselves,

would you be able to remind us what in clustering can cause an ODL abrupt
restart - System.exit() via bundleContext.getBundle(0).stop(); from
https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/akka/osgi/impl/QuarantinedMonitorActorPropsFactory.java
?

I do vaguely recall an "inconsistent cluster" leading to this - could you clarify
exactly what situation leads to that? Loss of leader? Loss of majority?

asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...

Tx,
M.
--
Michael Vorburger, Red Hat
vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch