On 2015-08-18 11:14, Stefan Egli wrote:
On 18/08/15 10:57, "Julian Reschke" wrote:
...
Hi Julian,
The idea is indeed that if an instance fails to update the lease then it
will be considered by other instances in the cluster as dead/crashed -
even though it still continues to function. It is the only one that is
able to detect such a situation. In my view, letting the instance shut
down is at this moment the only reasonable reaction, as upper-level code
might otherwise continue to function on the assumption that it is part of
the cluster, which the other instances do not agree with: they consider
this instance dead.
So taking one step back: the lease becomes a vital part of the functioning
of Oak indeed.
I see four alternatives:
a) Oak itself behaves fail-safe and does the System.exit (that's the path
I have suggested for now)
b) Oak does not do the System.exit but refuses to update anything towards
the document store (thus just throws exceptions on each invocation) - and
upper-level code detects this situation (e.g. a Sling Health Check) and
would do a System.exit based on how it is configured
c) same as b) but upper-level code does not do a System.exit (I'm not sure
if that makes sense - the instance is useless in such a situation)
d) none of the above and Oak tries to rejoin the cluster and continues to
function (in my view this will result in unmanageable edge cases)
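
To make option b) a bit more concrete, here is a minimal sketch of what "refuses to update anything and throws on each invocation" could look like. It is not Oak code: LeaseGuard, renewed(), check() and hasFailed() are hypothetical names invented for illustration, and the injected clock is an assumption made so the example can run on its own. Every store call would go through check(), and upper-level code (for example a health check) would poll hasFailed() and decide what to do.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

public class LeaseGuardSketch {

    /** Hypothetical guard: tracks the lease end and rejects work once it has passed. */
    static final class LeaseGuard {
        private final LongSupplier clock;   // time source in milliseconds (assumption)
        private final long leaseDurationMs;
        private volatile long leaseEndMs;
        private final AtomicBoolean failed = new AtomicBoolean(false);

        LeaseGuard(LongSupplier clock, long leaseDurationMs) {
            this.clock = clock;
            this.leaseDurationMs = leaseDurationMs;
            this.leaseEndMs = clock.getAsLong() + leaseDurationMs;
        }

        /** Called by the background lease updater after a successful renewal. */
        void renewed() {
            if (!failed.get()) {
                leaseEndMs = clock.getAsLong() + leaseDurationMs;
            }
        }

        /** Called before every store invocation; throws once the lease has expired. */
        void check() {
            if (failed.get() || clock.getAsLong() >= leaseEndMs) {
                failed.set(true); // sticky: no silent rejoin (that would be option d)
                throw new IllegalStateException("lease expired, refusing store access");
            }
        }

        /** Polled by upper-level code, which decides whether to stop anything. */
        boolean hasFailed() {
            return failed.get();
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};                       // fake clock for the demo
        LeaseGuard guard = new LeaseGuard(() -> now[0], 1000L);

        guard.check();                           // lease still valid, no exception

        now[0] = 1500L;                          // lease missed: time passed without renewal
        try {
            guard.check();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("failed = " + guard.hasFailed());
    }
}
```

The failed flag is deliberately sticky: once the lease has been missed, a late renewal does not bring the instance back, since the other instances may already treat it as dead. Whether anything then calls System.exit is left to the caller, which is the difference between b) and c).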
...
Yes, we need to think about how to stop Oak in this case. However I do
not think that stopping the *VM* is something we can do here. Keep in
mind that there might be many other things running in the VM which have
nothing to do with the content repository.
Best regards, Julian