On 18/08/15 10:57, "Julian Reschke" wrote:

>On 2015-08-17 09:47, [email protected] wrote:
>> Author: stefanegli
>> +        @Override
>> +        public void run() {
>> +            System.exit(-1);
>> +        }
>>...
>
>I'm a bit concerned (and that's an understatement) that OAK is now
>calling System.exit. Detecting a serious problem - good. Stopping the
>content repository - probably good, at least for write operations? But
>stopping the whole VM, no matter what else it runs? Seriously?
Hi Julian,

The idea is indeed that if an instance fails to update the lease, the other instances in the cluster will consider it dead/crashed - even though it still continues to function. The instance itself is the only one that is able to detect such a situation. In my view, letting the instance shut down is at this moment the only reasonable reaction, as upper-level code might otherwise continue to function on the assumption that it is part of the cluster - an assumption the other instances do not share, since they consider this instance dead.

So taking one step back: the lease does indeed become a vital part of the functioning of Oak. I see four alternatives:

a) Oak itself behaves fail-safe and does the System.exit (that's the path I have suggested for now)

b) Oak does not do the System.exit but refuses to update anything towards the document store (thus just throws exceptions on each invocation) - and upper-level code detects this situation (e.g. a Sling Health Check) and would do a System.exit based on how it is configured

c) same as b), but upper-level code does not do a System.exit (I'm not sure if that makes sense - the instance is useless in such a situation)

d) none of the above, and Oak tries to rejoin the cluster and continues to function (in my view this will result in unmanageable edge cases)

Cheers,
Stefan
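
To make the difference between alternatives a) and b) more concrete, here is a rough sketch of a lease check with a pluggable failure reaction. It is only an illustration: `LeaseGuard`, `LeaseLostException`, `checkLease` and `renew` are hypothetical names, not Oak's actual API, and the explicit `nowMillis` parameter is an assumption made to keep the example deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a lease check; not Oak's actual implementation. */
final class LeaseGuard {

    /** Thrown on every invocation once the lease has been lost (alternative b). */
    static final class LeaseLostException extends RuntimeException {
        LeaseLostException(String message) {
            super(message);
        }
    }

    private final long leaseDurationMillis;
    private final Runnable onLeaseLost;
    private final AtomicBoolean leaseLost = new AtomicBoolean(false);
    private volatile long leaseEndMillis;

    /**
     * @param onLeaseLost reaction run exactly once when the lease expires,
     *                    e.g. {@code () -> System.exit(-1)} for alternative a),
     *                    or a callback that flags a health check for b) and c)
     */
    LeaseGuard(long leaseDurationMillis, long nowMillis, Runnable onLeaseLost) {
        this.leaseDurationMillis = leaseDurationMillis;
        this.leaseEndMillis = nowMillis + leaseDurationMillis;
        this.onLeaseLost = onLeaseLost;
    }

    /** Extends the lease. Refused once expired, because peers may already treat us as dead. */
    void renew(long nowMillis) {
        checkLease(nowMillis);
        leaseEndMillis = nowMillis + leaseDurationMillis;
    }

    /** To be called before every store access; throws once the lease has been lost. */
    void checkLease(long nowMillis) {
        // The first caller to observe the expiry triggers the configured reaction.
        if (nowMillis >= leaseEndMillis && leaseLost.compareAndSet(false, true)) {
            onLeaseLost.run();
        }
        if (leaseLost.get()) {
            throw new LeaseLostException("lease expired, refusing access to the document store");
        }
    }

    boolean isLeaseLost() {
        return leaseLost.get();
    }
}
```

In this sketch the guard never tries to rejoin once the lease is lost, so alternative d) is deliberately not covered. Whether the VM exits is decided purely by the `Runnable` that is passed in.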
