Hi all,

I'd like to follow up on the idea to restart DocumentNodeStore as a result
of a lease failure [0]: I suggest we don't do that and instead just stop
the oak-core bundle.

After some prototyping and running into OAK-3373 [1], I'm no longer sure
that restarting the DocumentNodeStore is feasible, especially in the
short term. The problem encountered so far is that Observers cannot
easily be switched from the old store to the restarted (new) one, because:

 * as pointed out by MichaelD, they could still have a backlog to process
against the old store, which they can no longer access because that store
would have been forcibly closed
 * there is not yet a proper way to switch ('reset') an Observer from the
old store to the new one; in particular there is a risk of a gap (this
part we might be able to fix though, not sure)
 * both of the above carry the risk that Observers miss some changes,
which would be unacceptable, I guess.

I think the simpler (KISS) approach would be to just forcibly close the
DocumentNodeStore, or actually to stop the entire oak-core bundle, with
appropriate errors logged so that the issue becomes clear. The instance
would become mostly unusable, but at least it would not be a
System.exit.
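
To make the "stop instead of exit" idea a bit more concrete, here is a
minimal sketch. Everything in it is hypothetical and not existing Oak code:
Stoppable stands in for whatever gets stopped (in an OSGi setup presumably
the oak-core Bundle via its stop() method), and LeaseFailureHandler is an
invented name. It only shows the shape: log the error, stop once, never
call System.exit.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class LeaseFailureSketch {

    /** Hypothetical stand-in for the thing being stopped (e.g. a bundle). */
    interface Stoppable {
        void stop() throws Exception;
    }

    /** Hypothetical handler; an assumption for illustration, not part of Oak. */
    static final class LeaseFailureHandler {
        private final Stoppable target;
        private final AtomicBoolean fired = new AtomicBoolean(false);

        LeaseFailureHandler(Stoppable target) {
            this.target = target;
        }

        /** Returns true if this call triggered the stop, false if it already fired. */
        boolean onLeaseFailure() {
            // Only the first caller gets through; repeated failures are ignored.
            if (!fired.compareAndSet(false, true)) {
                return false;
            }
            System.err.println("ERROR: lease failure detected, stopping instead of exiting the JVM");
            try {
                target.stop();
            } catch (Exception e) {
                // Log and carry on; the JVM is deliberately left running.
                System.err.println("ERROR: stop failed: " + e);
            }
            return true;
        }

        boolean hasFired() {
            return fired.get();
        }
    }

    public static void main(String[] args) {
        LeaseFailureHandler handler = new LeaseFailureHandler(
                () -> System.err.println("(hypothetical) bundle stopped"));
        handler.onLeaseFailure(); // first call stops the target
        handler.onLeaseFailure(); // second call is a no-op
    }
}
```

The once-only guard is there because a failed lease would likely be noticed
by several threads at about the same time.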

What do people think?

Cheers,
Stefan
--
[0] https://issues.apache.org/jira/browse/OAK-3250
[1] https://issues.apache.org/jira/browse/OAK-3373

On 18/08/15 16:45, "Stefan Egli" <[email protected]> wrote:

>I've created OAK-3250 to follow up on the DocumentNodeStore-restart idea.
>
>Cheers,
>Stefan
>--
>https://issues.apache.org/jira/browse/OAK-3250
>
>On 18/08/15 15:59, "Marcel Reutegger" <[email protected]> wrote:
>
>>On 18/08/15 15:38, "Stefan Egli" wrote:
>>>On 18/08/15 13:43, "Marcel Reutegger" <[email protected]> wrote:
>>>>On 18/08/15 11:14, "Stefan Egli" wrote:
>>>>>b) Oak does not do the System.exit but refuses to update anything
>>>>>towards
>>>>>the document store (thus just throws exceptions on each invocation) -
>>>>>and
>>>>>upper level code detects this situation (eg a Sling Health Check) and
>>>>>would do a System.exit based on how it is configured
>>>>>
>>>>>c) same as b) but upper level code does not do a System.exit (I'm not
>>>>>sure
>>>>>if that makes sense - the instance is useless in such a situation)
>>>>
>>>>either b) or c) sounds reasonable to me.
>>>>
>>>>but if possible I'd like to avoid a System.exit(). would it be possible
>>>>to detect this situation in the DocumentNodeStoreService and restart
>>>>the DocumentNodeStore without the need to restart the JVM
>>>
>>>Good point. Perhaps restarting DocumentNodeStore is a valid alternative
>>>indeed. Is that feasible from a DocumentNodeStore point of view?
>>
>>it probably requires some changes to the DocumentNodeStore, because
>>we want it to tear down without doing any of the cleanup it
>>may otherwise perform. it must not release the cluster node info
>>nor update pending _lastRevs, etc.
>>
>>> What would be the consequences of a restarted DocumentNodeStore?
>>
>>to the DocumentNodeStore it will look like it was killed and it will
>>perform recovery (e.g. for the pending _lastRevs).
>>
>>Regards
>> Marcel
>>
>

