Whatever the long-term solution will be: we need a short-term solution
that doesn't kill an entire application server, so +1.
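
To make the short-term behaviour concrete, here is a rough sketch of how I
read option b) from the quoted thread: on lease failure the store stops
accepting updates and throws on every call, and a callback decides what
happens next instead of a hard-wired System.exit. All names here
(LeaseGuardSketch, GuardedStore, leaseFailed, onLeaseFailure) are made up
for illustration and are not Oak APIs; in an OSGi container the callback is
where stopping the oak-core bundle could go.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Oak code: a store that refuses updates once its
// lease is lost, leaving the reaction to a pluggable callback.
final class LeaseGuardSketch {

    static final class GuardedStore {
        private final AtomicBoolean leaseLost = new AtomicBoolean(false);
        private final Runnable onLeaseFailure;
        private int writes;

        GuardedStore(Runnable onLeaseFailure) {
            this.onLeaseFailure = onLeaseFailure;
        }

        // Called by whatever detects the lease failure. The callback runs
        // only once, even if the failure is reported repeatedly.
        void leaseFailed() {
            if (leaseLost.compareAndSet(false, true)) {
                onLeaseFailure.run();
            }
        }

        // Every update is rejected after a lease failure; nothing is written.
        void update(String path) {
            if (leaseLost.get()) {
                throw new IllegalStateException(
                        "lease lost, refusing update of " + path);
            }
            writes++;
        }

        int writeCount() {
            return writes;
        }

        boolean isLeaseLost() {
            return leaseLost.get();
        }
    }

    public static void main(String[] args) {
        // Assumption: the callback only logs; a real deployment might stop
        // a bundle or flag a health check here instead.
        GuardedStore store = new GuardedStore(
                () -> System.err.println("lease failure: store closed, restart required"));

        store.update("/content/a");
        store.leaseFailed();
        try {
            store.update("/content/b");
        } catch (IllegalStateException e) {
            System.err.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the callback is that the store itself never decides to exit
the JVM; upper-level code can still do that if it is configured to.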

On 09.09.15 14:12, Stefan Egli wrote:
> Hi all,
>
> I'd like to follow up on the idea to restart DocumentNodeStore as a result
> of a lease failure [0]: I suggest we don't do that and instead just stop
> the oak-core bundle.
>
> After some prototyping and running into OAK-3373 [1] I'm no longer sure if
> restarting the DocumentNodeStore is a feasible path to go, esp in the
> short term. The problem encountered so far is that Observers cannot be
> easily switched from old to (restarted/)new store due to:
>
>  * as pointed out by MichaelD they could have a backlog yet to process
> towards the old store - which they cannot access anymore as that one would
> be forcibly closed
>  * there is not yet a proper way to switch from old to new ('reset') - esp
> is there a risk that there could be a gap (this part we might be able to
> fix though, not sure)
>  * both above carry the risk that Observers miss some changes - something
> which would be unacceptable I guess.
>
> I think the more KISS approach would be to just forcibly close the
> DocumentNodeStore - or actually to stop the entire oak-core bundle - with
> appropriate errors logged so that the issue becomes clear. The instance
> would basically become unusable, mostly, but at least it would not be a
> System.exit.
>
> What do ppl think?
>
> Cheers,
> Stefan
> --
> [0] https://issues.apache.org/jira/browse/OAK-3250
> [1] https://issues.apache.org/jira/browse/OAK-3373
>
> On 18/08/15 16:45, "Stefan Egli" <e...@adobe.com> wrote:
>
>> I've created OAK-3250 to follow up on the DocumentNodeStore-restart idea.
>>
>> Cheers,
>> Stefan
>> --
>> https://issues.apache.org/jira/browse/OAK-3250
>>
>> On 18/08/15 15:59, "Marcel Reutegger" <mreut...@adobe.com> wrote:
>>
>>> On 18/08/15 15:38, "Stefan Egli" wrote:
>>>> On 18/08/15 13:43, "Marcel Reutegger" <mreut...@adobe.com> wrote:
>>>>> On 18/08/15 11:14, "Stefan Egli" wrote:
>>>>>> b) Oak does not do the System.exit but refuses to update anything
>>>>>> towards the document store (thus just throws exceptions on each
>>>>>> invocation) - and upper level code detects this situation (eg a
>>>>>> Sling Health Check) and would do a System.exit based on how it is
>>>>>> configured
>>>>>>
>>>>>> c) same as b) but upper level code does not do a System.exit (I'm not
>>>>>> sure if that makes sense - the instance is useless in such a situation)
>>>>> either b) or c) sounds reasonable to me.
>>>>>
>>>>> but if possible I'd like to avoid a System.exit(). would it be possible
>>>>> to detect this situation in the DocumentNodeStoreService and restart
>>>>> the DocumentNodeStore without the need to restart the JVM?
>>>> Good point. Perhaps restarting DocumentNodeStore is a valid alternative
>>>> indeed. Is that feasible from a DocumentNodeStore point of view?
>>> it probably requires some changes to the DocumentNodeStore, because
>>> we want it to tear down without doing any of the cleanup it
>>> may otherwise perform. it must not release the cluster node info
>>> nor update pending _lastRevs, etc.
>>>
>>>> What would be the consequences of a restarted DocumentNodeStore?
>>> to the DocumentNodeStore it will look like it was killed and it will
>>> perform recovery (e.g. for the pending _lastRevs).
>>>
>>> Regards
>>> Marcel
>>>
>
