[ 
https://issues.apache.org/jira/browse/OAK-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393739#comment-15393739
 ] 

Marcel Reutegger commented on OAK-4528:
---------------------------------------

The issue description already outlines what happens when an observation 
queue fills up and how the system can get into a state where the queue never 
drains back to zero. Here are some more details on what exactly happens:

There are usually two situations that can lead to growing observation queues:

- The commit rate is high for some time, adding changes to the observation 
queues faster than they are dequeued.
- A large commit is put into the observation queue. Generating and delivering 
events for this large change set takes time. During this time more commits 
are performed, growing the queue until the big change set is processed.

As long as the observation queue does not hit its maximum size and the diff 
cache still contains the changes pending in the queues, the system is usually 
able to recover.

It gets more problematic when the queue hits the maximum size. At this point 
adding new changes to the queue will consolidate the most recent change sets 
and remove the commit info. When the change processor looks at this 
consolidated change set, the state comparison cannot be answered from the 
pre-populated cache anymore and differences must be calculated.
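
To illustrate the consolidation step, here is a minimal sketch of a bounded 
queue that merges a new change set into the most recent entry once the maximum 
size is reached and drops the commit info while doing so. The class and field 
names ({{ObservationQueueSketch}}, {{ChangeSet}}, {{commitInfo}}) are 
hypothetical and the revisions are plain strings; this is not the actual Oak 
implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded observation queue, not Oak's real code.
final class ObservationQueueSketch {

    // A change set covers the changes from one root revision to another.
    static final class ChangeSet {
        final String fromRev;
        final String toRev;
        final String commitInfo; // null once change sets were consolidated

        ChangeSet(String fromRev, String toRev, String commitInfo) {
            this.fromRev = fromRev;
            this.toRev = toRev;
            this.commitInfo = commitInfo;
        }
    }

    private final int maxSize;
    private final Deque<ChangeSet> queue = new ArrayDeque<>();

    ObservationQueueSketch(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
    }

    void add(ChangeSet change) {
        if (queue.size() < maxSize) {
            queue.addLast(change);
            return;
        }
        // Queue is full: merge the new change into the most recent entry.
        // The merged entry spans both change sets and has no commit info.
        ChangeSet last = queue.removeLast();
        queue.addLast(new ChangeSet(last.fromRev, change.toRev, null));
    }

    ChangeSet poll() {
        return queue.pollFirst();
    }

    int size() {
        return queue.size();
    }
}
```

With a maximum size of 2, adding three change sets leaves two entries: the 
first one untouched and a merged one covering the second and third change 
sets without commit info. A consumer of the merged entry has to diff the two 
outer revisions itself, which is where the cost described below comes in.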

The cost for comparing two node states with the previous implementation is 
primarily a function of how old the two node states are. The data model of the 
DocumentNodeStore does not store each implicit modification of a node. That is, 
if a node {{/foo/bar/baz}} is added, {{/foo/bar}}, {{/foo}} and {{/}} are not 
immediately written. Their {{_lastRev}} entry is only updated in-memory first 
and written back asynchronously later. It is therefore impossible to exactly 
tell when a node was implicitly modified in the past.
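
A small sketch may help to see why this information is lost. It assumes a 
simplified model with two maps, one standing in for the persisted documents 
and one for the in-memory pending {{_lastRev}} updates; the class 
{{LastRevSketch}} and its methods are hypothetical and only mimic the 
behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of deferred _lastRev updates, not Oak's real code.
final class LastRevSketch {

    // Stand-in for the persisted _lastRev value per path.
    final Map<String, String> persisted = new HashMap<>();
    // In-memory _lastRev updates for implicitly modified ancestors.
    final Map<String, String> pending = new HashMap<>();

    void commit(String path, String revision) {
        // The explicitly changed node is written right away.
        persisted.put(path, revision);
        // Ancestors are only updated in memory; a later commit overwrites
        // the earlier pending revision for the same ancestor.
        String p = path;
        while (!p.equals("/")) {
            int idx = p.lastIndexOf('/');
            p = idx == 0 ? "/" : p.substring(0, idx);
            pending.put(p, revision);
        }
    }

    // Stand-in for the asynchronous write-back of pending updates.
    void backgroundWrite() {
        persisted.putAll(pending);
        pending.clear();
    }
}
```

If {{/foo/bar/baz}} is committed at {{r123-0-1}} and {{/foo/qux}} at a later 
revision before the write-back runs, only the later revision ends up on 
{{/foo}}. The fact that {{/foo}} was also implicitly modified at 
{{r123-0-1}} is no longer visible in the persisted state.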

Consider {{/foo}} with a {{_lastRev}} of {{r123-0-1}} and a comparison of the 
same node in revision {{r105-0-1}} and {{r117-0-1}}. Whether the node was 
modified between the two revisions is not available in the underlying 
{{NodeDocument}}. The previous implementation didn't have a choice but to 
traverse down the tree and check whether there are changes on the child nodes. 
In the worst case, the entire (potentially large) subtree needs to be traversed 
only to find out that nothing changed.

With this improvement, the implementation reads matching entries from the 
journal and can stop traversal much earlier when there were no changes between 
the two revisions at that path.
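
The difference between the two approaches can be sketched roughly as follows. 
This is a simplified model under stated assumptions: revisions are plain 
integers instead of real revision strings, the journal is a map from revision 
to the set of changed paths including their ancestors, and all names 
({{JournalDiffSketch}}, {{recordCommit}}, {{traverseChanged}}) are 
hypothetical. It is not the actual DocumentNodeStore code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of journal-guided diffing, not Oak's real code.
final class JournalDiffSketch {

    // Tree structure: parent path -> child paths.
    private final Map<String, List<String>> children = new HashMap<>();
    // Journal: revision -> paths changed in that revision (with ancestors).
    private final NavigableMap<Integer, Set<String>> journal = new TreeMap<>();

    void addNode(String parent, String path) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(path);
    }

    void recordCommit(int revision, String path) {
        Set<String> paths = journal.computeIfAbsent(revision, k -> new HashSet<>());
        String p = path;
        while (true) {
            paths.add(p);
            if (p.equals("/")) {
                break;
            }
            int idx = p.lastIndexOf('/');
            p = idx == 0 ? "/" : p.substring(0, idx);
        }
    }

    // All paths changed after 'from' (exclusive) up to 'to' (inclusive).
    Set<String> changedPaths(int from, int to) {
        Set<String> result = new HashSet<>();
        for (Set<String> paths : journal.subMap(from, false, to, true).values()) {
            result.addAll(paths);
        }
        return result;
    }

    // Without journal info: every node in the subtree is visited.
    int traverseAll(String path) {
        int count = 1;
        for (String child : children.getOrDefault(path, Collections.emptyList())) {
            count += traverseAll(child);
        }
        return count;
    }

    // With journal info: stop as soon as a path has no recorded change.
    int traverseChanged(String path, Set<String> changed) {
        if (!changed.contains(path)) {
            return 0;
        }
        int count = 1;
        for (String child : children.getOrDefault(path, Collections.emptyList())) {
            count += traverseChanged(child, changed);
        }
        return count;
    }
}
```

Applied to the example above, with a single commit on {{/foo/bar/baz}} at 
revision 123 and a comparison of {{/foo}} between revisions 105 and 117, 
{{changedPaths(105, 117)}} is empty and {{traverseChanged}} visits no nodes, 
while {{traverseAll}} visits the entire subtree below {{/foo}}.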

> diff calculation in DocumentNodeStore should try to re-use journal info on 
> diff cache miss
> ------------------------------------------------------------------------------------------
>
>                 Key: OAK-4528
>                 URL: https://issues.apache.org/jira/browse/OAK-4528
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, documentmk
>            Reporter: Vikas Saurabh
>            Assignee: Marcel Reutegger
>            Priority: Minor
>              Labels: observation, resilience
>             Fix For: 1.6, 1.5.6
>
>
> Currently, diff information is filled into caches actively (local commits 
> pushed in local_diff, externally read changes pushed into memory_diff). At 
> the time of event processing though, the entries could have already been 
> evicted.
> In that case, we fall back to computing diff by comparing 2 node-states which 
> becomes more and more expensive (and eventually fairly non-recoverable 
> leading to OAK-2683).
> To improve the situation somewhat, we can probably try to consult journal 
> entries to read a smaller-superset of changed paths before falling down to 
> comparison.
> /cc [~mreutegg], [~chetanm], [~egli]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
