Stefan Egli created OAK-4796:
--------------------------------

             Summary: filter events before adding to ChangeProcessor's queue
                 Key: OAK-4796
                 URL: https://issues.apache.org/jira/browse/OAK-4796
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: jcr
            Reporter: Stefan Egli


Currently the 
[ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
 is in charge of event diffing and filtering, and does so in a pooled 
thread, i.e. asynchronously, at a later stage independent of the commit. This 
has the advantage that the commit is fast, but it has the following potentially 
negative effects:
# events (in the form of ContentChange objects) occupy a slot in the queue even 
if the listener is not interested in them - every commit lands on every listener's 
queue. This reduces the queue capacity available for 'actual' events to be 
delivered, and therefore increases the risk that the queue fills up. A full 
queue has various consequences, such as losing the CommitInfo.
# each event (i.e. each ContentChange) must be evaluated later on, and for that a 
diff must be calculated. Depending on runtime behavior, that diff might be 
expensive if it is no longer in the cache (DocumentMK specifically).

As an improvement, this diffing+filtering could be done at an earlier stage 
already, nearer to the commit, and in case the filter would ignore the event, 
it would not have to be put into the queue at all, thus avoiding occupying a 
slot and later potentially slower diffing.

The suggestion is to implement this via the following algorithm:

* During the commit, the listeners' filters are evaluated in a {{Validator}}, 
in as efficient a manner as possible (the reason for doing it in a Validator is 
that this does not add an extra traversal, as Oak already goes through all 
changes for other Validators). As a result, a _list of potentially affected 
observers_ is added to the {{CommitInfo}} (false positives are fine).
** Note that the above adds cost to the commit and must therefore be done 
carefully and measured
** One potential measure could be to only do the filtering when a listener's 
queue is larger than a certain threshold (e.g. 10)
* The ChangeProcessor in {{contentChanged}} (in the one created in 
[createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
 then checks the new CommitInfo's _potentially affected observers_ list and, if 
it is not in the list, adds a {{NOOP}} token at the end of the queue. If there is 
already a NOOP there, the two are collapsed (this way, a listener whose filter is 
not affected has a single NOOP at the end of its queue). If a non-NOOP item is 
added later on, the NOOP's {{root}} is used as the {{previousRoot}} for the 
newly added {{ContentChange}} object.
** To achieve that, the ContentChange object is extended to have not only the 
"to" {{root}} pointer, but also the "from" {{previousRoot}} pointer, which is 
currently maintained implicitly.
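
The queue handling described above could look roughly like the following. This is a minimal sketch only, not the actual ChangeProcessor code: the class {{ChangeQueue}}, its methods, the {{noop}} flag and the use of String ids in place of Oak's node states are all assumptions made for illustration. It also assumes that the NOOP stays in the queue when a real change follows it, and it ignores thread safety and queue capacity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical queue entry: a commit's "to" root plus an explicit "from" root. */
final class ContentChange {
    final String previousRoot; // "from" state; a String id stands in for Oak's node state
    final String root;         // "to" state
    final boolean noop;        // true if the listener's filter excluded the commit(s)

    ContentChange(String previousRoot, String root, boolean noop) {
        this.previousRoot = previousRoot;
        this.root = root;
        this.noop = noop;
    }
}

/** Hypothetical per-listener queue that collapses consecutive NOOP entries. */
final class ChangeQueue {
    private final Deque<ContentChange> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recently enqueued entry

    ChangeQueue(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    /**
     * Called once per commit. potentiallyAffected is assumed to come from the
     * "potentially affected observers" list computed during the commit.
     */
    void contentChanged(String root, boolean potentiallyAffected) {
        ContentChange tail = queue.peekLast();
        if (potentiallyAffected) {
            // previousRoot is the root of the preceding entry; if that entry
            // is a NOOP, this is the NOOP's root.
            queue.addLast(new ContentChange(lastRoot, root, false));
        } else if (tail != null && tail.noop) {
            // Collapse: keep the older "from" root, advance the "to" root.
            queue.removeLast();
            queue.addLast(new ContentChange(tail.previousRoot, root, true));
        } else {
            queue.addLast(new ContentChange(lastRoot, root, true));
        }
        lastRoot = root;
    }

    int size() {
        return queue.size();
    }

    ContentChange peekLast() {
        return queue.peekLast();
    }
}
```

With this shape, any number of consecutive unaffected commits occupies a single slot, and the first real change after them carries an explicit {{previousRoot}}, so the consumer does not have to diff across the skipped commits.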



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
