[
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Mueller updated OAK-4796:
--------------------------------
Summary: Filter events before adding to ChangeProcessor's queue (was:
filter events before adding to ChangeProcessor's queue)
> Filter events before adding to ChangeProcessor's queue
> ------------------------------------------------------
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Major
> Labels: observation
> Fix For: 1.5.13, 1.6.0
>
> Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently the
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
> is in charge of event diffing and filtering and does so in a
> pooled thread, i.e. asynchronously, at a later stage independent of the
> commit. This has the advantage that the commit is fast, but has the following
> potentially negative effects:
> # events (in the form of ContentChange objects) occupy a slot in the queue
> even if the listener is not interested in them - every commit lands on every
> listener's queue. This reduces the capacity of the queue for 'actual' events
> to be delivered. It therefore increases the risk that the queue fills up, and
> a full queue has various consequences such as losing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff
> must be calculated. Depending on runtime behavior that diff might be
> expensive if it is no longer in the cache (DocumentMK specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage
> already, nearer to the commit, and in case the filter would ignore the event,
> it would not have to be put into the queue at all, thus avoiding occupying a
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listeners' filters are evaluated
> - in an as-efficient-as-possible manner (the reason for doing it in a Validator
> is that this doesn't add overhead, as Oak already goes through all changes for
> other Validators). As a result a _list of potentially affected observers_ is
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be
> done carefully and measured
> ** One potential measure could be to only do filtering when listeners' queues
> are larger than a certain threshold (e.g. 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
> then checks the new CommitInfo's _potentially affected observers_ list and,
> if the observer itself is not in the list, adds a {{NOOP}} token at the end of
> the queue. If there's already a NOOP there, the two are collapsed (this way an
> observer whose filter is not affected would just have a single NOOP at the end
> of its queue). If later on a non-NOOP item is added, the NOOP's {{root}} is
> used as the {{previousRoot}} for the newly added {{ContentChange}} object.
> ** To achieve that, the ContentChange object is extended to not only have the
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer, which
> currently is implicitly maintained.
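
The NOOP handling described in the last bullet can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: {{QueueEntry}}, {{PrefilteredQueue}} and the boolean {{potentiallyAffected}} argument are hypothetical names, root states are plain strings instead of Oak NodeStates, and it is assumed that a collapsed NOOP keeps the earliest "from" root while advancing its "to" root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical queue entry; a stand-in for the extended ContentChange. */
final class QueueEntry {
    final String previousRoot; // explicit "from" state
    final String root;         // "to" state
    final boolean noop;        // true if the observer was not potentially affected

    QueueEntry(String previousRoot, String root, boolean noop) {
        this.previousRoot = previousRoot;
        this.root = root;
        this.noop = noop;
    }
}

/** Hypothetical per-observer queue that collapses consecutive NOOP tokens. */
final class PrefilteredQueue {
    private final Deque<QueueEntry> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recently seen commit

    PrefilteredQueue(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    /**
     * Called once per commit. potentiallyAffected stands for "this observer
     * is in the CommitInfo's list of potentially affected observers".
     */
    void contentChanged(String root, boolean potentiallyAffected) {
        QueueEntry tail = queue.peekLast();
        if (!potentiallyAffected && tail != null && tail.noop) {
            // Collapse: replace the trailing NOOP with one spanning both commits.
            queue.removeLast();
            queue.addLast(new QueueEntry(tail.previousRoot, root, true));
        } else {
            // lastRoot equals the trailing NOOP's root if there is one, so a
            // real change added after a NOOP starts from the NOOP's root.
            queue.addLast(new QueueEntry(lastRoot, root, !potentiallyAffected));
        }
        lastRoot = root;
    }

    /** Copy of the current queue content, oldest entry first. */
    List<QueueEntry> snapshot() {
        return new ArrayList<>(queue);
    }
}
```

In this sketch, two unaffected commits followed by an affected one leave only two entries in the queue: a single NOOP covering the first two commits, and one real change whose {{previousRoot}} is the NOOP's {{root}}.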