[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507925#comment-15507925
 ] 

Michael Dürig commented on OAK-4796:
------------------------------------

In addition to Marcel's concerns I'm a bit worried of using the commit context 
to loop things through. I know the commit context has been introduced somewhere 
else. But still is represents shared mutable state and is passed around pretty 
much like a global variable. While it seemed like a quick win when we 
introduced it (and also now), it might well turn out to be a (scalability) 
blocker further down the road. Without careful considerations it will e.g. not 
allow us to scale the commit hooks out and run them in parallel. 

> filter events before adding to ChangeProcessor's queue
> ------------------------------------------------------
>
>                 Key: OAK-4796
>                 URL: https://issues.apache.org/jira/browse/OAK-4796
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: jcr
>    Affects Versions: 1.5.9
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>              Labels: observation
>             Fix For: 1.6
>
>         Attachments: OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue 
> even if the listener is not interested in it - any commit lands on any 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills - and 
> when full has various consequences such as loosing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues 
> are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way when a filter 
> is not affected it would have a NOOP at the end of the queue). If later on a 
> no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the 
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which 
> currently is implicitly maintained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to