[jira] [Updated] (OAK-4796) Filter events before adding to ChangeProcessor's queue

2022-10-17 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-4796:

Summary: Filter events before adding to ChangeProcessor's queue  (was: 
filter events before adding to ChangeProcessor's queue)

> Filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: jcr
>Affects Versions: 1.5.9
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Major
>  Labels: observation
> Fix For: 1.5.13, 1.6.0
>
> Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue 
> even if the listener is not interested in it - any commit lands on any 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills - and 
> when full has various consequences such as loosing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues 
> are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way when a filter 
> is not affected it would have a NOOP at the end of the queue). If later on a 
> no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the 
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which 
> currently is implicitly maintained.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-11-09 Thread Stefan Egli (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-4796:
-
Fix Version/s: 1.6

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: jcr
>Affects Versions: 1.5.9
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6, 1.5.13
>
> Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue 
> even if the listener is not interested in it - any commit lands on any 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills - and 
> when full has various consequences such as loosing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues 
> are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way when a filter 
> is not affected it would have a NOOP at the end of the queue). If later on a 
> no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the 
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which 
> currently is implicitly maintained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-10-05 Thread Stefan Egli (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-4796:
-
Attachment: OAK-4796.changeSet.patch

Attaching a second variant of the patch ([^OAK-4796.changeSet.patch]) which is 
based on Chetan's suggestion to compose a set of changes (parent-paths, 
propertyNames, nodeTypes, nodeNames) in an Editor (actually, I've used a 
ValidatorProvider/Validator pair), stores it as a property in the CommitContext 
of the CommitInfo, and evaluates it in oak-jcr's ChangeProcessor. This patch 
also includes some minimal statistics in the consolidated listener stats that 
shows how many commits were either skipped (because the feature was disabled or 
the CommitInfo null etc), included or excluded. The ObservationTest passes with 
the prefiltering enabled, however I plan to add some more specific testing 
still. 
Would welcome a review of this second approach (compared to the first which was 
EventFilter-based). /cc [~mreutegg], [~chetanm], [~mduerig], [~catholicon]

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: jcr
>Affects Versions: 1.5.9
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue 
> even if the listener is not interested in it - any commit lands on any 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills - and 
> when full has various consequences such as loosing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues 
> are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way when a filter 
> is not affected it would have a NOOP at the end of the queue). If later on a 
> no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the 
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which 
> currently is implicitly maintained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-09-19 Thread Stefan Egli (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-4796:
-
Attachment: OAK-4796.patch

Attaching patch ([^OAK-4796.patch]) which contains the functionality as 
described, here's the implementation again in detail:
* An Observer can now voluntarily implement an extension interface 
{{ObserverValidatorProvider}} which has a {{Validator getRootValidator()}} 
method
* The new ObservationFilterValidatorProvider is the main integrator: it has a 
reference to all such ObserverValidatorProviders and hooks them into the 
validators via a CompositeValidator
* There's now an extension to BackgroundObserver called 
{{PrefilteringBackgroundObserver}} which implements the new 
ObserverValidatorProvider and does so by mapping the Validator interface to the 
EventFilter interface.
* The BackgroundObserver handles the new 
{{"oak.observation.observerFiltersEvaluated"}} and 
{{"oak.observation.interestedObservers"}} properties that are set on the 
CommitContext and if it figures a listener does *not* need any event for a 
commit, marks the *next non filtered* ContentChange with a new 
{{noopPreviousRoot}} (which is a pointer to the last filtered root)
** This noopPreviousRoot is then used to send out a new 
{{CommitInfo.NOOP_CHANGE}} token, which indicates to Observers that they should 
ignore this contentChanged call (but update the root/previousRoot accordingly).
* oak-jcr's ChangeProcessor now uses such a PrefilteringBackgroundObserver and 
does the last puzzle of filtering: it filters the entire change via 
{{includeCommit}} (which applies things like isExternal/isInternal)
* to limit potential performance effects the ChangeProcessor has a flag that 
control the size of the queue after which prefiltering will be done. Default 20.

Additionally, the patch can be used in 'test' mode, in which case a flag is set 
but BackgroundObserver doesn't evaluate it - instead the ChangeProcessor checks 
if the flag would have been correct. This test mode would be removed after 
enough confidence, but could be used in IT testing to verify that filtering 
would have been done correctly.

Pending tasks: 
* more test cases, IT
* performance testing

I would appreciate feedback/review of those having been involved in 
observation. /cc [~mduerig], [~chetanm], [~catholicon], [~mreutegg]

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: jcr
>Affects Versions: 1.5.9
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue 
> even if the listener is not interested in it - any commit lands on any 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills - and 
> when full has various consequences such as loosing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues 
> are larger than a certain threshold (eg 10)

[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-09-12 Thread Stefan Egli (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-4796:
-
Affects Version/s: 1.5.9

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: jcr
>Affects Versions: 1.5.9
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue 
> even if the listener is not interested in it - any commit lands on any 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills - and 
> when full has various consequences such as loosing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues 
> are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way when a filter 
> is not affected it would have a NOOP at the end of the queue). If later on a 
> no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the 
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which 
> currently is implicitly maintained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-09-12 Thread Stefan Egli (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated OAK-4796:
-
Fix Version/s: 1.6

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: jcr
>Affects Versions: 1.5.9
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue 
> even if the listener is not interested in it - any commit lands on any 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills - and 
> when full has various consequences such as loosing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues 
> are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way when a filter 
> is not affected it would have a NOOP at the end of the queue). If later on a 
> no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the 
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which 
> currently is implicitly maintained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)