[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-10-10 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561548#comment-15561548
 ] 

Stefan Egli edited comment on OAK-4796 at 10/10/16 1:52 PM:


split up this task into 3 sub-tasks, things will be thus followed up there:
* OAK-4907 : collect ChangeSet in a validator, store in CommitContext
* OAK-4916 : add filter capability to BackgroundObserver - thus introduce 
{{NOOP_CHANGE}}
* OAK-4908 : add 'exclusion' support to BackgroundObserver and ChangeProcessor 
to use that for best-effort-prefiltering


was (Author: egli):
split up this task into 3 sub-tasks, things will be thus followed up there:
* OAK-4907 : collect ChangeSet in a validator, store in CommitContext
* OAK-4916 : add filter capability to BackgroundObserver - thus introduce 
{{NOOP_CHANGED}}
* OAK-4908 : add 'exclusion' support to BackgroundObserver and ChangeProcessor 
to use that for best-effort-prefiltering

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of event diffing and filtering, and does so in a pooled 
> thread, i.e. asynchronously, at a later stage independent of the commit. 
> This keeps the commit fast, but has the following potentially negative 
> effects:
> # events (in the form of ContentChange objects) occupy a slot in the queue 
> even if the listener is not interested in them: every commit lands on every 
> listener's queue. This reduces the queue capacity available for 'actual' 
> events to be delivered. It therefore increases the risk that the queue fills 
> up, and a full queue has various consequences such as losing the CommitInfo.
> # each event (i.e. each ContentChange) must later be evaluated, and for that 
> a diff must be calculated. Depending on runtime behavior, that diff might be 
> expensive if it is no longer in the cache (DocumentMK specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, the listeners' filters are evaluated in a {{Validator}}, 
> as efficiently as possible. (The reason for doing it in a Validator is that 
> this doesn't add overhead, as Oak already goes through all changes for 
> other Validators.) As a result, a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> done carefully and measured
> ** One potential measure could be to only do filtering when listeners' queues 
> are larger than a certain threshold (e.g. 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and, 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way, when a filter 
> is not affected, its queue ends with a single NOOP). If later on a 
> non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} object.
> ** To achieve that, the ContentChange object is extended to have not only the 
> "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer, which 
> is currently maintained implicitly.
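
The NOOP handling described in the last bullet can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class {{NoopQueueSketch}}, its {{Entry}} type and the {{potentiallyAffected}} flag are hypothetical names, roots are plain strings instead of NodeStates, and dropping the trailing NOOP once a real change arrives is a choice made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; none of these names are taken from Oak.
class NoopQueueSketch {

    /** Queue entry: the diff to compute later runs from previousRoot to root. */
    static final class Entry {
        final String previousRoot;
        final String root;
        final boolean noop;

        Entry(String previousRoot, String root, boolean noop) {
            this.previousRoot = previousRoot;
            this.root = root;
            this.noop = noop;
        }
    }

    final Deque<Entry> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recently seen commit

    NoopQueueSketch(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    /** Called once per commit; potentiallyAffected stands in for the list lookup. */
    void contentChanged(String root, boolean potentiallyAffected) {
        Entry tail = queue.peekLast();
        boolean tailIsNoop = tail != null && tail.noop;
        if (tailIsNoop) {
            queue.removeLast(); // a trailing NOOP is either collapsed or consumed
        }
        if (potentiallyAffected) {
            // lastRoot equals the NOOP's root whenever a NOOP was at the tail
            queue.addLast(new Entry(lastRoot, root, false));
        } else {
            // collapse: keep the old "from" pointer, advance the "to" pointer
            String from = tailIsNoop ? tail.previousRoot : lastRoot;
            queue.addLast(new Entry(from, root, true));
        }
        lastRoot = root;
    }

    public static void main(String[] args) {
        NoopQueueSketch q = new NoopQueueSketch("r0");
        q.contentChanged("r1", false); // filtered out: NOOP r0 -> r1
        q.contentChanged("r2", false); // collapsed:    NOOP r0 -> r2
        q.contentChanged("r3", true);  // real change:  r2 -> r3
        for (Entry e : q.queue) {
            System.out.println((e.noop ? "NOOP " : "CHANGE ") + e.previousRoot + " -> " + e.root);
        }
        // prints: CHANGE r2 -> r3
    }
}
```

Two filtered commits occupy a single slot, and the change that follows carries an explicit "from" pointer, so the diff never has to cover the skipped commits.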



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-10-10 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561548#comment-15561548
 ] 

Stefan Egli edited comment on OAK-4796 at 10/10/16 1:51 PM:


split up this task into 3 sub-tasks, things will be thus followed up there:
* OAK-4907 : collect ChangeSet in a validator, store in CommitContext
* OAK-4916 : add filter capability to BackgroundObserver - thus introduce 
{{NOOP_CHANGED}}
* OAK-4908 : add 'exclusion' support to BackgroundObserver and ChangeProcessor 
to use that for best-effort-prefiltering


was (Author: egli):
split up this task into 2 sub-tasks, things will be thus followed up there:
* OAK-4907 : collect ChangeSet in a validator, store in CommitContext
* OAK-4916 : add filter capability to BackgroundObserver - thus introduce 
{{NOOP_CHANGED}}
* OAK-4908 : add 'exclusion' support to BackgroundObserver and ChangeProcessor 
to use that for best-effort-prefiltering



[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-10-10 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561548#comment-15561548
 ] 

Stefan Egli edited comment on OAK-4796 at 10/10/16 1:51 PM:


split up this task into 2 sub-tasks, things will be thus followed up there:
* OAK-4907 : collect ChangeSet in a validator, store in CommitContext
* OAK-4916 : add filter capability to BackgroundObserver - thus introduce 
{{NOOP_CHANGED}}
* OAK-4908 : add 'exclusion' support to BackgroundObserver and ChangeProcessor 
to use that for best-effort-prefiltering


was (Author: egli):
split up this task into 2 sub-tasks, things will be thus followed up there:
* OAK-4907 : collect ChangeSet in a validator, store in CommitContext
* OAK-4908 : add 'exclusion' support to BackgroundObserver and ChangeProcessor 
to use that for best-effort-prefiltering



[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-10-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548765#comment-15548765
 ] 

Stefan Egli edited comment on OAK-4796 at 10/5/16 2:14 PM:
---

bq. This is wrong. We need to either do prefiltering or postfiltering and can't 
mix the two. Therefore for prefiltering it's essential to pass around the 
applied filter in the ContentChange obj and use that later at delivery time.
Coming back to this point, there seem to be some issues with this based on the 
current design. Prior to prefiltering we had only postfiltering, and a change 
of the FilterProvider was applied immediately, basically to all elements in the 
queue. With prefiltering this is, as pointed out, not correct: the elements 
in the queue have already gone through prefiltering, so postfiltering should be 
done with the same FilterProvider. This means that the ChangeProcessor, which is 
in charge of postfiltering, should not use the FilterProvider set on its 
instance, but the one that was used for prefiltering. Therefore the 
ChangeProcessor needs to be given the FilterProvider for each change that it 
processes. The way it receives changes, though, is via 
Observer.contentChanged, so about the only feasible place to pass the 
FilterProvider from BackgroundObserver to ChangeProcessor is the CommitInfo.

The problem is that for external and overflow entries the CommitInfo is null. 
So I'd say that, as long as that's the case, it's very hard to implement 
switching the filter correctly.

Unless this switch is done correctly, the only thing that can be said is this: 
when a filter is changed and the queue is not empty, then both filters are 
applied. However, the listener doesn't know anything about the queue internals, 
so it cannot draw any conclusions from that.


was (Author: egli):
bq. This is wrong. We need to either do prefiltering or postfiltering and can't 
mix the two. Therefore for prefiltering it's essential to pass around the 
applied filter in the ContentChange obj and use that later at delivery time.
Coming back to this point, there seems to be some issues with this based on the 
current design: Prior to prefiltering we had only postfiltering. And changing 
the FilterProvider was applied immediately - basically on all elements in the 
queue. With prefiltering this is, as pointed out, not correct: those elements 
in the queue already have gone through prefiltering, so postfiltering should be 
done with the same FilterProvider. Which means, the ChangeProcessor - which is 
in charge of postfiltering - should not use the FilterProvider set on its 
instance, but use the same that was used for prefiltering. Therefore the 
ChangeProcessor needs to be given the FilterProvider for each change that it 
processes. The way it receives changes though is via the 
Observer.contentChanged. Therefore about the only feasible place to pass the 
FilterProvider from BackgroundObserver to ChangeProcessor is via the CommitInfo.

Thing now is that for external and overflow entries the CommitInfo is null. So 
I'd say, as long as that's the case it's very hard to implement correctly 
switching the filter.

Unless this switch is done correctly, the only thing that can be said is that: 
when a filter is changed it is undefined for which changes both filters are 
applied (if the queue is not empty when switching).


[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-10-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548765#comment-15548765
 ] 

Stefan Egli edited comment on OAK-4796 at 10/5/16 2:11 PM:
---

bq. This is wrong. We need to either do prefiltering or postfiltering and can't 
mix the two. Therefore for prefiltering it's essential to pass around the 
applied filter in the ContentChange obj and use that later at delivery time.
Coming back to this point, there seem to be some issues with this based on the 
current design. Prior to prefiltering we had only postfiltering, and a change 
of the FilterProvider was applied immediately, basically to all elements in the 
queue. With prefiltering this is, as pointed out, not correct: the elements 
in the queue have already gone through prefiltering, so postfiltering should be 
done with the same FilterProvider. This means that the ChangeProcessor, which is 
in charge of postfiltering, should not use the FilterProvider set on its 
instance, but the one that was used for prefiltering. Therefore the 
ChangeProcessor needs to be given the FilterProvider for each change that it 
processes. The way it receives changes, though, is via 
Observer.contentChanged, so about the only feasible place to pass the 
FilterProvider from BackgroundObserver to ChangeProcessor is the CommitInfo.

The problem is that for external and overflow entries the CommitInfo is null. 
So I'd say that, as long as that's the case, it's very hard to implement 
switching the filter correctly.

Unless this switch is done correctly, the only thing that can be said is that: 
when a filter is changed it is undefined for which changes both filters are 
applied (if the queue is not empty when switching).


was (Author: egli):
bq. This is wrong. We need to either do prefiltering or postfiltering and can't 
mix the two. Therefore for prefiltering it's essential to pass around the 
applied filter in the ContentChange obj and use that later at delivery time.
Coming back to this point, there seems to be some issues with this based on the 
current design: Prior to prefiltering we had only postfiltering. And changing 
the FilterProvider was applied immediately - basically on all elements in the 
queue. With prefiltering this is, as pointed out, not correct: those elements 
in the queue already have gone through prefiltering, so postfiltering should be 
done with the same FilterProvider. Which means, the ChangeProcessor - which is 
in charge of postfiltering - should not use the FilterProvider set on its 
instance, but use the same that was used for prefiltering. Therefore the 
ChangeProcessor needs to be given the FilterProvider for each change that it 
processes. The way it receives changes though is via the 
Observer.contentChanged. Therefore about the only feasible place to pass the 
FilterProvider from BackgroundObserver to ChangeProcessor is via the CommitInfo.

Thing now is that for external and overflow entries the CommitInfo is null. So 
I'd say, as long as that's the case it's very hard to implement correctly 
switching the filter.

Unless this switch is done correctly, the only thing that can be said is that: 
when a filter is changed it is undefined if the old, the new or both filters 
are applied to entries in the queue.


[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-09-29 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532623#comment-15532623
 ] 

Stefan Egli edited comment on OAK-4796 at 9/29/16 3:24 PM:
---

As discussed offline with Marcel, I'll work on a patch for the 2nd variant, so 
that we can compare the complexity/result. 

I also realized that the moment at which filtering is applied is critical: 
prefiltering (in a CommitHook or Observer) might be applied with a filter A, 
which could then be changed to A' before the event is delivered. 
Currently, though, the new filter A' would be applied before delivery. This is 
wrong. We need to either do prefiltering or postfiltering and can't mix the 
two. Therefore for prefiltering it's essential to pass around the applied 
filter in the ContentChange obj and use that later at delivery time.
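
To make the point concrete, here is a minimal single-threaded sketch of carrying the applied filter along with each queued change. It is an illustration under assumptions rather than the actual Oak design: {{FilterSnapshotSketch}} and {{QueuedChange}} are hypothetical names, a change is reduced to a single path, and a plain {{Predicate}} stands in for the FilterProvider.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names are taken from Oak.
class FilterSnapshotSketch {

    /** Queued change that remembers which filter was used for prefiltering. */
    static final class QueuedChange {
        final String path;
        final Predicate<String> appliedFilter;

        QueuedChange(String path, Predicate<String> appliedFilter) {
            this.path = path;
            this.appliedFilter = appliedFilter;
        }
    }

    private final Queue<QueuedChange> queue = new ArrayDeque<>();
    private Predicate<String> currentFilter;

    FilterSnapshotSketch(Predicate<String> initialFilter) {
        this.currentFilter = initialFilter;
    }

    /** Switching the filter only affects changes enqueued from now on. */
    void setFilter(Predicate<String> newFilter) {
        this.currentFilter = newFilter;
    }

    /** Prefiltering: decide at enqueue time, and keep the filter with the entry. */
    void contentChanged(String path) {
        Predicate<String> snapshot = currentFilter;
        if (snapshot.test(path)) {
            queue.add(new QueuedChange(path, snapshot));
        }
    }

    /** Delivery: postfilter with the entry's own filter, not with currentFilter. */
    List<String> deliverAll() {
        List<String> delivered = new ArrayList<>();
        QueuedChange change;
        while ((change = queue.poll()) != null) {
            if (change.appliedFilter.test(change.path)) {
                delivered.add(change.path);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        FilterSnapshotSketch s = new FilterSnapshotSketch(p -> p.startsWith("/a"));
        s.contentChanged("/a/x");
        s.contentChanged("/b/y");
        s.setFilter(p -> p.startsWith("/b")); // A is replaced by A'
        s.contentChanged("/b/z");
        System.out.println(s.deliverAll());
    }
}
```

If delivery used {{currentFilter}} instead of the snapshot, {{/a/x}} would pass prefiltering with A and then be dropped by A', which is exactly the mix of pre- and postfiltering described above.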


was (Author: egli):
As discussed offline with Marcel, I'll work on a patch for the 2nd variant, so 
that we can compare the complexity/result.

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.patch


[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-09-20 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506572#comment-15506572
 ] 

Stefan Egli edited comment on OAK-4796 at 9/20/16 1:42 PM:
---

[~chetanm], I see, your approach is completely different. The main differences 
I see:
# ObserverValidatorProvider approach:
#* 100% filtering of local and external events (the external part is not yet 
implemented, but would be similar)
#* for external events the diffing is done as today, so no performance 
improvement there. But we can also filter out entire external events for 
individual listeners, as for local ones, just at a different location 
(somewhere in the backgroundRead)
# Extracted-Data approach:
#* not 100% filtering, but perhaps close
#* makes diffing for other instances in the cluster cheaper

So let's decide which one to go for.


was (Author: egli):
[~chetanm], I see, your approach is completely different. Main differences I 
see:
# ObserverValidatorProvider approach:
* 100% filtering of local and external events (whereas the external part is not 
yet implemented, actually, but would be similar)
* for external events the diffing is done as today, so no performance 
improvements there. But we can also filter entire external events for the 
individual listeners as for local ones, just at a different location (in the 
backgroundRead somewhere)
# Extracted-Data approach:
* not 100% filtering, but perhaps close
* makes diffing for other instances in the cluster cheaper

so... let's decide which one to go for. 

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue 
> even if the listener is not interested in it - any commit lands on any 
> listener's queue. This reduces the capacity of the queue for 'actual' events 
> to be delivered. It therefore increases the risk that the queue fills - and 
> when full has various consequences such as loosing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff 
> must be calculated. Depending on runtime behavior that diff might be 
> expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage 
> already, nearer to the commit, and in case the filter would ignore the event, 
> it would not have to be put into the queue at all, thus avoiding occupying a 
> slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated 
> - in an as-efficient-as-possible manner (Reason for doing it in a Validator 
> is that this doesn't add overhead as oak already goes through all changes for 
> other Validators). As a result a _list of potentially affected observers_ is 
> added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be 
> carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues 
> are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in 
> [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224])
>  then checks the new commitInfo's _potentially affected observers_ list and 
> if it's not in the list, adds a {{NOOP}} token at the end of the queue. If 
> there's already a NOOP there, the two are collapsed (this way when a filter 
> is not affected it would have a NOOP at the end of the queue). If later on a 
> no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} 
> for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to no
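The NOOP handling described in the quoted algorithm could be sketched roughly as follows. This is only an illustration under assumptions: {{QueuedChange}} and {{PrefilteringQueue}} are hypothetical names (not Oak classes), roots are plain strings instead of Oak node states, and the {{potentiallyAffected}} flag stands in for the lookup in the commit's _potentially affected observers_ list.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a queued item; not Oak's actual ContentChange class.
final class QueuedChange {
    final String previousRoot; // root the later diff would start from
    final String root;         // root the later diff would end at
    final boolean noop;        // true if the listener's filter excluded the commit(s)

    QueuedChange(String previousRoot, String root, boolean noop) {
        this.previousRoot = previousRoot;
        this.root = root;
        this.noop = noop;
    }
}

// Hypothetical per-listener queue that collapses consecutive NOOP tokens.
final class PrefilteringQueue {
    private final Deque<QueuedChange> queue = new ArrayDeque<>();
    private String lastRoot;

    PrefilteringQueue(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    void contentChanged(String root, boolean potentiallyAffected) {
        QueuedChange tail = queue.peekLast();
        if (potentiallyAffected) {
            // A real change: if the tail is a NOOP, its root becomes the previousRoot.
            String previousRoot = (tail != null && tail.noop) ? tail.root : lastRoot;
            queue.addLast(new QueuedChange(previousRoot, root, false));
        } else if (tail != null && tail.noop) {
            // Collapse: replace the trailing NOOP with one that spans both commits.
            queue.removeLast();
            queue.addLast(new QueuedChange(tail.previousRoot, root, true));
        } else {
            // First uninteresting commit after a real change (or on an empty queue).
            queue.addLast(new QueuedChange(lastRoot, root, true));
        }
        lastRoot = root;
    }

    QueuedChange poll() {
        return queue.pollFirst();
    }

    int size() {
        return queue.size();
    }
}
```

With this shape, any number of uninteresting commits in a row occupy at most one slot of the listener's queue.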

[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue

2016-09-19 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503621#comment-15503621
 ] 

Chetan Mehrotra edited comment on OAK-4796 at 9/19/16 2:23 PM:
---

What a coincidence, I was just looking at this issue and now there is a patch! 
I have not yet gone into the patch in detail but wanted to comment on the 
approach. Instead of having a Validator provided by the Observer determine 
whether a certain change is interesting for a given observer, we can take a 
more decoupled approach.

h3. A - Collect data
For any given commit, collect a set of potentially interesting things (as 
mentioned in OAK-4586):
# set of nodetype names for modified nodes
# set of property names which got modified
# paths (up to n levels) under which modifications happened, say up to depth 2
# set of node names which got modified

This logic can be implemented as an {{Editor}} which gets registered in 
{{IndexUpdate}}, collects the above data and stores it in {{CommitContext}} 
on a _best effort basis_. The editor would be invoked for each change, so 
it can build up this state very efficiently. The complete data is collected 
irrespective of whether any observer is interested in that change. 

My understanding is that this state would not be very large for most of the 
commits done

Best effort means that if any data becomes too big, say 1000 different 
property names got changed (highly unlikely!), then the editor would empty the 
state and somehow indicate that the data is too large and that the observer 
should do the hard work. So if we are in a large transaction, we do not collect 
this data (configurable limits).

h3. B - Filter out based on collected data
Each observer would then provide a Filter which would be run when any 
ContentChange gets enqueued; it only lets through those commits that contain 
changes the observer is interested in.
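To illustrate the idea, a filter over the collected data might look something like the sketch below. {{CollectedChanges}} and {{ObserverPrefilter}} are hypothetical names, only paths and property names are checked, and the rule "no data or overflowed data means the change must be let through" is an assumption that follows from the best-effort nature of the collection.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical immutable view of the data collected for one commit.
final class CollectedChanges {
    final Set<String> propertyNames;
    final Set<String> parentPaths; // truncated to a fixed depth by the collector
    final boolean overflowed;

    CollectedChanges(Set<String> propertyNames, Set<String> parentPaths, boolean overflowed) {
        this.propertyNames = propertyNames;
        this.parentPaths = parentPaths;
        this.overflowed = overflowed;
    }
}

// Hypothetical per-observer prefilter; false positives are fine, false negatives are not.
final class ObserverPrefilter {
    private final String pathPrefix;                 // subtree the observer listens on
    private final Set<String> interestingProperties; // empty set means "any property"

    ObserverPrefilter(String pathPrefix, Set<String> interestingProperties) {
        this.pathPrefix = pathPrefix;
        this.interestingProperties = interestingProperties;
    }

    boolean mayBeInteresting(CollectedChanges changes) {
        if (changes == null || changes.overflowed) {
            return true; // no usable data: the observer has to do the full diff
        }
        boolean pathMatch = false;
        for (String changed : changes.parentPaths) {
            // Collected paths are truncated, so check both directions.
            if (isAncestorOrSelf(pathPrefix, changed) || isAncestorOrSelf(changed, pathPrefix)) {
                pathMatch = true;
                break;
            }
        }
        if (!pathMatch) {
            return false;
        }
        return interestingProperties.isEmpty()
                || !Collections.disjoint(interestingProperties, changes.propertyNames);
    }

    private static boolean isAncestorOrSelf(String ancestor, String path) {
        return ancestor.equals("/") || path.equals(ancestor) || path.startsWith(ancestor + "/");
    }
}
```

A commit for which {{mayBeInteresting}} returns false would not need to be enqueued for that observer at all.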

This approach would allow us to later serialize this collected data in the 
DocumentNodeStore journal entry and have it merged when an external diff event 
is sent, thus benefiting both local and external change processing. 


was (Author: chetanm):
What a coincidence was just looking at this issue and now the patch!. Have not 
yet gone in detail into the patch but wanted to commit on approach. Instead of 
having a Validator provided by Observer determine if certain change is 
interesting for given observor we can take a more decouple approach

h3. A - Collect data
For any given commit collect set of potentially interesting things (as 
mentioned in OAK-4586)
# set of nodetype names for modified nodes
# set of property names which got modified
# Paths (upto n level) under which modification happened say upto depth 2
# set of node names which got modified

This logic can be implemented as an {{Editor}} which gets registered in 
{{IndexUpdate}} and it collects above data and stores it in {{CommitContext}} 
on _best effort basis_. The editor would be invoked for each change and there 
it can very efficiently build up this state. Complete data is collected 
irrespective if any observer is interested in that change. 

My understanding is that this state would not be very large for most of the 
commits done

By best effort means that if any data becomes too big say 1000 different 
property name which got changed then it would not empty the state and indicate 
that data is too large and observer should do the hard work. So if we are in a 
large commit then we do not collect this data (configurable limits)

h3. B - Filter out based on collected data
Each observer would then provide a Filter which would be run when any 
ContentChange gets enqueued and it only allows those changes which have changes 
which it is interested in

This approach would allow us to later serialized this collected data in 
DocumentNodeStore journal entry and gets merged when an external diff event is 
sent. Thus benefiting both local and external change processing. 

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: jcr
>Affects Versions: 1.5.9
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.patch
>
>
> Currently the 
> [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335]
>  is in charge of doing the event diffing and filtering and does so in a 
> pooled Thread, ie asynchronously, at a later stage independent from the 
> commit. This has the advantage that the commit is fast, but has the following 
> potentially negative ef