[jira] [Commented] (OAK-5636) potential NPE in ReplicaSetInfo
[ https://issues.apache.org/jira/browse/OAK-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863754#comment-15863754 ]

Stefan Egli commented on OAK-5636:
--
+1, patch looks good to me

> potential NPE in ReplicaSetInfo
> ---
>
> Key: OAK-5636
> URL: https://issues.apache.org/jira/browse/OAK-5636
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: core, mongomk
> Affects Versions: 1.6.0
> Reporter: Julian Reschke
> Assignee: Tomek Rękawek
> Priority: Critical
> Fix For: 1.8, 1.6.1
>
> Attachments: OAK-5636.patch
>
>
> seen in log:
> {noformat}
> java.lang.NullPointerException: null
>   at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:192) ~[oak-run-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>   at com.google.common.collect.SingletonImmutableSet.<init>(SingletonImmutableSet.java:47) ~[oak-run-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>   at com.google.common.collect.ImmutableSet.of(ImmutableSet.java:94) ~[oak-run-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>   at org.apache.jackrabbit.oak.plugins.document.mongo.replica.ReplicaSetInfo.updateRevisions(ReplicaSetInfo.java:264) ~[oak-run-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>   at org.apache.jackrabbit.oak.plugins.document.mongo.replica.ReplicaSetInfo.updateReplicaStatus(ReplicaSetInfo.java:182) ~[oak-run-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>   at org.apache.jackrabbit.oak.plugins.document.mongo.replica.ReplicaSetInfo.updateLoop(ReplicaSetInfo.java:145) ~[oak-run-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>   at org.apache.jackrabbit.oak.plugins.document.mongo.replica.ReplicaSetInfo.run(ReplicaSetInfo.java:134) ~[oak-run-1.8-SNAPSHOT.jar:1.8-SNAPSHOT]
>   at java.lang.Thread.run(Unknown Source) [na:1.8.0_121]
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863858#comment-15863858 ]

Stefan Egli commented on OAK-5619:
--
Thx [~mduerig], I'll adjust the javadoc accordingly as well.

> withIncludeAncestorsRemove reports unrelated top-level node deletion
> ---
>
> Key: OAK-5619
> URL: https://issues.apache.org/jira/browse/OAK-5619
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: jcr
> Affects Versions: 1.6.0
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Critical
> Fix For: 1.6.1
>
> Attachments: OAK-5619.patch, OAK-5619.patch2
>
>
> withIncludeAncestorsRemove includes the deletion of all parents of the
> registered paths. Registering an include path {{/a/b/c}} thus triggers an
> event if {{/a}} is deleted. When registering an include glob path
> {{**/foo}}, any parent path deletion will be reported.
> There is currently a bug whereby an include path {{/a/b/c}} results in any
> parent deletion being reported. This likely stems from the fact that for
> glob paths any parent path deletion will be reported.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5619: - Fix Version/s: (was: 1.6.1) 1.8 1.7.0 * fixed in trunk: http://svn.apache.org/viewvc?rev=1782797&view=rev
[jira] [Resolved] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-5619. -- Resolution: Fixed
[jira] [Updated] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5619: - Fix Version/s: 1.6.1 * merged to 1.6 branch: http://svn.apache.org/viewvc?rev=1782801&view=rev
[jira] [Updated] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5619: - Summary: withIncludeAncestorsRemove reports unrelated top-level node deletion (was: withIncludeAncestorsRemove reports unrelated parent deletion)
[jira] [Commented] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859414#comment-15859414 ]

Stefan Egli commented on OAK-5619:
--
Note that this doesn't affect just any unrelated node, only nodes that share a parent with one of the ancestors (including children of {{/}}). I.e. if you have a listener on {{/a/b/c/d}}, then the deletion of {{/unrelated}} is reported, as well as the deletion of {{/a/b/unrelated}}.
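The intended behaviour for a concrete (non-glob) include path can be sketched as a plain path check. This is only an illustration of the semantics described in the issue, not Oak's filter code; the class and method names are hypothetical.

```java
/**
 * Sketch (hypothetical names, not Oak's implementation): with
 * "include ancestors remove" on a concrete include path, a deletion
 * should only be reported when the deleted node lies on the chain
 * between the root and the include path.
 */
final class AncestorRemoveSketch {

    /** True if the deleted path is the include path itself or one of its ancestors. */
    static boolean shouldReportRemoval(String includePath, String deletedPath) {
        if (deletedPath.equals("/")) {
            return false; // the root node itself cannot be deleted
        }
        return includePath.equals(deletedPath)
                || includePath.startsWith(deletedPath + "/");
    }
}
```

With an include path of `/a/b/c/d`, this reports the deletion of `/a` or `/a/b`, but neither `/unrelated` nor `/a/b/unrelated`, which are the two cases the comment above describes as wrongly reported.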
[jira] [Commented] (OAK-5626) ChangeProcessor doesn't reset 'blocking' flag when items from queue gets removed and commit-rate-limiter is null
[ https://issues.apache.org/jira/browse/OAK-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861275#comment-15861275 ]

Stefan Egli commented on OAK-5626:
--
I'd argue we should go for the simpler time check just around that log.warn. It only happens on dequeue of a full queue (ie when it was compacting). So it doesn't happen normally anyway, only under a lot of stress.

> ChangeProcessor doesn't reset 'blocking' flag when items from queue gets
> removed and commit-rate-limiter is null
> ---
>
> Key: OAK-5626
> URL: https://issues.apache.org/jira/browse/OAK-5626
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: core
> Reporter: Vikas Saurabh
> Assignee: Vikas Saurabh
> Priority: Minor
>
> Following up on conversation at \[0]:
> {{ChangeProcessor#queueSizeChanged}} \[1] sets blocking flag to true if queue
> size is hit (or beyond). The warning "Revision queue is full. Further
> revisions will be compacted." is logged only when it *wasn't* blocking.
> BUT, when queue empties, blocking flag is reset inside if block for
> commitRateLimiter!=null. That means an event chain like:
> # qFull
> # log warn
> # qEmpties
> # qFull
> won't log the WARN after step(4)
> \[0]: http://markmail.org/message/hgein5g3ohyjhw5n
> \[1]:
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L307

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
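The "simpler time check just around that log.warn" could look roughly like the following. This is a minimal sketch under assumptions: the class name, the interval parameter and the injected timestamp are hypothetical and not taken from ChangeProcessor.

```java
/**
 * Sketch (hypothetical, not the ChangeProcessor code): allow a warning
 * at most once per interval. The current time is passed in so that the
 * logic can be exercised without waiting.
 */
final class ThrottledWarn {
    private final long minIntervalMillis;
    private long lastWarnMillis;
    private boolean warned;

    ThrottledWarn(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true if the caller should issue the log.warn now. */
    synchronized boolean shouldWarn(long nowMillis) {
        if (warned && nowMillis - lastWarnMillis < minIntervalMillis) {
            return false; // still inside the quiet period
        }
        warned = true;
        lastWarnMillis = nowMillis;
        return true;
    }
}
```

A caller would wrap the existing warning as `if (throttle.shouldWarn(System.currentTimeMillis())) { log.warn(...); }`, so the clock is only read when the queue is actually full.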
[jira] [Commented] (OAK-5626) ChangeProcessor doesn't reset 'blocking' flag when items from queue gets removed and commit-rate-limiter is null
[ https://issues.apache.org/jira/browse/OAK-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861226#comment-15861226 ]

Stefan Egli commented on OAK-5626:
--
bq. My concern is that this would lead to almost every commit leading to those WARN flooding the logs.

Agreed, I had the same thought.

bq. But, this part lies in fairly critical section - I'm not sure of getting time can be expensive or not?

Agreed, not sure if it's a problem though. But one possible alternative might be to move the time check to the [{{removed()}}|https://github.com/apache/jackrabbit-oak/blob/4eac76dcbb262f10af9202cdcbc3e95dee40107a/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L303] case. It would get a bit imprecise and trickier to implement, but {{removed()}} is called asynchronously and is therefore less time critical.

So the above could eg be implemented "modCnt" style using {{logCnt}}, {{suppressCnt}} and {{lastSuppressTime}} as follows:
* {{logCnt}} tracks the number of those log.warns, and log.warn is only issued if {{logCnt<=suppressCnt}}, in which case it then does {{logCnt = suppressCnt + 1}} (thus logCnt is maintained by the "add" thread only)
* the "remove" thread takes note of logCnt incrementing, measures time ({{lastSuppressTime}}), and would increment suppressCnt only after eg 5min (so time is only ever checked in the remove thread).

Not super nice, but possible
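The {{logCnt}} / {{suppressCnt}} / {{lastSuppressTime}} idea from the comment above can be sketched as follows. Only the three field names come from the comment; the class, the method names, the constant and the passed-in timestamp are assumptions made for illustration, and this is not code from ChangeProcessor.

```java
/**
 * Sketch of the "modCnt"-style suppression (hypothetical class; only the
 * three field names are from the comment). The "add" thread never reads
 * the clock; only the asynchronous "remove" thread does.
 */
final class SuppressedWarn {
    static final long SUPPRESS_MILLIS = 5 * 60 * 1000L; // "eg 5min"

    private volatile int logCnt;        // written by the add thread only
    private volatile int suppressCnt;   // written by the remove thread only
    private long lastSuppressTime = -1; // used by the remove thread only

    /** Add thread, queue is full: returns true if log.warn should be issued. */
    boolean onQueueFull() {
        int s = suppressCnt;
        if (logCnt <= s) {
            logCnt = s + 1; // further warnings are suppressed from now on
            return true;
        }
        return false;
    }

    /** Remove thread: re-arms the warning once the suppression time has passed. */
    void onRemoved(long nowMillis) {
        if (logCnt <= suppressCnt) {
            return; // nothing was logged since the last re-arm
        }
        if (lastSuppressTime < 0) {
            lastSuppressTime = nowMillis; // first time the increment is noticed
        } else if (nowMillis - lastSuppressTime >= SUPPRESS_MILLIS) {
            suppressCnt = suppressCnt + 1; // single writer, so no lost update
            lastSuppressTime = -1;
        }
    }
}
```

As the comment says, the result is somewhat imprecise: the quiet period starts when the remove thread first notices the warning, not when the warning was logged.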
[jira] [Updated] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5619: - Attachment: OAK-5619.patch2 Attached [^OAK-5619.patch2] which is the same as the previous patch but adds one more testcase plus adds some clarification to a javadoc
[jira] [Updated] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5619: - Attachment: OAK-5619.patch Attached [^OAK-5619.patch] which contains a suggested patch.
[jira] [Comment Edited] (OAK-5619) withIncludeAncestorsRemove reports unrelated top-level node deletion
[ https://issues.apache.org/jira/browse/OAK-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859302#comment-15859302 ] Stefan Egli edited comment on OAK-5619 at 2/9/17 4:28 PM: -- Added a test case to trunk (currently disabled) with which this can be reproduced: http://svn.apache.org/viewvc?rev=1782304&view=rev EDIT: extended the test case to cover more different cases: http://svn.apache.org/viewvc?rev=1782350&view=rev was (Author: egli): Added a test case to trunk (currently disabled) with which this can be reproduced: http://svn.apache.org/viewvc?rev=1782304&view=rev
[jira] [Commented] (OAK-3707) Register composite commit hook with whiteboard
[ https://issues.apache.org/jira/browse/OAK-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422659#comment-15422659 ] Stefan Egli commented on OAK-3707: -- [~edivad], what's your opinion on backporting this to 1.0.x/1.2.x branches? > Register composite commit hook with whiteboard > -- > > Key: OAK-3707 > URL: https://issues.apache.org/jira/browse/OAK-3707 > Project: Jackrabbit Oak > Issue Type: Improvement >Affects Versions: 1.3.11 >Reporter: Davide Giannella >Assignee: Davide Giannella > Fix For: 1.4, 1.3.13 > > Attachments: OAK-3707-1.patch > > > Register, during repository initialisation the composite of the CommitHook > with the whiteboard. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (OAK-4153) segment's compareAgainstBaseState wont call childNodeDeleted when deleting last and adding n nodes
[ https://issues.apache.org/jira/browse/OAK-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-4153. -- Resolution: Fixed > segment's compareAgainstBaseState wont call childNodeDeleted when deleting > last and adding n nodes > -- > > Key: OAK-4153 > URL: https://issues.apache.org/jira/browse/OAK-4153 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segmentmk >Affects Versions: 1.2.13, 1.0.29, 1.4.1, 1.5.0 >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Critical > Fix For: 1.4.7, 1.2.19, 1.0.34, 1.5.1 > > Attachments: OAK-4153-2.patch, OAK-4153-3.patch, OAK-4153.patch, > OAK-4153.simplified.patch > > > {{SegmentNodeState.compareAgainstBaseState}} fails to call > {{NodeStateDiff.childNodeDeleted}} when for the same parent the only child is > deleted and at the same time multiple new, different children are added. > Reason is that the [current > code|https://github.com/apache/jackrabbit-oak/blob/a9ce70b61567ffe27529dad8eb5d38ced77cf8ad/oak-segment/src/main/java/org/apache/jackrabbit/oak/plugins/segment/SegmentNodeState.java#L558] > for '{{afterChildName == MANY_CHILD_NODES}}' *and* '{{beforeChildName == > ONE_CHILD_NODE}}' does not handle all cases: it assumes that 'after' contains > the 'before' child and doesn't handle the situation where the 'before' child > has gone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
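The missing case can be illustrated with a toy diff in which 'before' has exactly one child and 'after' has many. This is a sketch of the reasoning only, using plain maps and strings instead of Oak's NodeState and NodeStateDiff; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not SegmentNodeState): 'before' has a single child,
 * 'after' has many. The diff must not assume that the 'before' child
 * still exists in 'after'.
 */
final class OneToManyChildDiff {

    static List<String> diff(String beforeName, String beforeValue,
                             Map<String, String> after) {
        List<String> events = new ArrayList<>();
        boolean beforeChildSeen = false;
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (e.getKey().equals(beforeName)) {
                beforeChildSeen = true;
                if (!e.getValue().equals(beforeValue)) {
                    events.add("changed:" + e.getKey());
                }
            } else {
                events.add("added:" + e.getKey());
            }
        }
        if (!beforeChildSeen) {
            // the case described in the issue: the only child is gone
            events.add("deleted:" + beforeName);
        }
        return events;
    }
}
```

Without the final check, deleting the only child while adding several new ones would yield only "added" events, which matches the symptom in the issue title.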
[jira] [Commented] (OAK-4153) segment's compareAgainstBaseState wont call childNodeDeleted when deleting last and adding n nodes
[ https://issues.apache.org/jira/browse/OAK-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430543#comment-15430543 ] Stefan Egli commented on OAK-4153: -- [~edivad], I see what you mean, however is that really necessary as it changes the 'truth' about when this was indeed fixed in 1.5 (1.5.1). As 1.5 is an unstable release I'm not sure how important the release-notes are, but yes would have been better to open a separate ticket for the backports probably.
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15458826#comment-15458826 ]

Stefan Egli commented on OAK-4581:
--
One more comment re

bq. I expect unbounded queues to have adverse effects on the hotness of the various caches.

As pointed out, I think we should do pre-filtering. But we should probably also pre-calculate the actual events we want to deliver. Assuming that the listeners' filters are "good", it would be efficient to go through the whole filter and store only pre-manufactured events as a result. This would mean that once an event is persisted it no longer depends on _any_ cache. It also means that what is persisted will _actually_ be consumed later on, as the raw and final event itself would be stored, with no later filtering whatsoever.

Doing the filtering as early as possible would also add load to the system at commit time and thus result in natural throttling, since computing all the events as they will be delivered to the listeners becomes part of the commit. (_two birds with one stone_)

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple
> drawbacks. Quite a bit of work is done to make diff generation faster.
> However there are still chances of the event queue getting filled up.
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous
> observer
> # Observers which are meant to handle such events in an async way (by virtue
> of being wrapped in BackgroundObserver) would instead pull the events from
> this persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form.
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does the serialized form of a
> SegmentNodeStore root state look like{color} - Possibly the RecordId of the
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on the persisted remote
> journal to determine the affected paths, which reduces the need for
> persisting the complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains
> information about the affected paths, similar to what is currently being
> stored in the DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensured
> that whatever is stored there can be serialized or recalculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching.
> [~mreutegg] suggested that for the persisted local journal we can also utilize
> a SegmentNodeStore instance. Care needs to be taken for compaction, either via
> a generation approach or by relying on online compaction
> h4. 2 - Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entries on a per listener
> basis. This would allow each Listener to "pull" content changes and generate
> the diff at its own speed while keeping the in-memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2]
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3]
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
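The pre-filtering idea from the comment, combined with the per-listener "pull" from section C, can be sketched with an in-memory stand-in for the journal. Everything here is hypothetical: the class, the method names and the string-based events are illustration only, and a real journal would be persisted and survive restarts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * In-memory sketch (hypothetical, not an Oak API): events are filtered
 * at commit time per listener, so only final, deliverable events are
 * stored, and each listener pulls them at its own pace.
 */
final class PrefilteredJournal {
    private final Map<String, Predicate<String>> filters = new LinkedHashMap<>();
    private final Map<String, Deque<String>> pending = new HashMap<>();

    void register(String listenerId, Predicate<String> pathFilter) {
        filters.put(listenerId, pathFilter);
        pending.put(listenerId, new ArrayDeque<>());
    }

    /** Called synchronously as part of the commit: filter eagerly, store final events. */
    void onCommit(List<String> changedPaths) {
        for (Map.Entry<String, Predicate<String>> f : filters.entrySet()) {
            Deque<String> queue = pending.get(f.getKey());
            for (String path : changedPaths) {
                if (f.getValue().test(path)) {
                    queue.add("changed " + path);
                }
            }
        }
    }

    /** Called by a listener whenever it is ready: returns and clears its stored events. */
    List<String> drain(String listenerId) {
        Deque<String> queue = pending.get(listenerId);
        List<String> out = new ArrayList<>(queue);
        queue.clear();
        return out;
    }
}
```

Because the filter runs inside `onCommit`, its cost is paid by the committing thread, which is the natural throttling effect mentioned in the comment.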
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464458#comment-15464458 ]

Stefan Egli commented on OAK-4581:
--
I wouldn't rule out that there are better fits, for sure. The advantage of using tarMk is that we'd use our own stuff for another use case too.
* _there will be a performance impact as the lists grows_ : we can store in sub-folders named with a time pattern, similar to what sling jobs does
* _garbage collections is difficult and expensive_ : gc would be done in a generational approach: rewrite the queue in a new tarMk once certain thresholds are reached (ideally this could be done when the queue is empty, so no rewrite would be necessary). But gc on that tarMk incarnation would be turned off completely.
* _adding and removing elements from the list will cause rewrites of buckets instead of just a single append, remove_ : the idea is to store in batches, so I was assuming this shouldn't be that bad

Overall I'd find it simpler, but if ppl agree that we shouldn't use tarMk for this, then sure, let's switch.
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464465#comment-15464465 ] Stefan Egli commented on OAK-4581: -- Waiting with further implementations until we agree on how to proceed. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683, "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However, there is still a chance of the event queue filling up. > This issue is meant to implement a persistent event journal. The idea is: > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observers which are meant to handle such events in an async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does the serialized form of the > SegmentNodeStore root state look like{color} - Possibly the RecordId of the > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on the persisted remote > journal to determine the affected paths, which reduces the need for > persisting the complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. 
The diff only contains > information about the affected paths, similar to what is currently being stored in > the DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensured that > whatever is stored there can be serialized or recalculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for the persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction, either via a > generation approach or by relying on online compaction > h4. 2 - Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entries on a per-listener basis. > This would allow each Listener to "pull" content changes and generate the diff at > its own speed while keeping in-memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
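The "pointer per listener" idea from section C could look roughly like the sketch below. This is only an illustration under assumptions: the class name {{ListenerJournalSketch}}, the String entries (standing in for a serialized root state or diff) and the listener ids are all hypothetical and not part of Oak, and the journal is kept in memory here, so it does not cover the "survive restarts" requirement.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Oak API: an append-only journal with one read pointer per listener.
class ListenerJournalSketch {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Called at commit time; the entry stands in for a serialized root state or diff. */
    synchronized void append(String entry) {
        entries.add(entry);
    }

    /** A newly registered listener starts at the current end of the journal. */
    synchronized void register(String listenerId) {
        cursors.putIfAbsent(listenerId, entries.size());
    }

    /** Returns up to max unread entries and advances only this listener's pointer. */
    synchronized List<String> pull(String listenerId, int max) {
        Integer from = cursors.get(listenerId);
        if (from == null) {
            throw new IllegalArgumentException("unknown listener: " + listenerId);
        }
        // Compute the end index in long arithmetic to avoid int overflow for large max values.
        int to = (int) Math.min((long) entries.size(), (long) from + max);
        List<String> batch = new ArrayList<>(entries.subList(from, to));
        cursors.put(listenerId, to);
        return batch;
    }
}
```

A slow listener can pull one entry at a time while a fast one drains everything; neither affects the other's pointer, and only the pointers (not a full copy of the backlog) are held per listener.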
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1546#comment-1546 ] Stefan Egli commented on OAK-4581: -- So could we adjust the retention time dynamically (is there an API) when the queue grows?
[jira] [Commented] (OAK-4739) lease: immediate renew after long renew call
[ https://issues.apache.org/jira/browse/OAK-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457947#comment-15457947 ] Stefan Egli commented on OAK-4739: -- Using the _recovery lock_ we should be able to distinguish the case where an instance failed to update the lease (e.g. due to a network hiccup) but no other instance has noticed this yet (including discovery-lite) from the case where another instance has noticed it. Basically we have clear state boundaries between a lease being valid and an instance being in recovery state. If this is the case (and I believe it is) then we could indeed do a retry in case the lease update fails but the recovery lock has not yet been acquired, no? > lease: immediate renew after long renew call > > > Key: OAK-4739 > URL: https://issues.apache.org/jira/browse/OAK-4739 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Affects Versions: 1.5.8 >Reporter: Martin Böttcher > Labels: resilience > > A single temporary network issue can shut down the DocumentStore. We observed > the following situation: > # org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.renewLease was > called (this is done regularly and is completely normal) > # the network had a temporary issue (of whatever kind) > # the database call terminated after a long time (the default db > maxWaitTime is 120 seconds). > # org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.renewLease > decides that the current lease is too old (>120 seconds, which is the default for > the oak.documentMK.leaseDurationSeconds property), sets a leaseCheckFailed > variable and throws an Exception > # because leaseCheckFailed is set, all following tries (if any) will > immediately throw an Exception, too. > I'd recommend making the ClusterNodeInfo code more robust so that at least > one retry will be made. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
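The retry suggested in the comment above might be sketched as follows. This is not the ClusterNodeInfo code: {{LeaseRetrySketch}}, {{renewWithRetry}} and both suppliers are hypothetical stand-ins, and a real implementation would presumably need the lease update and the recovery-lock check to be a single conditional update in the DocumentStore rather than two separate calls.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: retry a failed lease renewal only while nobody holds the recovery lock.
class LeaseRetrySketch {
    /**
     * @param renewLease           returns true if the lease update succeeded (assumed callback)
     * @param recoveryLockAcquired returns true if another instance started recovery (assumed callback)
     * @param maxAttempts          upper bound on renewal attempts
     * @return true if the lease was renewed, false if recovery started or attempts ran out
     */
    static boolean renewWithRetry(BooleanSupplier renewLease,
                                  BooleanSupplier recoveryLockAcquired,
                                  int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (renewLease.getAsBoolean()) {
                return true;
            }
            // Once the recovery lock is taken the lease must be treated as lost: no more retries.
            if (recoveryLockAcquired.getAsBoolean()) {
                return false;
            }
        }
        return false;
    }
}
```

The point of the design is that a failed renewal is only fatal once some other instance has acted on it; until then a retry does not violate the state boundary between "lease valid" and "in recovery".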
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457991#comment-15457991 ] Stefan Egli commented on OAK-4581: -- [~mduerig], thanks for the comments! bq. Do we know that we need to go off-heap with that queue? Agreed, the entries are normally cheap, but generally speaking it's the open map mechanism that can make them unbounded, thus larger. But even assuming they are cheap on average, you can have a situation where a traffic burst is so high that it overwhelms even a highly optimized listener logic. In that case the queues grow and you'll get an OutOfMemoryError. The benefit of persisting the queues (once they become big, that is) is for such rare special cases only: you'd have to construct a case that forces an OutOfMemoryError without this patch and does not with it. bq. I expect unbounded queues to have adverse effects on the hotness of the various caches. Right. There are some ideas left that aren't much mentioned or fleshed out yet: on the one hand we should do _pre-filtering_ of events, such that only events that are indeed meant to go to a listener end up on its queue. The listener shouldn't have to filter afterwards anymore. Currently we're putting events on each listener's queue and only filter after the fact. If queues become large, then this very fact becomes an issue exactly due to cache inefficiencies. I.e. a lot of computation is then lost purely to figure out whether a listener needs an entry or not (as it can't find it in the cache anymore). So with pre-filtering this would not be an issue anymore. What would be left though is the cache inefficiency for actual events that listeners _want_. There we might optimize by including a bit more info in what we persist, perhaps the actual diff if it's not too big, etc. bq. Any thoughts on how unbounded queues should interact with gc? 
One approach that we currently target is to checkpoint the oldest entry, such that we prevent gc from removing it (assuming checkpoints are respected). bq. However I dislike having to cope with serialising the open CommitInfo class. At least we should rely on a general purpose library here. Open for alternatives for sure! I was assuming that we need to store the CommitInfo obj, as that's what persisting is mostly about. And if something in there is not serializable, then we're lost and have to skip it (we can warn loudly though). What exactly were you thinking of as alternatives? bq. I don't think PersistedBlockingQueue should use a node store as its back-end. I'm probably not getting the entirety of this point. I guess one argument for reusing the tarMk is that it's something we have and know we can use; we can surely use something else. Regarding GC the idea was to _not_ rely on GCing that observation-tarMk but to use generations of tarMk similar to how that's done in the persistent cache: so we'd throw away a whole tarMk set once we switched to a new one.
[jira] [Resolved] (OAK-4728) tarmk's FileStoreBuilder.build should use mkdirs instead of mkdir
[ https://issues.apache.org/jira/browse/OAK-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-4728. -- Resolution: Fixed changed to {{mkdirs}} in http://svn.apache.org/viewvc?rev=1758610=rev > tarmk's FileStoreBuilder.build should use mkdirs instead of mkdir > - > > Key: OAK-4728 > URL: https://issues.apache.org/jira/browse/OAK-4728 > Project: Jackrabbit Oak > Issue Type: Bug > Components: segment-tar >Affects Versions: Segment Tar 0.0.10 >Reporter: Stefan Egli >Assignee: Stefan Egli > Fix For: Segment Tar 0.0.12 > > > [FileStoreBuilder.build|https://github.com/apache/jackrabbit-oak/blob/2b6c2f5340f3b6485dda5c493f6343d232c883e9/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStoreBuilder.java#L338] > uses {{mkdir}} which can be problematic when using non-standard directories > such as is perhaps intended with OAK-4655. Using {{mkdirs}} instead is more > robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-4728) tarmk's FileStoreBuilder.build should use mkdirs instead of mkdir
Stefan Egli created OAK-4728: Summary: tarmk's FileStoreBuilder.build should use mkdirs instead of mkdir Key: OAK-4728 URL: https://issues.apache.org/jira/browse/OAK-4728 Project: Jackrabbit Oak Issue Type: Bug Components: segment-tar Affects Versions: Segment Tar 0.0.10 Reporter: Stefan Egli Assignee: Stefan Egli Fix For: Segment Tar 0.0.12 [FileStoreBuilder.build|https://github.com/apache/jackrabbit-oak/blob/2b6c2f5340f3b6485dda5c493f6343d232c883e9/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStoreBuilder.java#L338] uses {{mkdir}} which can be problematic when using non-standard directories such as is perhaps intended with OAK-4655. Using {{mkdirs}} instead is more robust. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
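The difference between the two calls is easy to show in isolation. The following is a standalone sketch using only {{java.io.File}}; the class name {{MkdirsDemo}}, the helper {{ensureDirectory}} and the nested path are made up for illustration and are not taken from FileStoreBuilder.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Standalone illustration of File.mkdir() vs File.mkdirs(); not FileStoreBuilder code.
class MkdirsDemo {
    /** Creates dir including any missing parents; true if the directory exists afterwards. */
    static boolean ensureDirectory(File dir) {
        // mkdirs() returns false if the directory already exists, so re-check with isDirectory().
        return dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("oak-mkdirs-demo").toFile();
        File nested = new File(base, "custom/role/segmentstore");

        // mkdir() creates only the last path element and fails because the parents are missing.
        System.out.println("mkdir:  " + nested.mkdir());
        // mkdirs() creates the missing parent directories as well.
        System.out.println("mkdirs: " + ensureDirectory(nested));
    }
}
```

Note that both methods signal failure via a boolean rather than an exception, so the return value (or a follow-up {{isDirectory()}} check) has to be inspected by the caller.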
[jira] [Assigned] (OAK-4655) Enable configuring multiple segment nodestore instances in same setup
[ https://issues.apache.org/jira/browse/OAK-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli reassigned OAK-4655: Assignee: Stefan Egli > Enable configuring multiple segment nodestore instances in same setup > - > > Key: OAK-4655 > URL: https://issues.apache.org/jira/browse/OAK-4655 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: segment-tar, segmentmk >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Fix For: 1.6 > > > With OAK-4369 and OAK-4490 it's now possible to configure a new > SegmentNodeStore to act as a secondary nodestore (OAK-4180). Recently, for a few > other features, we have seen a requirement to configure a SegmentNodeStore just for > storage purposes. For example: > # OAK-4180 - Enables use of SegmentNodeStore as a secondary store to > complement DocumentNodeStore > #* Always uses BlobStore from primary DocumentNodeStore > #* Compaction to be enabled > # OAK-4654 - Enable use of SegmentNodeStore for private mount in a > multiplexing nodestore setup > #* Might use its own blob store > #* Compaction might be disabled as it would be read only > # OAK-4581 - Proposes to make use of SegmentNodeStore for storing event queue > offline > In all these setups we need to configure a SegmentNodeStore which has the > following aspects > # NodeStore instance is not directly exposed but exposed via the > {{NodeStoreProvider}} interface with a {{role}} service property specifying the > intended usage > # NodeStore here is not fully functional, i.e. it would not be configured with > standard observers, would not be used by ContentRepository etc > # It needs to be ensured that any JMX MBean registered accounts for "role" so > that there is no collision > With the existing SegmentNodeStoreService we can only configure one nodestore. 
To > support the above cases we need an OSGi config factory based implementation which > enables creation of multiple SegmentNodeStore instances (each with a different > directory and different settings) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-4655) Enable configuring multiple segment nodestore instances in same setup
[ https://issues.apache.org/jira/browse/OAK-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4655: - Attachment: OAK-4655.v1.patch Attaching [^OAK-4655.v1.patch] which is a suggestion for a {{SegmentNodeStoreFactory}} (one for both oak-segment and oak-segment-tar - they're basically twins, perhaps we don't need both..) that registers {{NodeStoreProviders}} with the corresponding {{role}} set (the role coming from the config for {{SegmentNodeStoreFactory}}).
[jira] [Updated] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4581: - Attachment: OAK-4581.v0.patch Attached [^OAK-4581.v0.patch] (which is based on [OAK-4655.v1.patch|https://issues.apache.org/jira/secure/attachment/12826406/OAK-4655.v1.patch] that introduces support for multiple SegmentNodeStores). This is an ongoing effort, but I wanted to share progress early. Here's the status: * introduces a {{NodeStateSerializer}} that NodeStores should somehow implement in order to map {{NodeState->String}} and vice versa (for storage of the event). * introduces an {{EventQueueFactory}} that is used by the BackgroundObserver instead of hardcoding creation of a {{newArrayBlockingQueue}} (it still does the latter for sling compatibility cases, but that we should get rid of) ** one version of this is the above {{newArrayBlockingQueue}} - i.e. an in-memory queue that can still be used for a few cases ** the new version though is {{PersistedEventQueueFactory}} that creates {{PersistedBlockingQueue}} where the storing magic will happen. *** this last one is in early stages - currently it just showcases using a secondary/tertiary store for persistence. (But storing is quite unoptimized there at the moment). * so again, the main idea is that the {{BackgroundObserver}} remains largely unchanged - it still works on a {{queue}} and wouldn't notice what's behind the queue. The logic for storing/retrieving via persistence is hidden behind a {{BlockingQueue}} implementation, that's the main point here I think. The consequences that this introduces will be: * we can only store values in the CommitInfo that are serializable - others would have to be skipped/get lost * creators of {{new Jcr()/new Oak()}} would pass the corresponding {{EventQueueFactory}} - thus the mapping would be done in the 'jcr factory service'. The EventQueueFactory will then propagate down the Repository/Session chain to the BackgroundObserver. 
I'll continue working on the {{PersistedBlockingQueue}} including testing but would appreciate some early feedback about this approach.
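To make the "hidden behind a {{BlockingQueue}}" point concrete, here is a minimal sketch of what such a factory could look like. The method name {{createQueue}} and its signature are assumptions for illustration only and are not taken from the attached patch; only the in-memory variant is shown, a persisted variant would return its own {{BlockingQueue}} implementation from the same method.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical shape of the factory; the real signature in the patch may differ.
interface EventQueueFactory {
    <T> BlockingQueue<T> createQueue(int capacity);
}

// In-memory variant: behaves like the bounded queue the observer used before.
class InMemoryEventQueueFactory implements EventQueueFactory {
    @Override
    public <T> BlockingQueue<T> createQueue(int capacity) {
        return new ArrayBlockingQueue<>(capacity);
    }
}

// Stand-in for the consumer: it only sees a BlockingQueue and never learns what is behind it.
class QueueConsumerSketch {
    private final BlockingQueue<String> queue;

    QueueConsumerSketch(EventQueueFactory factory, int capacity) {
        this.queue = factory.createQueue(capacity);
    }

    /** Returns false when the queue is full, which is the situation the issue tries to avoid. */
    boolean contentChanged(String change) {
        return queue.offer(change);
    }

    String next() {
        return queue.poll();
    }
}
```

Swapping in a persisted queue would then be a matter of passing a different factory to the consumer; the consumer code itself stays untouched.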
[jira] [Assigned] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli reassigned OAK-4796: Assignee: Stefan Egli > filter events before adding to ChangeProcessor's queue > -- > > Key: OAK-4796 > URL: https://issues.apache.org/jira/browse/OAK-4796 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.9 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > > Currently the > [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] > is in charge of doing the event diffing and filtering and does so in a > pooled Thread, i.e. asynchronously, at a later stage independent from the > commit. This has the advantage that the commit is fast, but has the following > potentially negative effects: > # events (in the form of ContentChange Objects) occupy a slot of the queue > even if the listener is not interested in them - any commit lands on any > listener's queue. This reduces the capacity of the queue for 'actual' events > to be delivered. It therefore increases the risk that the queue fills - and > a full queue has various consequences such as losing the CommitInfo etc. > # each event==ContentChange later on must be evaluated, and for that a diff > must be calculated. Depending on runtime behavior that diff might be > expensive if no longer in the cache (documentMk specifically). > As an improvement, this diffing+filtering could be done at an earlier stage > already, nearer to the commit, and in case the filter would ignore the event, > it would not have to be put into the queue at all, thus avoiding occupying a > slot and later potentially slower diffing. 
> The suggestion is to implement this via the following algorithm: > * During the commit, in a {{Validator}} the listener's filters are evaluated > - in an as-efficient-as-possible manner (Reason for doing it in a Validator > is that this doesn't add overhead as oak already goes through all changes for > other Validators). As a result a _list of potentially affected observers_ is > added to the {{CommitInfo}} (false positives are fine). > ** Note that the above adds cost to the commit and must therefore be > carefully done and measured > ** One potential measure could be to only do filtering when listener's queues > are larger than a certain threshold (e.g. 10) > * The ChangeProcessor in {{contentChanged}} (in the one created in > [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) > then checks the new commitInfo's _potentially affected observers_ list and > if this observer is not in the list, adds a {{NOOP}} token at the end of the queue. If > there's already a NOOP there, the two are collapsed (this way when a filter > is not affected it would have a NOOP at the end of the queue). If later on a > non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} > for the newly added {{ContentChange}} obj. > ** To achieve that, the ContentChange obj is extended to not only have the > "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which > currently is implicitly maintained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
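The NOOP collapsing described in the algorithm above can be sketched in a few lines. This is an illustration under assumptions, not ChangeProcessor code: roots are plain Strings instead of NodeStates, the class and method names are hypothetical, the "potentially affected" decision is passed in as a boolean, and the queue is unbounded and not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a listener queue that collapses consecutive NOOP tokens.
class NoopCollapsingQueueSketch {
    static final class Entry {
        final String previousRoot; // explicit "from" pointer
        final String root;         // "to" pointer
        final boolean noop;

        Entry(String previousRoot, String root, boolean noop) {
            this.previousRoot = previousRoot;
            this.root = root;
            this.noop = noop;
        }
    }

    private final Deque<Entry> queue = new ArrayDeque<>();
    private String lastRoot;

    NoopCollapsingQueueSketch(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    void contentChanged(String root, boolean potentiallyAffected) {
        Entry tail = queue.peekLast();
        if (potentiallyAffected) {
            // Real change: its previousRoot is the last seen root, i.e. the NOOP's root if one is at the tail.
            queue.addLast(new Entry(lastRoot, root, false));
        } else if (tail != null && tail.noop) {
            // Collapse: keep a single NOOP whose root moves forward to the newest commit.
            queue.removeLast();
            queue.addLast(new Entry(tail.previousRoot, root, true));
        } else {
            queue.addLast(new Entry(lastRoot, root, true));
        }
        lastRoot = root;
    }

    int size() {
        return queue.size();
    }

    Entry last() {
        return queue.peekLast();
    }
}
```

With this shape, any number of consecutive unaffected commits costs the listener a single queue slot, and the next real change still knows which root to diff against.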
[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4796: - Fix Version/s: 1.6
[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4796: - Affects Version/s: 1.5.9
[jira] [Created] (OAK-4796) filter events before adding to ChangeProcessor's queue
Stefan Egli created OAK-4796: Summary: filter events before adding to ChangeProcessor's queue Key: OAK-4796 URL: https://issues.apache.org/jira/browse/OAK-4796 Project: Jackrabbit Oak Issue Type: Improvement Components: jcr Reporter: Stefan Egli Currently the [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] is in charge of doing the event diffing and filtering and does so in a pooled Thread, i.e. asynchronously, at a later stage independent from the commit. This has the advantage that the commit is fast, but has the following potentially negative effects: # events (in the form of ContentChange objects) occupy a slot of the queue even if the listener is not interested in them - any commit lands on any listener's queue. This reduces the capacity of the queue for 'actual' events to be delivered. It therefore increases the risk that the queue fills up - and a full queue has various consequences, such as losing the CommitInfo. # each event==ContentChange must be evaluated later on, and for that a diff must be calculated. Depending on runtime behavior that diff might be expensive if it is no longer in the cache (documentMk specifically). As an improvement, this diffing+filtering could be done at an earlier stage, nearer to the commit. In case the filter would ignore the event, it would not have to be put into the queue at all, thus avoiding occupying a slot and a later, potentially slower, diff. The suggestion is to implement this via the following algorithm: * During the commit, the listeners' filters are evaluated in a {{Validator}}, as efficiently as possible (the reason for doing it in a Validator is that this doesn't add traversal overhead, as oak already goes through all changes for other Validators). 
As a result a _list of potentially affected observers_ is added to the {{CommitInfo}} (false positives are fine). ** Note that the above adds cost to the commit and must therefore be done carefully and measured ** One potential measure could be to only do filtering when a listener's queue is larger than a certain threshold (e.g. 10) * The ChangeProcessor in {{contentChanged}} (in the one created in [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) then checks the new commitInfo's _potentially affected observers_ list and if it's not in the list, adds a {{NOOP}} token at the end of the queue. If there's already a NOOP there, the two are collapsed (this way a listener whose filter is not affected would have a single NOOP at the end of its queue). If later on a non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} for the newly added {{ContentChange}} obj. ** To achieve that, the ContentChange obj is extended to not only have the "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which currently is implicitly maintained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
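The NOOP handling proposed in the description above could look roughly like the following. This is only a minimal sketch under assumptions, not Oak's actual ChangeProcessor or BackgroundObserver code: {{QueueEntry}} and {{NoopCollapsingQueue}} are hypothetical names, roots are plain strings standing in for NodeState, and the prefilter verdict is passed in as a boolean instead of being read from the CommitInfo.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical queue entry: a real change or a NOOP placeholder (not Oak's ContentChange). */
final class QueueEntry {
    final String previousRoot; // explicit "from" root, as the proposal suggests
    final String root;         // "to" root
    final boolean noop;        // true if the listener's filter was not affected

    QueueEntry(String previousRoot, String root, boolean noop) {
        this.previousRoot = previousRoot;
        this.root = root;
        this.noop = noop;
    }
}

/** Sketch of a per-listener queue that collapses consecutive trailing NOOPs. */
final class NoopCollapsingQueue {
    private final Deque<QueueEntry> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recently observed commit

    NoopCollapsingQueue(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    /** Called once per commit; potentiallyAffected is the prefilter verdict for this listener. */
    void contentChanged(String newRoot, boolean potentiallyAffected) {
        QueueEntry tail = queue.peekLast();
        if (!potentiallyAffected && tail != null && tail.noop) {
            // Collapse: keep the older "from" root, advance the "to" root.
            queue.pollLast();
            queue.addLast(new QueueEntry(tail.previousRoot, newRoot, true));
        } else {
            // lastRoot equals a trailing NOOP's root, so a real change that follows
            // a NOOP gets the NOOP's root as its previousRoot.
            queue.addLast(new QueueEntry(lastRoot, newRoot, !potentiallyAffected));
        }
        lastRoot = newRoot;
    }

    QueueEntry poll() {
        return queue.pollFirst();
    }

    int size() {
        return queue.size();
    }
}
```

With this shape, any run of commits that do not affect the listener occupies a single slot, and the consumer can skip NOOP entries without computing a diff.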
[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532623#comment-15532623 ] Stefan Egli edited comment on OAK-4796 at 9/29/16 3:24 PM: --- As discussed offline with Marcel, I'll work on a patch for the 2nd variant, so that we can compare the complexity/result. Also realized that the moment when filtering is applied is critical: prefiltering (in a CommitHook or Observer) might be applied with a filter A, which could potentially be changed to A' before the event is delivered. Currently though before delivering, the new filter A' would be applied. This is wrong. We need to either do prefiltering or postfiltering and can't mix the two. Therefore for prefiltering it's essential to pass around the applied filter in the ContentChange obj and use that later at delivery time. was (Author: egli): As discussed offline with Marcel, I'll work on a patch for the 2nd variant, so that we can compare the complexity/result. > filter events before adding to ChangeProcessor's queue > -- > > Key: OAK-4796 > URL: https://issues.apache.org/jira/browse/OAK-4796 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.9 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4796.patch > > > Currently the > [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] > is in charge of doing the event diffing and filtering and does so in a > pooled Thread, ie asynchronously, at a later stage independent from the > commit. This has the advantage that the commit is fast, but has the following > potentially negative effects: > # events (in the form of ContentChange Objects) occupy a slot of the queue > even if the listener is not interested in it - any commit lands on any > listener's queue. 
This reduces the capacity of the queue for 'actual' events > to be delivered. It therefore increases the risk that the queue fills - and > when full has various consequences such as loosing the CommitInfo etc. > # each event==ContentChange later on must be evaluated, and for that a diff > must be calculated. Depending on runtime behavior that diff might be > expensive if no longer in the cache (documentMk specifically). > As an improvement, this diffing+filtering could be done at an earlier stage > already, nearer to the commit, and in case the filter would ignore the event, > it would not have to be put into the queue at all, thus avoiding occupying a > slot and later potentially slower diffing. > The suggestion is to implement this via the following algorithm: > * During the commit, in a {{Validator}} the listener's filters are evaluated > - in an as-efficient-as-possible manner (Reason for doing it in a Validator > is that this doesn't add overhead as oak already goes through all changes for > other Validators). As a result a _list of potentially affected observers_ is > added to the {{CommitInfo}} (false positives are fine). > ** Note that the above adds cost to the commit and must therefore be > carefully done and measured > ** One potential measure could be to only do filtering when listener's queues > are larger than a certain threshold (eg 10) > * The ChangeProcessor in {{contentChanged}} (in the one created in > [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) > then checks the new commitInfo's _potentially affected observers_ list and > if it's not in the list, adds a {{NOOP}} token at the end of the queue. If > there's already a NOOP there, the two are collapsed (this way when a filter > is not affected it would have a NOOP at the end of the queue). 
If later on a > non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} > for the newly added {{ContentChange}} obj. > ** To achieve that, the ContentChange obj is extended to not only have the > "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which > currently is implicitly maintained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
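The comment in this message argues that a change which was prefiltered with filter A must not be postfiltered with a later filter A'. One way to picture that is to snapshot the filter into the queued change, as sketched below. All names here ({{FilteredChange}}, {{SnapshotFilterListener}}) are hypothetical, a filter is reduced to a {{Predicate}} on a path, and the sketch is single-threaded; it is not how Oak's FilterProvider or ChangeProcessor are implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

/** Hypothetical queued change that remembers the filter used at prefilter time. */
final class FilteredChange {
    final String path;
    final Predicate<String> appliedFilter;

    FilteredChange(String path, Predicate<String> appliedFilter) {
        this.path = path;
        this.appliedFilter = appliedFilter;
    }
}

/** Sketch of a listener whose filter can be swapped while changes are still queued. */
final class SnapshotFilterListener {
    private Predicate<String> currentFilter;
    private final Queue<FilteredChange> queue = new ArrayDeque<>();

    SnapshotFilterListener(Predicate<String> initialFilter) {
        this.currentFilter = initialFilter;
    }

    void setFilter(Predicate<String> newFilter) {
        this.currentFilter = newFilter;
    }

    /** Prefiltering: evaluated near the commit, with the filter valid at that moment. */
    void onCommit(String changedPath) {
        Predicate<String> filter = currentFilter;
        if (filter.test(changedPath)) {
            queue.add(new FilteredChange(changedPath, filter));
        }
    }

    /** Postfiltering at delivery time reuses the snapshot, not currentFilter. */
    List<String> deliver() {
        List<String> delivered = new ArrayList<>();
        FilteredChange change;
        while ((change = queue.poll()) != null) {
            if (change.appliedFilter.test(change.path)) {
                delivered.add(change.path);
            }
        }
        return delivered;
    }
}
```

If {{deliver}} used {{currentFilter}} instead, a change accepted under A could be dropped under A', which is the mix of pre- and postfiltering the comment warns about.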
[jira] [Commented] (OAK-4898) Allow for external changes to have a CommitInfo attached
[ https://issues.apache.org/jira/browse/OAK-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551192#comment-15551192 ] Stefan Egli commented on OAK-4898: -- +1. Besides allowing the additional information mentioned, I think we should change the logic so that CommitInfo is never null - even in the overflow case - for the same reason: it would allow passing state between/to Observers. > Allow for external changes to have a CommitInfo attached > > > Key: OAK-4898 > URL: https://issues.apache.org/jira/browse/OAK-4898 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core >Reporter: Chetan Mehrotra > Fix For: 1.6 > > > Currently the observation logic relies on the fact that a null CommitInfo > means that the changes come from another cluster node, i.e. are external changes. > We should change this semantic and provide a different way to indicate that > changes are external. This would allow a NodeStore implementation to still > pass in a CommitInfo which captures useful information about the commit, such as a > brief summary of what got changed, which can be used for prefiltering > (OAK-4796) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
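The change of semantics requested here, an explicit "external" marker instead of a null CommitInfo, might be sketched like this. {{ChangeInfo}} and {{ExternalAwareObserver}} are hypothetical stand-ins invented for illustration, and the {{summary}} key is an assumption; none of this is claimed to be Oak's CommitInfo or Observer API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a commit info that is never null. */
final class ChangeInfo {
    private final boolean external;
    private final Map<String, Object> info;

    ChangeInfo(boolean external, Map<String, Object> info) {
        this.external = external;
        this.info = Collections.unmodifiableMap(new HashMap<>(info));
    }

    /** Explicit flag replacing the old "null means external" convention. */
    boolean isExternal() {
        return external;
    }

    /** Extra state (for example a change summary) that observers can read. */
    Map<String, Object> getInfo() {
        return info;
    }
}

/** Sketch of an observer that relies on the flag rather than on a null check. */
final class ExternalAwareObserver {
    int localCount;
    int externalCount;
    String lastSummary;

    void contentChanged(String root, ChangeInfo info) {
        Objects.requireNonNull(info, "info must never be null in this sketch");
        if (info.isExternal()) {
            externalCount++;
        } else {
            localCount++;
        }
        // External changes can carry a summary too, which a null info could not.
        Object summary = info.getInfo().get("summary");
        lastSummary = summary == null ? null : summary.toString();
    }
}
```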
[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548765#comment-15548765 ] Stefan Egli edited comment on OAK-4796 at 10/5/16 2:11 PM: --- bq. This is wrong. We need to either do prefiltering or postfiltering and can't mix the two. Therefore for prefiltering it's essential to pass around the applied filter in the ContentChange obj and use that later at delivery time. Coming back to this point, there seems to be some issues with this based on the current design: Prior to prefiltering we had only postfiltering. And changing the FilterProvider was applied immediately - basically on all elements in the queue. With prefiltering this is, as pointed out, not correct: those elements in the queue already have gone through prefiltering, so postfiltering should be done with the same FilterProvider. Which means, the ChangeProcessor - which is in charge of postfiltering - should not use the FilterProvider set on its instance, but use the same that was used for prefiltering. Therefore the ChangeProcessor needs to be given the FilterProvider for each change that it processes. The way it receives changes though is via the Observer.contentChanged. Therefore about the only feasible place to pass the FilterProvider from BackgroundObserver to ChangeProcessor is via the CommitInfo. Thing now is that for external and overflow entries the CommitInfo is null. So I'd say, as long as that's the case it's very hard to implement correctly switching the filter. Unless this switch is done correctly, the only thing that can be said is that: when a filter is changed it is undefined for which changes both filters are applied (if the queue is not empty when switching). was (Author: egli): bq. This is wrong. We need to either do prefiltering or postfiltering and can't mix the two. Therefore for prefiltering it's essential to pass around the applied filter in the ContentChange obj and use that later at delivery time. 
Coming back to this point, there seems to be some issues with this based on the current design: Prior to prefiltering we had only postfiltering. And changing the FilterProvider was applied immediately - basically on all elements in the queue. With prefiltering this is, as pointed out, not correct: those elements in the queue already have gone through prefiltering, so postfiltering should be done with the same FilterProvider. Which means, the ChangeProcessor - which is in charge of postfiltering - should not use the FilterProvider set on its instance, but use the same that was used for prefiltering. Therefore the ChangeProcessor needs to be given the FilterProvider for each change that it processes. The way it receives changes though is via the Observer.contentChanged. Therefore about the only feasible place to pass the FilterProvider from BackgroundObserver to ChangeProcessor is via the CommitInfo. Thing now is that for external and overflow entries the CommitInfo is null. So I'd say, as long as that's the case it's very hard to implement correctly switching the filter. Unless this switch is done correctly, the only thing that can be said is that: when a filter is changed it is undefined if the old, the new or both filters are applied to entries in the queue. 
[jira] [Commented] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548765#comment-15548765 ] Stefan Egli commented on OAK-4796: -- bq. This is wrong. We need to either do prefiltering or postfiltering and can't mix the two. Therefore for prefiltering it's essential to pass around the applied filter in the ContentChange obj and use that later at delivery time. Coming back to this point, there seem to be some issues with this based on the current design: Prior to prefiltering we had only postfiltering, and a change of the FilterProvider was applied immediately - basically to all elements in the queue. With prefiltering this is, as pointed out, not correct: the elements in the queue have already gone through prefiltering, so postfiltering should be done with the same FilterProvider. This means that the ChangeProcessor - which is in charge of postfiltering - should not use the FilterProvider set on its instance, but the one that was used for prefiltering. Therefore the ChangeProcessor needs to be given the FilterProvider for each change that it processes. It receives changes via Observer.contentChanged though, so about the only feasible place to pass the FilterProvider from BackgroundObserver to ChangeProcessor is the CommitInfo. The problem is that for external and overflow entries the CommitInfo is null. So I'd say that, as long as that's the case, it's very hard to implement switching the filter correctly. Unless this switch is done correctly, the only thing that can be said is that when a filter is changed, it is undefined whether the old, the new or both filters are applied to entries in the queue. 
[jira] [Comment Edited] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548765#comment-15548765 ] Stefan Egli edited comment on OAK-4796 at 10/5/16 2:14 PM: --- bq. This is wrong. We need to either do prefiltering or postfiltering and can't mix the two. Therefore for prefiltering it's essential to pass around the applied filter in the ContentChange obj and use that later at delivery time. Coming back to this point, there seem to be some issues with this based on the current design: Prior to prefiltering we had only postfiltering, and a change of the FilterProvider was applied immediately - basically to all elements in the queue. With prefiltering this is, as pointed out, not correct: the elements in the queue have already gone through prefiltering, so postfiltering should be done with the same FilterProvider. This means that the ChangeProcessor - which is in charge of postfiltering - should not use the FilterProvider set on its instance, but the one that was used for prefiltering. Therefore the ChangeProcessor needs to be given the FilterProvider for each change that it processes. It receives changes via Observer.contentChanged though, so about the only feasible place to pass the FilterProvider from BackgroundObserver to ChangeProcessor is the CommitInfo. The problem is that for external and overflow entries the CommitInfo is null. So I'd say that, as long as that's the case, it's very hard to implement switching the filter correctly. Unless this switch is done correctly, the only thing that can be said is that when a filter is changed and the queue is not empty, both filters are applied. However, the listener doesn't know anything about the queue internals, so it cannot draw any conclusions from that. was (Author: egli): bq. This is wrong. We need to either do prefiltering or postfiltering and can't mix the two. 
Therefore for prefiltering it's essential to pass around the applied filter in the ContentChange obj and use that later at delivery time. Coming back to this point, there seems to be some issues with this based on the current design: Prior to prefiltering we had only postfiltering. And changing the FilterProvider was applied immediately - basically on all elements in the queue. With prefiltering this is, as pointed out, not correct: those elements in the queue already have gone through prefiltering, so postfiltering should be done with the same FilterProvider. Which means, the ChangeProcessor - which is in charge of postfiltering - should not use the FilterProvider set on its instance, but use the same that was used for prefiltering. Therefore the ChangeProcessor needs to be given the FilterProvider for each change that it processes. The way it receives changes though is via the Observer.contentChanged. Therefore about the only feasible place to pass the FilterProvider from BackgroundObserver to ChangeProcessor is via the CommitInfo. Thing now is that for external and overflow entries the CommitInfo is null. So I'd say, as long as that's the case it's very hard to implement correctly switching the filter. Unless this switch is done correctly, the only thing that can be said is that: when a filter is changed it is undefined for which changes both filters are applied (if the queue is not empty when switching). 
[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4796: - Attachment: OAK-4796.changeSet.patch Attaching a second variant of the patch ([^OAK-4796.changeSet.patch]) which is based on Chetan's suggestion to compose a set of changes (parent-paths, propertyNames, nodeTypes, nodeNames) in an Editor (actually, I've used a ValidatorProvider/Validator pair), stores it as a property in the CommitContext of the CommitInfo, and evaluates it in oak-jcr's ChangeProcessor. This patch also includes some minimal statistics in the consolidated listener stats that shows how many commits were either skipped (because the feature was disabled or the CommitInfo null etc), included or excluded. The ObservationTest passes with the prefiltering enabled, however I plan to add some more specific testing still. Would welcome a review of this second approach (compared to the first which was EventFilter-based). /cc [~mreutegg], [~chetanm], [~mduerig], [~catholicon] > filter events before adding to ChangeProcessor's queue > -- > > Key: OAK-4796 > URL: https://issues.apache.org/jira/browse/OAK-4796 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.9 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4796.changeSet.patch, OAK-4796.patch > > > Currently the > [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] > is in charge of doing the event diffing and filtering and does so in a > pooled Thread, ie asynchronously, at a later stage independent from the > commit. 
This has the advantage that the commit is fast, but has the following > potentially negative effects: > # events (in the form of ContentChange Objects) occupy a slot of the queue > even if the listener is not interested in it - any commit lands on any > listener's queue. This reduces the capacity of the queue for 'actual' events > to be delivered. It therefore increases the risk that the queue fills - and > when full has various consequences such as loosing the CommitInfo etc. > # each event==ContentChange later on must be evaluated, and for that a diff > must be calculated. Depending on runtime behavior that diff might be > expensive if no longer in the cache (documentMk specifically). > As an improvement, this diffing+filtering could be done at an earlier stage > already, nearer to the commit, and in case the filter would ignore the event, > it would not have to be put into the queue at all, thus avoiding occupying a > slot and later potentially slower diffing. > The suggestion is to implement this via the following algorithm: > * During the commit, in a {{Validator}} the listener's filters are evaluated > - in an as-efficient-as-possible manner (Reason for doing it in a Validator > is that this doesn't add overhead as oak already goes through all changes for > other Validators). As a result a _list of potentially affected observers_ is > added to the {{CommitInfo}} (false positives are fine). 
> ** Note that the above adds cost to the commit and must therefore be > carefully done and measured > ** One potential measure could be to only do filtering when listener's queues > are larger than a certain threshold (eg 10) > * The ChangeProcessor in {{contentChanged}} (in the one created in > [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) > then checks the new commitInfo's _potentially affected observers_ list and > if it's not in the list, adds a {{NOOP}} token at the end of the queue. If > there's already a NOOP there, the two are collapsed (this way when a filter > is not affected it would have a NOOP at the end of the queue). If later on a > no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} > for the newly added {{ContentChange}} obj. > ** To achieve that, the ContentChange obj is extended to not only have the > "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which > currently is implicitly maintained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
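The change-set variant described in this update (collect parent paths, property names and node types during the commit, then evaluate them per listener) could be pictured as below. This is a hedged sketch only: {{ChangeSummary}} and {{ListenerInterest}} are hypothetical names, node names are left out for brevity, path matching is a plain prefix check, and nothing here reflects the attached patch's actual classes.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical summary of one commit, filled while the changes are traversed. */
final class ChangeSummary {
    final Set<String> parentPaths = new HashSet<>();
    final Set<String> nodeTypes = new HashSet<>();
    final Set<String> propertyNames = new HashSet<>();

    void record(String parentPath, String nodeType, String propertyName) {
        parentPaths.add(parentPath);
        nodeTypes.add(nodeType);
        propertyNames.add(propertyName);
    }
}

/** Hypothetical, simplified view of what a listener is interested in. */
final class ListenerInterest {
    private final Set<String> includePaths; // subtree roots the listener watches
    private final Set<String> nodeTypes;    // empty means "any node type"

    ListenerInterest(Set<String> includePaths, Set<String> nodeTypes) {
        this.includePaths = new HashSet<>(includePaths);
        this.nodeTypes = new HashSet<>(nodeTypes);
    }

    /** False positives are acceptable; false negatives are not. */
    boolean mayBeAffected(ChangeSummary summary) {
        if (summary == null) {
            return true; // no summary available, so the commit cannot be excluded
        }
        if (!anyPathIncluded(summary.parentPaths)) {
            return false;
        }
        return nodeTypes.isEmpty() || !Collections.disjoint(nodeTypes, summary.nodeTypes);
    }

    private boolean anyPathIncluded(Set<String> changedParents) {
        for (String changed : changedParents) {
            for (String include : includePaths) {
                if (include.equals("/") || changed.equals(include)
                        || changed.startsWith(include + "/")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A commit for which {{mayBeAffected}} returns false would count as "excluded" in the statistics mentioned above, while a missing summary corresponds to the "skipped" case.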
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549104#comment-15549104 ] Stefan Egli commented on OAK-4581: -- Summarizing the numerous recent threaded comments, I believe we have consensus that the goal is to get rid of external use of BackgroundObserver (not in scope of this ticket, but a consequence) and that we do the *persistence on the oak-jcr/ChangeProcessor level*. This makes the persistence independent from cache and GC problems and works fine with filter changes, but has the downside that more temporary storage is required (as events are 'exploded'). The actual file format of the stored events is a somewhat orthogonal detail, but it can be something like a flat file. Unless I hear objections I'll follow up on these assumptions in the next days. /cc [~chetanm], [~mduerig], [~mreutegg], [~reschke], [~catholicon] > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of the event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observers which are meant to handle such events in an async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
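The pull model from section C of the description (one pointer into the journal per listener) could look roughly like the following. This is only a minimal sketch under stated assumptions: {{LocalJournalSketch}} and its methods are hypothetical names, not Oak API, and an in-memory list stands in for whatever persistent store is eventually chosen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the persisted journal (not Oak API). */
final class LocalJournalSketch {
    // Serialized changes in commit order; a real journal would live on disk.
    private final List<String> entries = new ArrayList<>();
    // Per-listener pointer: index of the next entry that listener has to read.
    private final Map<String, Integer> positions = new HashMap<>();

    /** Called by the synchronous observer for every commit. */
    void append(String serializedChange) {
        entries.add(serializedChange);
    }

    /** A new listener starts at the current end of the journal. */
    void register(String listenerId) {
        positions.put(listenerId, entries.size());
    }

    /** Returns up to maxEntries unread changes and advances the listener's pointer. */
    List<String> pull(String listenerId, int maxEntries) {
        Integer from = positions.get(listenerId);
        if (from == null) {
            throw new IllegalArgumentException("unknown listener: " + listenerId);
        }
        int to = Math.min(entries.size(), from + maxEntries);
        List<String> batch = new ArrayList<>(entries.subList(from, to));
        positions.put(listenerId, to);
        return batch;
    }
}
```

Because every listener only holds an index, a slow listener does not block a fast one and does not keep changes in memory; surviving restarts would additionally require persisting both the entries and the positions.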
[jira] [Commented] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551799#comment-15551799 ]

Stefan Egli commented on OAK-4796:
----------------------------------

Many thanks for this thorough review, [~chetanm]! I'll dig through the points in detail. I agree with most of them, except perhaps regarding
bq. Probably we can keep things simple here and focus on included paths
I'm not sure what exactly you mean here: we've collected nodeTypes too, so are you suggesting not to filter by nodeType? If yes, why?

> filter events before adding to ChangeProcessor's queue
> ------------------------------------------------------
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently the [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] is in charge of doing the event diffing and filtering and does so in a pooled Thread, ie asynchronously, at a later stage independent from the commit. This has the advantage that the commit is fast, but has the following potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue even if the listener is not interested in them - any commit lands on any listener's queue. This reduces the capacity of the queue for 'actual' events to be delivered. It therefore increases the risk that the queue fills - and a full queue has various consequences such as losing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff must be calculated. Depending on runtime behavior that diff might be expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage already, nearer to the commit, and in case the filter would ignore the event, it would not have to be put into the queue at all, thus avoiding occupying a slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated - in an as-efficient-as-possible manner (Reason for doing it in a Validator is that this doesn't add overhead as oak already goes through all changes for other Validators). As a result a _list of potentially affected observers_ is added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) then checks the new commitInfo's _potentially affected observers_ list and if it's not in the list, adds a {{NOOP}} token at the end of the queue. If there's already a NOOP there, the two are collapsed (this way when a filter is not affected it would have a NOOP at the end of the queue). If later on a non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which currently is implicitly maintained.
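The NOOP handling described in the last bullet can be sketched as follows. This is a simplified, single-threaded illustration and not the code from the patch: roots are plain strings instead of {{NodeState}}s, the class and method names are hypothetical, and replacing a trailing NOOP with the next real change (rather than leaving it in the queue) is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a queue that collapses filtered commits into one NOOP. */
final class NoopCollapsingQueueSketch {
    /** Queue item carrying both the "from" and the "to" root explicitly. */
    static final class ContentChange {
        final String previousRoot;
        final String root;
        final boolean noop;

        ContentChange(String previousRoot, String root, boolean noop) {
            this.previousRoot = previousRoot;
            this.root = root;
            this.noop = noop;
        }
    }

    private final Deque<ContentChange> queue = new ArrayDeque<>();
    // Root of the most recent commit seen, whether filtered or not.
    private String lastRoot;

    NoopCollapsingQueueSketch(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    /** listenerAffected is the outcome of the prefiltering done during the commit. */
    void contentChanged(String root, boolean listenerAffected) {
        ContentChange tail = queue.peekLast();
        boolean tailIsNoop = tail != null && tail.noop;
        if (tailIsNoop) {
            queue.pollLast(); // the trailing NOOP is either merged or superseded below
        }
        if (listenerAffected) {
            // lastRoot equals the removed NOOP's root, so it becomes the previousRoot
            queue.addLast(new ContentChange(lastRoot, root, false));
        } else {
            // collapse consecutive NOOPs: keep the oldest "from", move "to" forward
            String from = tailIsNoop ? tail.previousRoot : lastRoot;
            queue.addLast(new ContentChange(from, root, true));
        }
        lastRoot = root;
    }

    int size() {
        return queue.size();
    }

    ContentChange poll() {
        return queue.pollFirst();
    }
}
```

With this shape any number of consecutive filtered commits occupies at most one slot, and the first commit the listener is interested in is diffed against the last filtered root instead of against a root the listener never needed.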
[jira] [Updated] (OAK-4907) Collect changes (paths, nts, props..) of a commit in a validator
[ https://issues.apache.org/jira/browse/OAK-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated OAK-4907:
-----------------------------
    Attachment: OAK-4907.patch

Attaching [^OAK-4907.patch] which contains mainly:
* ChangeSet: the data object holding the actual items changed (paths, names, types, properties)
* ChangeCollectorProvider: a type of ValidationProvider that can be hooked into Oak to have the above ChangeSets be generated and stored in CommitContexts

> Collect changes (paths, nts, props..) of a commit in a validator
> -----------------------------------------------------------------
>
> Key: OAK-4907
> URL: https://issues.apache.org/jira/browse/OAK-4907
> Project: Jackrabbit Oak
> Issue Type: Technical task
> Components: core
> Affects Versions: 1.5.11
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: 1.6
>
> Attachments: OAK-4907.patch
>
>
> It would be useful to collect a set of changes of a commit (eg in a validator) that could later be used in an Observer for eg prefiltering.
> Such a change collector should collect paths, nodetypes, properties, node-names (and perhaps more at a later stage) of all changes and store the result in the CommitInfo's CommitContext.
> Note that this is a result of [discussions|https://issues.apache.org/jira/browse/OAK-4796?focusedCommentId=15550962=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15550962] around design in OAK-4796
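To illustrate how such a collected set of changes might be used for prefiltering, here is a rough sketch. It is not the ChangeSet class from the attached patch: the class name, its fields and the {{mayAffect}} check are all hypothetical, and the matching is deliberately coarse (false positives are acceptable for a prefilter, false negatives are not).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical data object collecting what a single commit touched (not the patch's ChangeSet). */
final class ChangeSetSketch {
    private final Set<String> paths = new HashSet<>();
    private final Set<String> nodeTypes = new HashSet<>();
    private final Set<String> propertyNames = new HashSet<>();

    /** Would be called by a validator for every changed node; propertyName may be null. */
    void addChange(String path, String nodeType, String propertyName) {
        paths.add(path);
        nodeTypes.add(nodeType);
        if (propertyName != null) {
            propertyNames.add(propertyName);
        }
    }

    /** Coarse check whether a listener with the given filter could be interested. */
    boolean mayAffect(Set<String> includePaths, Set<String> includeNodeTypes) {
        boolean pathHit = false;
        for (String changed : paths) {
            for (String include : includePaths) {
                String prefix = include.endsWith("/") ? include : include + "/";
                if (changed.equals(include) || changed.startsWith(prefix)) {
                    pathHit = true;
                }
            }
        }
        // An empty node type filter is treated as "any node type".
        boolean typeHit = includeNodeTypes.isEmpty();
        for (String type : includeNodeTypes) {
            if (nodeTypes.contains(type)) {
                typeHit = true;
            }
        }
        return pathHit && typeHit;
    }
}
```

An Observer could evaluate such a check before enqueueing a commit and skip the listener's queue entirely when it returns false.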
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509287#comment-15509287 ]

Stefan Egli commented on OAK-4581:
----------------------------------

/cc [~mreutegg]

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
[jira] [Assigned] (OAK-4655) Enable configuring multiple segment nodestore instances in same setup
[ https://issues.apache.org/jira/browse/OAK-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli reassigned OAK-4655:
--------------------------------
    Assignee: Tomek Rękawek (was: Stefan Egli)

[~tomek.rekawek], looks good, more generic than my version which had 'observation' hardcoded. Am assigning the ticket to you then, thx!

> Enable configuring multiple segment nodestore instances in same setup
> ----------------------------------------------------------------------
>
> Key: OAK-4655
> URL: https://issues.apache.org/jira/browse/OAK-4655
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: segment-tar, segmentmk
> Reporter: Chetan Mehrotra
> Assignee: Tomek Rękawek
> Fix For: 1.6
>
> Attachments: OAK-4655.v1.patch, OAK-4655.v2.patch
>
>
> With OAK-4369 and OAK-4490 it's now possible to configure a new SegmentNodeStore to act as a secondary nodestore (OAK-4180). Recently, for a few other features, we have seen a requirement to configure a SegmentNodeStore just for storage purposes. For e.g.
> # OAK-4180 - Enables use of SegmentNodeStore as a secondary store to complement DocumentNodeStore
> #* Always uses BlobStore from primary DocumentNodeStore
> #* Compaction to be enabled
> # OAK-4654 - Enable use of SegmentNodeStore for private mount in a multiplexing nodestore setup
> #* Might use its own blob store
> #* Compaction might be disabled as it would be read only
> # OAK-4581 - Proposes to make use of SegmentNodeStore for storing event queue offline
> In all these setups we need to configure a SegmentNodeStore which has the following aspects
> # NodeStore instance is not directly exposed but exposed via the {{NodeStoreProvider}} interface with a {{role}} service property specifying the intended usage
> # NodeStore here is not fully functional i.e. it would not be configured with std observers, would not be used by ContentRepository etc
> # It needs to be ensured that any JMX MBean registered accounts for "role" so that there is no collision
> With the existing SegmentNodeStoreService we can only configure 1 nodestore. To support the above cases we need an OSGi config factory based implementation which enables creation of multiple SegmentNodeStore instances (each with a different directory and different settings)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509914#comment-15509914 ]

Stefan Egli commented on OAK-4581:
----------------------------------

One additional note: should we choose to go the I-B route (queue in ChangeProcessor), then this improvement will not become usable by Sling's ResourceChangeListener, as that has switched to using an OakResourceListener (following Oak's recommendation to do so, see SLING-3279), which is based directly on BackgroundObserver...

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated OAK-4796:
-----------------------------
    Attachment: OAK-4796.patch

Attaching patch ([^OAK-4796.patch]) which contains the functionality as described. Here's the implementation again in detail:
* An Observer can now voluntarily implement an extension interface {{ObserverValidatorProvider}} which has a {{Validator getRootValidator()}} method
* The new ObservationFilterValidatorProvider is the main integrator: it has a reference to all such ObserverValidatorProviders and hooks them into the validators via a CompositeValidator
* There's now an extension to BackgroundObserver called {{PrefilteringBackgroundObserver}} which implements the new ObserverValidatorProvider and does so by mapping the Validator interface to the EventFilter interface.
* The BackgroundObserver handles the new {{"oak.observation.observerFiltersEvaluated"}} and {{"oak.observation.interestedObservers"}} properties that are set on the CommitContext, and if it figures that a listener does *not* need any event for a commit, it marks the *next non filtered* ContentChange with a new {{noopPreviousRoot}} (which is a pointer to the last filtered root)
** This noopPreviousRoot is then used to send out a new {{CommitInfo.NOOP_CHANGE}} token, which indicates to Observers that they should ignore this contentChanged call (but update the root/previousRoot accordingly).
* oak-jcr's ChangeProcessor now uses such a PrefilteringBackgroundObserver and adds the last piece of the filtering puzzle: it filters the entire change via {{includeCommit}} (which applies things like isExternal/isInternal)
* To limit potential performance effects, the ChangeProcessor has a flag that controls the queue size above which prefiltering is done. Default 20.

Additionally, the patch can be used in 'test' mode, in which case a flag is set but BackgroundObserver doesn't evaluate it - instead the ChangeProcessor checks if the flag would have been correct. This test mode would be removed once there is enough confidence, but could be used in IT testing to verify that filtering would have been done correctly.

Pending tasks:
* more test cases, IT
* performance testing

I would appreciate feedback/review from those who have been involved in observation.

/cc [~mduerig], [~chetanm], [~catholicon], [~mreutegg]

> filter events before adding to ChangeProcessor's queue
> ------------------------------------------------------
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.patch
[jira] [Commented] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509153#comment-15509153 ]

Stefan Egli commented on OAK-4796:
----------------------------------

Thx for the reviews! I'll look at fixing those points. Before that, it would be good if we could decide which approach to take: the one suggested via the patch or the one suggested by Chetan.

> filter events before adding to ChangeProcessor's queue
> ------------------------------------------------------
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.patch
[jira] [Commented] (OAK-4759) Queueless change processor
[ https://issues.apache.org/jira/browse/OAK-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506601#comment-15506601 ]

Stefan Egli commented on OAK-4759:
----------------------------------

Regarding {{ConsolidatedChanges}}: I was wondering if we could separate out the 'ignores commitInfo' part and introduce a similar marker interface, perhaps called {{IgnoresCommitInfo}} (point 2 from the [suggestion on list|http://markmail.org/thread/3blnp3lmsc24nbea]): any listener flagged with that would set the {{ignoresEventInfo}} flag on the JackrabbitEventFilter and it would essentially always get {{null}} as the CommitInfo. That would of course allow further optimizations under the hood. Eg in your case, if a listener had both {{IgnoresCommitInfo}} and {{ConsolidatedChanges}} set, it would allow a queueless change processor...

> Queueless change processor
> --------------------------
>
> Key: OAK-4759
> URL: https://issues.apache.org/jira/browse/OAK-4759
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, documentmk, jcr
> Reporter: Marcel Reutegger
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4759.patch, OAK-4759.patch, jackrabbit-api.patch, jackrabbit-api.patch
>
>
> The initial proposal for this improvement was:
> {quote}
> Change processing for listeners that are only interested in external
> events could be simplified because there is no commit info for
> external changes. The basic idea is that the node store implementation
> may be able to optimize batch processing of multiple external changes
> and does not need to process each external change individually. The
> DocumentNodeStore would use the journal to identify external changes
> and need to come up with a way to ignore overlapping local changes.
> With this new feature, expensive listeners that process local as well
> as external events could be split into two separate listeners, each
> optimized for the type of changes.
> {quote}
> Later the proposal was changed into a more general queueless change processor.
> See first comment.
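The marker-interface idea from the comment could be sketched like this. Everything here is hypothetical: the two interfaces are declared locally as empty markers (the real {{ConsolidatedChanges}} from the patch may look different), listeners are plain objects instead of JCR EventListeners, and the helper only shows how the flags might be detected.

```java
/** Hypothetical marker: the listener never looks at the CommitInfo. */
interface IgnoresCommitInfo {
}

/** Hypothetical marker: the listener accepts several changes merged into one. */
interface ConsolidatedChanges {
}

/** Example listener that opts into both behaviours. */
final class ExternalOnlyListenerSketch implements IgnoresCommitInfo, ConsolidatedChanges {
}

/** Example listener without any marker. */
final class PlainListenerSketch {
}

final class ListenerFlagsSketch {
    private ListenerFlagsSketch() {
    }

    /** Such a listener could always be handed null as the CommitInfo. */
    static boolean ignoresCommitInfo(Object listener) {
        return listener instanceof IgnoresCommitInfo;
    }

    /** Only a listener carrying both markers would qualify for queueless processing. */
    static boolean canBeQueueless(Object listener) {
        return listener instanceof IgnoresCommitInfo
                && listener instanceof ConsolidatedChanges;
    }
}
```

Keeping the two markers separate means a listener can give up the CommitInfo without also agreeing to consolidated changes, which is the separation the comment asks for.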
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506514#comment-15506514 ]

Stefan Egli commented on OAK-4581:
----------------------------------

I'd like to move this ticket forward and believe we need a few decisions on the approach:

h4. I - Who to persist for
There are different possibilities as to where the persisted queue should sit:

h5. A - BackgroundObserver
In this class of solutions the BackgroundObserver's queue is persisted - and that can logically only be based on {{NodeState}}. This will thus support any type of downstream Observer including NodeObserver etc. Being based on Observer, it requires GC-prevention. Here's a list of concrete subvariants:

h6. 1 - store serialized root state
This seamlessly serializes and stores {{NodeState}} objects. Later on they are read and used for diffing, which means the data must still be available to do the actual diff. This can be achieved by increasing the GC retention period one way or another. What's also important here is that the caches aren't polluted with these late diffs - ie they should probably not be stored in the cache in this late-delivery case.

h6. 2 - store serialized diff (and root state)
Besides serializing the {{NodeState}}, this variant (also) stores the diff. This speeds up later diffing as the diff is then already there (it probably must be stored in 'a cache' temporarily, but only temporarily as it will likely only be used for one event). This variant is still dependent on preventing GC though, as we're still on the Observer level, which works on {{NodeState}}.

h6. 3 - base it on the journal
Alternatively the journal is equipped with more diff-like information (perhaps the full diff, perhaps only part of it), also see OAK-4586. Otherwise this has the same characteristics as I-A-1 and I-A-2: GC must still be prevented, we're still on the Observation/NodeState level. It will be implementation dependent, as Segment doesn't have the same type of journal as Document has.

h5. B - ChangeProcessor
In this class of solutions the queue is handled on the ChangeProcessor level (not in BackgroundObserver), thus no longer based on NodeState but independent of it; the format just must be suitable for calculation and later delivery via onEvent. Being independent of NodeState allows becoming independent of GC and cache-hotness issues. However, it's important to note that this class of solutions targets concrete EventListeners, not Observers in general!

h6. 1 - store serialized events
The ChangeProcessor calculates events as if for delivery to onEvent, but just persists the events as is. This will bloat the amount of data stored and increase I/O. However, later delivery is trivial as all the events are already there, they just have to be read and onEvent called.

h6. 2 - store serialized diff
The ChangeProcessor stores the serialized diff in a form that can later be processed by the EventFilter and result in events for delivery to EventListener.onEvent. (This would then be independent from the NodeState)

h6. 3 - base it on the journal
If the journal contains the complete diff such that ChangeProcessor can evaluate the filters and deliver, then the journal could be enough (however that might be tricky to achieve). Also, this will be implementation dependent, as Segment doesn't have the same type of journal as Document has. In any case, additionally the CommitInfo must be stored somewhere, either also in the Journal or per ChangeProcessor.

h4. II - Serializing CommitInfo
Not sure if we have many options here, I think it's just something we have to do. And if Oak code prevents serialization, then we can fix it. If it's upper/application-layer code that causes problems, we can't do much other than issue a warning.

h4. III - Storage Layer
This depends a bit on the actual solution chosen. If we base it eg on the journal, then a lot comes from there already. If we persist flat events, then surely an extra storage is needed.

h5. A - Use a SegmentNodeStore
* would be straightforward but has issues as mentioned by Michael.

h5. B - Use internals of SegmentNodeStore, eg SegmentWriter
* might be much more optimal, but adds dependencies on internals of tarmk.

h5. C - store as JSON in a flat file

[~mduerig], [~chetanm], [~catholicon], [~tmueller], which variant should we implement?

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
[jira] [Commented] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506572#comment-15506572 ]

Stefan Egli commented on OAK-4796:
----------------------------------

[~chetanm], I see, your approach is completely different. The main differences I see:
# ObserverValidatorProvider approach:
* 100% filtering of local and external events (the external part is not yet implemented, but would be similar)
* for external events the diffing is done as today, so no performance improvement there. But we can also filter entire external events for the individual listeners, as for local ones, just at a different location (somewhere in the backgroundRead)
# Extracted-Data approach:
* not 100% filtering, but perhaps close
* makes diffing cheaper for other instances in the cluster

So... let's decide which one to go for.

> filter events before adding to ChangeProcessor's queue
> ------------------------------------------------------
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4796.patch
>
>
> Currently the [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] is in charge of doing the event diffing and filtering and does so in a pooled Thread, i.e. asynchronously, at a later stage independent from the commit. This has the advantage that the commit is fast, but has the following potentially negative effects:
> # events (in the form of ContentChange objects) occupy a slot of the queue even if the listener is not interested in them - any commit lands on any listener's queue. This reduces the capacity of the queue for 'actual' events to be delivered. It therefore increases the risk that the queue fills up - and a full queue has various consequences, such as losing the CommitInfo etc.
> # each event==ContentChange must later be evaluated, and for that a diff must be calculated. Depending on runtime behavior that diff might be expensive if it is no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage already, nearer to the commit, and in case the filter would ignore the event, it would not have to be put into the queue at all, thus avoiding occupying a slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}}, the listeners' filters are evaluated - in an as-efficient-as-possible manner (the reason for doing it in a Validator is that this doesn't add overhead, as Oak already goes through all changes for other Validators). As a result a _list of potentially affected observers_ is added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be carefully done and measured
> ** One potential measure could be to only do the filtering when listeners' queues are larger than a certain threshold (e.g. 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) then checks the new commitInfo's _potentially affected observers_ list and, if it's not in the list, adds a {{NOOP}} token at the end of the queue. If there's already a NOOP there, the two are collapsed (this way, when a filter is not affected, it would have a NOOP at the end of the queue). If later on a non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer, which currently is implicitly maintained.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
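
The queueing part of the algorithm quoted above could be sketched roughly as follows. This is only an illustration under simplifying assumptions, not Oak's ChangeProcessor: the classes {{Entry}} and {{CollapsingQueue}} are hypothetical, and root states and observer ids are modelled as plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Hypothetical queue entry: either a real change or a NOOP placeholder. */
final class Entry {
    final String previousRoot; // the "from" root
    final String root;         // the "to" root
    final boolean noop;

    Entry(String previousRoot, String root, boolean noop) {
        this.previousRoot = previousRoot;
        this.root = root;
        this.noop = noop;
    }
}

/** Hypothetical per-listener queue that collapses consecutive NOOP entries. */
final class CollapsingQueue {
    private final String observerId;
    private final Deque<Entry> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recent commit seen

    CollapsingQueue(String observerId, String initialRoot) {
        this.observerId = observerId;
        this.lastRoot = initialRoot;
    }

    /** Called once per commit with the new root and the pre-filtered observer list. */
    void contentChanged(String root, Set<String> potentiallyAffected) {
        Entry tail = queue.peekLast();
        boolean tailIsNoop = tail != null && tail.noop;
        if (potentiallyAffected.contains(observerId)) {
            // real change: a trailing NOOP's root becomes the previousRoot
            String from = tailIsNoop ? tail.root : lastRoot;
            queue.addLast(new Entry(from, root, false));
        } else if (tailIsNoop) {
            // collapse: replace the trailing NOOP by one that spans both commits
            queue.pollLast();
            queue.addLast(new Entry(tail.previousRoot, root, true));
        } else {
            queue.addLast(new Entry(lastRoot, root, true));
        }
        lastRoot = root;
    }

    int size() {
        return queue.size();
    }

    Entry last() {
        return queue.peekLast();
    }
}
```

With this shape, any number of consecutive commits that do not affect the listener occupy a single slot, and the next real entry still carries the correct "from" root for diffing.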
[jira] [Created] (OAK-4677) stop oak-core bundle only transiently on lease failure
Stefan Egli created OAK-4677:
--------------------------------

Summary: stop oak-core bundle only transiently on lease failure
Key: OAK-4677
URL: https://issues.apache.org/jira/browse/OAK-4677
Project: Jackrabbit Oak
Issue Type: Improvement
Components: documentmk
Affects Versions: 1.5.8, 1.4.6
Reporter: Stefan Egli
Assignee: Stefan Egli
Fix For: 1.4.7, 1.5.9

Since OAK-3397 the oak-core bundle is stopped (via {{bundle.stop();}}) when the lease with the document store times out (i.e. the lease failed to be updated in time). Using {{bundle.stop();}} leads to an unwanted side effect, namely that it _changes and persists the autostart_ settings of the bundle. I.e. on next startup the oak-core bundle will not start automatically.

Using {{bundle.stop(Bundle.STOP_TRANSIENT);}} would avoid this and achieve exactly what was the original intention: to stop the bundle (temporarily) until restart.
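
The difference between the two calls can be illustrated with a small toy model. Note that {{ToyBundle}} below is a hypothetical stand-in written for this example, not the {{org.osgi.framework.Bundle}} interface; it only mimics the behaviour described above, where a plain stop persists the autostart setting and a transient stop leaves it alone.

```java
/** Toy stand-in for an OSGi bundle; NOT the org.osgi.framework.Bundle interface. */
final class ToyBundle {
    /** Mirrors the idea of Bundle.STOP_TRANSIENT; the value here is illustrative. */
    static final int STOP_TRANSIENT = 1;

    boolean active = true;
    boolean autostart = true; // persisted setting, survives a framework restart

    /** Plain stop: also persists "do not autostart". */
    void stop() {
        stop(0);
    }

    /** Stop with options: a transient stop leaves the autostart setting untouched. */
    void stop(int options) {
        active = false;
        if ((options & STOP_TRANSIENT) == 0) {
            autostart = false;
        }
    }

    /** Simulates a framework restart: the bundle starts only if autostart is still set. */
    void restartFramework() {
        active = autostart;
    }
}
```

In this model, a bundle stopped with {{stop()}} stays down after {{restartFramework()}}, while one stopped with {{stop(ToyBundle.STOP_TRANSIENT)}} comes back up, which is the behaviour the issue asks for.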
[jira] [Commented] (OAK-1312) Bundle nodes into a document
[ https://issues.apache.org/jira/browse/OAK-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424175#comment-15424175 ] Stefan Egli commented on OAK-1312: -- Suggestions for an alternative name for _bundling nodes_: * _collapse_ nodes * _embed_ nodes > Bundle nodes into a document > > > Key: OAK-1312 > URL: https://issues.apache.org/jira/browse/OAK-1312 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core, documentmk >Reporter: Marcel Reutegger >Assignee: Chetan Mehrotra > Labels: performance > Fix For: 1.6 > > > For very fine grained content with many nodes and only few properties per > node it would be more efficient to bundle multiple nodes into a single > MongoDB document. Mostly reading would benefit because there are less > roundtrips to the backend. At the same time storage footprint would be lower > because metadata overhead is per document. > Feature branch - > https://github.com/chetanmeh/jackrabbit-oak/compare/trunk...chetanmeh:OAK-1312 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (OAK-4677) stop oak-core bundle only transiently on lease failure
[ https://issues.apache.org/jira/browse/OAK-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli resolved OAK-4677.
------------------------------
Resolution: Fixed

* [fixed in trunk|http://svn.apache.org/viewvc?rev=1756584&view=rev]
* [fixed in 1.4-branch|http://svn.apache.org/viewvc?rev=1756585&view=rev]

> stop oak-core bundle only transiently on lease failure
> ------------------------------------------------------
>
> Key: OAK-4677
> URL: https://issues.apache.org/jira/browse/OAK-4677
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: documentmk
> Affects Versions: 1.4.6, 1.5.8
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: 1.4.7, 1.5.9
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445633#comment-15445633 ]

Stefan Egli commented on OAK-4581:
----------------------------------

Re {{A.1 - Serialized Root States}} I see the following options:
# Store the entire diff in a JSON format
# Assume that the incoming NodeState in {{contentChanged(NodeState,CommitInfo)}} is still the current root, then create a checkpoint and store the checkpoint id.
# A combination of 1. and 2. above: for small diffs store the diff, for large diffs create a checkpoint
# Or introduce a more lightweight checkpoint mechanism which allows serializing a _NodeState_ (compared to {{checkpoint}}, which is more heavyweight as it stores an id->nodeState mapping including properties).

[~chetanm], wdyt? I think having the option to do 4. would be good, but afaics it would require a change in either the {{NodeState}} or the {{NodeStore}} APIs. I fear that the assumption made in 2. would perhaps not always hold and is a bit of a weak link - which would mean that the only other option would be 1., which is expensive.

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
>
> As discussed in OAK-2683, "hitting the observation queue limit" has multiple drawbacks. Quite a bit of work is done to make diff generation faster. However, there are still chances of the event queue getting filled up.
> This issue is meant to implement a persistent event journal. The idea here being:
> # NodeStore would push the diff into a persistent store via a synchronous observer
> # Observers which are meant to handle such events in an async way (by virtue of being wrapped in BackgroundObserver) would instead pull the events from this persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form.
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does the serialized form of a SegmentNodeStore root state look like{color} - Possibly the RecordId of the "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on the persisted remote journal to determine the affected paths, which reduces the need for persisting the complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains information about affected paths, similar to what is currently being stored in the DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensured that whatever is stored there can be serialized or recalculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. [~mreutegg] suggested that for the persisted local journal we can also utilize a SegmentNodeStore instance. Care needs to be taken for compaction, either via a generation approach or by relying on online compaction
> h4. 2 - Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entries on a per-listener basis. This would allow each Listener to "pull" content changes and generate the diff at its own speed while keeping in-memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog
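
The per-listener pointers mentioned under "C - How changes get pulled" could look roughly like the following sketch. It is purely illustrative: {{JournalSketch}} is a hypothetical class, the journal is kept in memory rather than persisted, and each entry is just a string standing in for a serialized root state or diff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the persisted journal with per-listener cursors. */
final class JournalSketch {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Push side: called synchronously for every commit. */
    synchronized void append(String serializedEntry) {
        entries.add(serializedEntry);
    }

    /** A new listener only sees entries appended after its registration. */
    synchronized void register(String listenerId) {
        cursors.put(listenerId, entries.size());
    }

    /** Pull side: returns the listener's next entry, or null if it has caught up. */
    synchronized String pull(String listenerId) {
        Integer position = cursors.get(listenerId);
        if (position == null) {
            throw new IllegalArgumentException("unknown listener: " + listenerId);
        }
        if (position >= entries.size()) {
            return null;
        }
        cursors.put(listenerId, position + 1);
        return entries.get(position);
    }
}
```

Because only a cursor is held per listener, a slow listener does not force the entries it has not yet consumed to be kept in its own in-memory queue. A real implementation would additionally have to persist both entries and cursors, and truncate entries that every listener has passed.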
[jira] [Assigned] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli reassigned OAK-4581:
--------------------------------
Assignee: Stefan Egli

I'll be looking at this one and will try to come up with a patch

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
[jira] [Created] (OAK-4717) TarNodeStore.checkpoint methods represent endless loop
Stefan Egli created OAK-4717:
--------------------------------

Summary: TarNodeStore.checkpoint methods represent endless loop
Key: OAK-4717
URL: https://issues.apache.org/jira/browse/OAK-4717
Project: Jackrabbit Oak
Issue Type: Bug
Components: upgrade
Affects Versions: 1.5.8
Reporter: Stefan Egli
Assignee: Tomek Rękawek

Noticed that in [TarNodeStore|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-upgrade/src/main/java/org/apache/jackrabbit/oak/upgrade/cli/node/TarNodeStore.java#L88] all checkpoint related methods are endless loops. [~tomek.rekawek], is that intentional or a bug?
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525978#comment-15525978 ]

Stefan Egli commented on OAK-4581:
----------------------------------

Created SLING-6070 for invoking reportChanges after child traversal finished

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
[jira] [Commented] (OAK-4855) Expose actual listener.toString in consolidated listener mbean
[ https://issues.apache.org/jira/browse/OAK-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525741#comment-15525741 ]

Stefan Egli commented on OAK-4855:
----------------------------------

[~chetanm], could you pls have a look at OAK-4855 and JCR-4032? I think it would be a simple but useful, small extension. Thx!

> Expose actual listener.toString in consolidated listener mbean
> --------------------------------------------------------------
>
> Key: OAK-4855
> URL: https://issues.apache.org/jira/browse/OAK-4855
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.10
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: 1.5.11
>
> Attachments: OAK-4855.patch
>
>
> With SLING-6056 more listeners (in the form of ResourceChangeListeners) will be mapped 1:1 to either BackgroundObservers or JCR EventListeners. That means they will also be exposed in the consolidated listeners stats. Without any change though, all that can be seen in those stats is the name of the 'bridge/mapper' listener (i.e. either JcrResourceListener or OakResourceListener), since currently all that is exposed is {{getClass().toString()}} - and the actual ResourceChangeListener sitting 2 steps behind is not visible.
> In JCR-4032 I'm suggesting to introduce a {{getToString()}} to the EventListenerMBean, and once that is available, this could be exposed in the ConsolidatedListenerMBeanImpl.
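
To illustrate why {{getClass().toString()}} is not enough here, consider the following sketch. The types {{ChangeListener}} and {{BridgeListener}} are hypothetical stand-ins for the real listener and bridge classes, and the way the bridge builds its string is an assumption made for this example.

```java
/** Hypothetical stand-in for a listener interface. */
interface ChangeListener {
    void onChange(String path);
}

/** Hypothetical bridge, playing the role of JcrResourceListener / OakResourceListener. */
final class BridgeListener implements ChangeListener {
    private final ChangeListener delegate;

    BridgeListener(ChangeListener delegate) {
        this.delegate = delegate;
    }

    @Override
    public void onChange(String path) {
        delegate.onChange(path);
    }

    /** What the stats show today: identical for every bridge instance. */
    String classLabel() {
        return getClass().toString();
    }

    /** What a getToString()-style attribute could expose instead. */
    @Override
    public String toString() {
        return "BridgeListener[" + delegate + "]";
    }
}
```

Two bridges wrapping different delegates return the same {{classLabel()}}, but distinct {{toString()}} values, so only the latter lets an operator tell the listeners apart in the stats.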
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15525942#comment-15525942 ]

Stefan Egli commented on OAK-4581:
----------------------------------

* Re 'maps of added/removed/changed items': I think we might be able to remove these, or let's say drastically reduce their size: knowing that the events arrive in a breadth-first manner, the JcrResourceListener could build up and report changes whenever a child traversal has finished - no need to wait with calling reportChanges until the very end (the OakResourceListener also sends out events after a child traversal has finished).
* Re 'osgiEventQueue': that has gone with SLING-5163 / [here|https://github.com/apache/sling/commit/9c424dfca6a802a6b66b4b7981a313c5856a0e1f]
* Re 'Access control': that's probably still an open issue indeed, however it is the case for both OakResourceListener *and* JcrResourceListener, so that's not an argument against JcrResourceListener

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532190#comment-15532190 ]

Stefan Egli commented on OAK-4581:
----------------------------------

[~cziegeler], so IIUC support for these changed/added/removed properties is the reason why these maps first have to be filled in the JcrResourceListener, and this causes memory issues if we're talking about a huge change. If we no longer propagate property changes in the JRL, then we could avoid these maps. Do I understand you correctly that you're suggesting to remove this support then? This would indeed allow us to get rid of the OakResourceListener.

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
[jira] [Commented] (OAK-4859) Warn if lease update is invoked with large delay
[ https://issues.apache.org/jira/browse/OAK-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529441#comment-15529441 ]

Stefan Egli commented on OAK-4859:
----------------------------------

I wasn't really aware of the OAK-4770 details, good to know! So OAK-4770 measures how long one particular lease update takes, while I was thinking of measuring the time between 2 calls to {{renewLease()}}.

> Warn if lease update is invoked with large delay
> ------------------------------------------------
>
> Key: OAK-4859
> URL: https://issues.apache.org/jira/browse/OAK-4859
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: documentmk
> Affects Versions: 1.5.10
> Reporter: Stefan Egli
> Fix For: 1.6
>
>
> DocumentMk's lease mechanism is built on the assumption that the lease is periodically updated by each instance. If the update doesn't happen within a certain time - and the instance hasn't crashed - there's a risk that the instance's own lease fails. It is therefore important that the lease update happens without (large) delay according to the configured period.
> One pattern where this doesn't happen is when the VM is under heavy load due to JVM-Full-GC cycles. It seems likely that a memory problem doesn't normally happen instantly, but slowly builds up. Based on this assumption we could introduce a check that compares the actual time since the last lease update with the desired period. If these two diverge _a lot_ then we can at least issue a log.warn. This might help to identify this type of lease failure more easily and perhaps find root causes earlier.
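
The proposed check, i.e. comparing the time between two {{renewLease()}} invocations with the configured period, might be sketched as below. {{LeaseDelayMonitor}} and its {{warnFactor}} threshold are hypothetical names chosen for this example; the actual threshold for "diverge a lot" is left open in the issue.

```java
/** Hypothetical helper that flags lease renewals arriving much later than the period. */
final class LeaseDelayMonitor {
    private final long periodMillis;
    private final double warnFactor; // assumption: warn once gap > period * warnFactor
    private long lastRenewMillis = -1;

    LeaseDelayMonitor(long periodMillis, double warnFactor) {
        this.periodMillis = periodMillis;
        this.warnFactor = warnFactor;
    }

    /**
     * To be called at the start of every lease renewal.
     * Returns a warning message if the gap since the previous call is too large, else null.
     */
    String onRenewLease(long nowMillis) {
        String warning = null;
        if (lastRenewMillis >= 0) {
            long gap = nowMillis - lastRenewMillis;
            if (gap > periodMillis * warnFactor) {
                warning = "lease renewal invoked " + gap + "ms after the previous one"
                        + " (configured period: " + periodMillis + "ms)";
            }
        }
        lastRenewMillis = nowMillis;
        return warning;
    }
}
```

The caller would pass the returned message to its logger at warn level. Taking the timestamp as a parameter keeps the check independent of the wall clock, which makes it easy to test.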
[jira] [Created] (OAK-4859) Warn if lease update is invoked with large delay
Stefan Egli created OAK-4859:
--------------------------------

Summary: Warn if lease update is invoked with large delay
Key: OAK-4859
URL: https://issues.apache.org/jira/browse/OAK-4859
Project: Jackrabbit Oak
Issue Type: Improvement
Components: documentmk
Affects Versions: 1.5.10
Reporter: Stefan Egli
Fix For: 1.6

DocumentMk's lease mechanism is built on the assumption that the lease is periodically updated by each instance. If the update doesn't happen within a certain time - and the instance hasn't crashed - there's a risk that the instance's own lease fails. It is therefore important that the lease update happens without (large) delay according to the configured period.

One pattern where this doesn't happen is when the VM is under heavy load due to JVM-Full-GC cycles. It seems likely that a memory problem doesn't normally happen instantly, but slowly builds up. Based on this assumption we could introduce a check that compares the actual time since the last lease update with the desired period. If these two diverge _a lot_ then we can at least issue a log.warn. This might help to identify this type of lease failure more easily and perhaps find root causes earlier.
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526714#comment-15526714 ] Stefan Egli commented on OAK-4581: -- [~cziegeler], re bq. Now, an additional reason at least for Sling was that the event bridge we have was reading all added/changed nodes to get the resource type property. I can't find this dependency in the code, do you know where that's coded in JcrResourceListener? IIUC then the reason for having these addedEvents/changedEvents/removedEvents maps in onEvent is to be able to build a JcrResourceChanged obj that contains all changed properties (unrelated to resource type - but perhaps that's hidden somewhere down deep..) > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observers which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does the serialized form of the > SegmentNodeStore root state look like{color} - Possibly the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is currently being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensured > that whatever is stored there can be serialized or recalculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in-memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
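The "pointer per listener" idea from section C of the OAK-4581 description can be sketched as follows. This is only an illustration, not the attached patch: {{JournalSketch}} is a hypothetical name, the journal is an in-memory list standing in for a persistent store (so it does not survive restarts), and entries are plain strings standing in for serialized root states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: append-only journal with one read pointer per listener. */
final class JournalSketch {
    // Stand-in for the persisted journal (serialized root states or diffs).
    private final List<String> entries = new ArrayList<>();
    // Per-listener pointer: index of the next entry that listener will read.
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Called synchronously on commit: push the entry into the journal. */
    void append(String serializedRoot) {
        entries.add(serializedRoot);
    }

    /** A new listener starts at the current end of the journal. */
    void register(String listenerId) {
        cursors.put(listenerId, entries.size());
    }

    /** A listener pulls at most 'max' entries at its own pace. */
    List<String> pull(String listenerId, int max) {
        Integer from = cursors.get(listenerId);
        if (from == null) {
            throw new IllegalArgumentException("unknown listener: " + listenerId);
        }
        int to = Math.min(entries.size(), from + max);
        List<String> batch = new ArrayList<>(entries.subList(from, to));
        cursors.put(listenerId, to);
        return batch;
    }
}
```

A slow listener simply lags behind with its own pointer instead of filling an in-memory queue. Truncating entries that every listener has already read is not modelled here.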
[jira] [Comment Edited] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526714#comment-15526714 ] Stefan Egli edited comment on OAK-4581 at 9/27/16 5:02 PM: --- [~cziegeler], re bq. Now, an additional reason at least for Sling was that the event bridge we have was reading all added/changed nodes to get the resource type property. -I can't find this dependency in the code, do you know where that's coded in JcrResourceListener? IIUC then the reason for having these addedEvents/changedEvents/removedEvents maps in onEvent is to be able to build a JcrResourceChanged obj that contains all changed properties (unrelated to resource type - but perhaps that's hidden somewhere down deep..)- EDIT: Are you referring to [OsgiObservationBridge.sendOsgiEvent:131|https://github.com/apache/sling/blob/9c424dfca6a802a6b66b4b7981a313c5856a0e1f/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/observation/OsgiObservationBridge.java#L131] ? IIUC that reads the current resource explicitly to get the resourceType. So sure, having the resource type as part of the event (as OAK-4853 would provide) would be a handy thing, even if slightly unexpected. However, I fail to see how this justifies the addedEvents/changedEvents/removedEvents in JcrResourceListener. IIUC the reason for building these maps is to be able to generate _correct_ JcrResourceChange objs - as they must contain all properties of a particular node - and those come in separate {{Events}}. So if the goal is to prevent those maps from becoming huge (or to not have them at all), then this has nothing to do with the resource type in my view. If we shouldn't rely on the breadth-first traversal, then the only alternative would be to get all properties that have been changed/added/removed on the corresponding {{Event}} of the node (which is slightly different from OAK-4853). wdyt? was (Author: egli): [~cziegeler], re bq. 
Now, an additional reason at least for Sling was that the event bridge we have was reading all added/changed nodes to get the resource type property. I can't find this dependency in the code, do you know where that's coded in JcrResourceListener? IIUC then the reason for having these addedEvents/changedEvents/removedEvents maps in onEvent is to be able to build a JcrResourceChanged obj that contains all changed properties (unrelated to resource type - but perhaps that's hidden somewhere down deep..)
[jira] [Commented] (OAK-4855) Expose actual listener.toString in consolidated listener mbean
[ https://issues.apache.org/jira/browse/OAK-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541827#comment-15541827 ] Stefan Egli commented on OAK-4855: -- right, there's one "toString" too many. I'll wait for the next jackrabbit release and will then finish this one up. > Expose actual listener.toString in consolidated listener mbean > -- > > Key: OAK-4855 > URL: https://issues.apache.org/jira/browse/OAK-4855 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.10 >Reporter: Stefan Egli >Assignee: Stefan Egli > Fix For: 1.6 > > Attachments: OAK-4855.patch > > > With SLING-6056 more listeners (in the form of ResourceChangeListeners) will > be mapped 1:1 to either BackgroundObservers or JCR EventListeners. That > means, they will also be exposed in the consolidated listeners stats. Without > any change though, all that can be seen in that stats is the name of that > 'bridge/mapper' listener (ie either JcrResourceListener or > OakResourceListener), since currently all that is exposed is > {{getClass().toString()}} - and the actual ResourceChangeListener sitting 2 > steps behind is not visible. > In JCR-4032 I'm suggesting to introduce a {{getToString()}} to the > EventListenerMBean, and once that would be available, this could be exposed > in the ConsolidatedListenerMBeanImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522996#comment-15522996 ] Stefan Egli commented on OAK-4581: -- that sounds like a new class of listener that would want 'at-least-once' delivery in a cluster. Something probably useful, but I'm not sure if that fits into the observation umbrella of the JCR API. I think that's orthogonal to this ticket (somewhat) and could probably be handled separately? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522461#comment-15522461 ] Stefan Egli commented on OAK-4581: -- [~catholicon], thx for your feedback! Re bq. I thought this issue was about persisting change pointers Agreed. And when looking at persisting NodeState(+CommitInfo) then the revGC issue comes up (you must later be able to do the diff, and revGC must not clean up things before the persisted observation queues have been processed). And from this resulted the idea to not persist NodeState but the actual, calculated Event (even though that would bloat the storage, as it would become much simpler). However, this now again conflicts with support for any type of BackgroundObserver, not only ChangeProcessor. So I think the latter question becomes central now, and if we want to support any BackgroundObserver we need to persist NodeState and prevent revGC from cleaning up too early. bq. Afaics, we still want remain wary of infinite storage of pointers bq. Sure if we're saying that revGC deleted node states Exactly. There's a dilemma: we want to prevent revGC from being 'paused' for too long just because of observation - but if it isn't paused then such a slow or overwhelmed listener would lose events. We have to make a choice, it's a binary thing. Perhaps we have to cut off events after a certain time (eg after exactly 24hours to fit into the segment-tar's default generation cycle)? (The advantage of persisting events would have been that it wouldn't have needed such a cut-off..) 
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523087#comment-15523087 ] Stefan Egli commented on OAK-4581: -- bq. 3. those observers get events post reboot Not sure that's really the case. I would argue that after a restart these persisted events get deleted. IMO what 'persist' in the context of this ticket emphasizes is only additional memory at runtime, not that it survives restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-4855) Expose actual listener.toString in consolidated listener mbean
[ https://issues.apache.org/jira/browse/OAK-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4855: - Attachment: OAK-4855.patch Attaching patch [^OAK-4855.patch] > Expose actual listener.toString in consolidated listener mbean > -- > > Key: OAK-4855 > URL: https://issues.apache.org/jira/browse/OAK-4855 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.10 >Reporter: Stefan Egli >Assignee: Stefan Egli > Fix For: 1.5.11 > > Attachments: OAK-4855.patch > > > With SLING-6056 more listeners (in the form of ResourceChangeListeners) will > be mapped 1:1 to either BackgroundObservers or JCR EventListeners. That > means, they will also be exposed in the consolidated listeners stats. Without > any change though, all that can be seen in that stats is the name of that > 'bridge/mapper' listener (ie either JcrResourceListener or > OakResourceListener), since currently all that is exposed is > {{getClass().toString()}} - and the actual ResourceChangeListener sitting 2 > steps behind is not visible. > In JCR-4032 I'm suggesting to introduce a {{getToString()}} to the > EventListenerMBean, and once that would be available, this could be exposed > in the ConsolidatedListenerMBeanImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-4855) Expose actual listener.toString in consolidated listener mbean
Stefan Egli created OAK-4855: Summary: Expose actual listener.toString in consolidated listener mbean Key: OAK-4855 URL: https://issues.apache.org/jira/browse/OAK-4855 Project: Jackrabbit Oak Issue Type: Improvement Components: jcr Affects Versions: 1.5.10 Reporter: Stefan Egli Assignee: Stefan Egli Fix For: 1.5.11 With SLING-6056 more listeners (in the form of ResourceChangeListeners) will be mapped 1:1 to either BackgroundObservers or JCR EventListeners. That means, they will also be exposed in the consolidated listeners stats. Without any change though, all that can be seen in that stats is the name of that 'bridge/mapper' listener (ie either JcrResourceListener or OakResourceListener), since currently all that is exposed is {{getClass().toString()}} - and the actual ResourceChangeListener sitting 2 steps behind is not visible. In JCR-4032 I'm suggesting to introduce a {{getToString()}} to the EventListenerMBean, and once that would be available, this could be exposed in the ConsolidatedListenerMBeanImpl. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532623#comment-15532623 ] Stefan Egli commented on OAK-4796: -- As discussed offline with Marcel, I'll work on a patch for the 2nd variant, so that we can compare the complexity/result. > filter events before adding to ChangeProcessor's queue > -- > > Key: OAK-4796 > URL: https://issues.apache.org/jira/browse/OAK-4796 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.9 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4796.patch > > > Currently the > [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] > is in charge of doing the event diffing and filtering and does so in a > pooled Thread, ie asynchronously, at a later stage independent from the > commit. This has the advantage that the commit is fast, but has the following > potentially negative effects: > # events (in the form of ContentChange Objects) occupy a slot of the queue > even if the listener is not interested in it - any commit lands on any > listener's queue. This reduces the capacity of the queue for 'actual' events > to be delivered. It therefore increases the risk that the queue fills - and > when full has various consequences such as losing the CommitInfo etc. > # each event==ContentChange later on must be evaluated, and for that a diff > must be calculated. Depending on runtime behavior that diff might be > expensive if no longer in the cache (documentMk specifically). 
> As an improvement, this diffing+filtering could be done at an earlier stage > already, nearer to the commit, and in case the filter would ignore the event, > it would not have to be put into the queue at all, thus avoiding occupying a > slot and later potentially slower diffing. > The suggestion is to implement this via the following algorithm: > * During the commit, in a {{Validator}} the listener's filters are evaluated > - in an as-efficient-as-possible manner (Reason for doing it in a Validator > is that this doesn't add overhead as oak already goes through all changes for > other Validators). As a result a _list of potentially affected observers_ is > added to the {{CommitInfo}} (false positives are fine). > ** Note that the above adds cost to the commit and must therefore be > carefully done and measured > ** One potential measure could be to only do filtering when listener's queues > are larger than a certain threshold (eg 10) > * The ChangeProcessor in {{contentChanged}} (in the one created in > [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) > then checks the new commitInfo's _potentially affected observers_ list and > if it's not in the list, adds a {{NOOP}} token at the end of the queue. If > there's already a NOOP there, the two are collapsed (this way when a filter > is not affected it would have a NOOP at the end of the queue). If later on a > no-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} > for the newly added {{ContentChange}} obj. > ** To achieve that, the ContentChange obj is extended to not only have the > "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which > currently is implicitly maintained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
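The queue handling described in the OAK-4796 algorithm above (NOOP token at the tail, collapsing of consecutive NOOPs, explicit "from" pointer) might be sketched as below. This is a simplified illustration rather than the attached patch: {{QueuedChange}} and {{PrefilteringQueue}} are hypothetical names, root states are plain strings, and the pre-filter result is passed in as a boolean instead of being read from the {{CommitInfo}}.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for ContentChange, extended with an explicit "from" root. */
final class QueuedChange {
    final String previousRoot; // "from" state
    final String root;         // "to" state
    final boolean noop;        // true if the listener's filter was not affected

    QueuedChange(String previousRoot, String root, boolean noop) {
        this.previousRoot = previousRoot;
        this.root = root;
        this.noop = noop;
    }
}

/** Hypothetical sketch of one listener's queue with NOOP collapsing. */
final class PrefilteringQueue {
    private final Deque<QueuedChange> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recently seen commit

    PrefilteringQueue(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    void contentChanged(String root, boolean potentiallyAffected) {
        QueuedChange tail = queue.peekLast();
        if (!potentiallyAffected && tail != null && tail.noop) {
            // Collapse: replace the trailing NOOP with one that spans both commits.
            queue.pollLast();
            queue.addLast(new QueuedChange(tail.previousRoot, root, true));
        } else {
            // Either a NOOP at the tail of real changes, or a real change whose
            // previousRoot is the root of whatever came before (including a NOOP).
            queue.addLast(new QueuedChange(lastRoot, root, !potentiallyAffected));
        }
        lastRoot = root;
    }

    int size() {
        return queue.size();
    }

    QueuedChange poll() {
        return queue.pollFirst();
    }
}
```

However many commits the listener is not interested in, they occupy at most one slot between two real changes.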
[jira] [Updated] (OAK-4916) Add support for excluding commits to BackgroundObserver
[ https://issues.apache.org/jira/browse/OAK-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4916: - Attachment: OAK-4916.patch Attaching [^OAK-4916.patch] which implements the {{isExcluded}} subclass hook as well as the {{NOOP_CHANGE}} CommitInfo in BackgroundObserver as described, including test cases. > Add support for excluding commits to BackgroundObserver > --- > > Key: OAK-4916 > URL: https://issues.apache.org/jira/browse/OAK-4916 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Fix For: 1.6 > > Attachments: OAK-4916.patch > > > As part of pre-filtering commits it would be useful to have support in the > BackgroundObserver (in general) that would allow to exclude certain commits > from being added to the (BackgroundObserver's) queue, thus keeping the queue > smaller. The actual filtering is up to subclasses. > The suggested implementation is as follows: > * a new method {{isExcluded}} is introduced which represents a subclass hook > for filtering > * excluded commits are not added to the queue > * when multiple commits are excluded consecutively, they are collapsed > * the first non-excluded commit (ContentChange) added to the queue is marked > with the last non-excluded root state as the 'previous root' > * downstream Observers are notified of the exclusion of a commit via a > special CommitInfo {{NOOP_CHANGE}}: this instructs it to exclude this change > while at the same time 'fast-forwarding' the root state to the new one. > ** this extra token is one way of solving the problem that > {{Observer.contentChanged}} represents a diff between two states but does not > transport the 'from' state explicitly - that is implicitly taken from the > previous call to {{contentChanged}}. Thus using such a gap token > ({{NOOP_CHANGE}}) seems to be the only way to instruct Observers to skip a > change. 
> To repeat: whoever extends BackgroundObserver with filtering must be aware of > the new {{NOOP_CHANGE}} token. Anyone not doing filtering will not get any > {{NOOP_CHANGE}} tokens though. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
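The interplay between the {{isExcluded}} hook and a downstream observer that understands {{NOOP_CHANGE}} could be sketched like this. It is an assumption-laden simplification, not the code from the patch: both class names are hypothetical, there is no background queue (calls are forwarded synchronously), root states are strings, and the commit info is a plain {{Object}} with a sentinel standing in for the special CommitInfo.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical downstream observer: diffs against the previously seen root. */
final class DiffingObserverSketch {
    /** Stand-in for the special NOOP_CHANGE CommitInfo. */
    static final Object NOOP_CHANGE = new Object();

    private String previousRoot;
    final List<String> diffs = new ArrayList<>();

    DiffingObserverSketch(String initialRoot) {
        this.previousRoot = initialRoot;
    }

    void contentChanged(String root, Object info) {
        if (info != NOOP_CHANGE) {
            diffs.add(previousRoot + " -> " + root); // stand-in for real diffing
        }
        previousRoot = root; // fast-forward in both cases
    }
}

/** Hypothetical upstream observer with the isExcluded subclass hook. */
abstract class ExcludingObserverSketch {
    private final DiffingObserverSketch downstream;
    private String lastExcludedRoot; // end of a collapsed run of excluded commits

    ExcludingObserverSketch(DiffingObserverSketch downstream) {
        this.downstream = downstream;
    }

    /** Subclass hook: return true to keep this commit away from downstream. */
    protected abstract boolean isExcluded(String root, Object info);

    void contentChanged(String root, Object info) {
        if (isExcluded(root, info)) {
            lastExcludedRoot = root; // collapse consecutive exclusions
            return;
        }
        if (lastExcludedRoot != null) {
            // Tell downstream to skip the gap but move its 'from' state forward.
            downstream.contentChanged(lastExcludedRoot, DiffingObserverSketch.NOOP_CHANGE);
            lastExcludedRoot = null;
        }
        downstream.contentChanged(root, info);
    }
}
```

Because {{contentChanged}} carries only the 'to' state, the gap token is what lets the downstream observer diff the next change from the right root.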
[jira] [Created] (OAK-5023) add applyOnOwnNode property to OakEventFilter
Stefan Egli created OAK-5023: Summary: add applyOnOwnNode property to OakEventFilter Key: OAK-5023 URL: https://issues.apache.org/jira/browse/OAK-5023 Project: Jackrabbit Oak Issue Type: Improvement Components: jcr Affects Versions: 1.5.12 Reporter: Stefan Egli Assignee: Stefan Egli There seems to be a rather frequent observation use case where one would like to create a filter on a _child_ rather than on a _parent_: consider the case when you'd like to filter for the removal of a node that has a particular nodeType. This can't be achieved atm as the nodeType is applicable to the parent of the node that changes, not the node itself (ie child). Therefore suggesting the introduction of a flag similar to the following: {code} boolean applyOnOwnNode; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
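To illustrate what such a flag would change, here is a minimal sketch. {{NodeTypeFilterSketch}} and its {{includes}} method are hypothetical and only model the decision of which node's type the filter is evaluated against; they are not the OakEventFilter API.

```java
/** Hypothetical sketch: a node type filter with an applyOnOwnNode switch. */
final class NodeTypeFilterSketch {
    private final String nodeType;
    private final boolean applyOnOwnNode;

    NodeTypeFilterSketch(String nodeType, boolean applyOnOwnNode) {
        this.nodeType = nodeType;
        this.applyOnOwnNode = applyOnOwnNode;
    }

    /**
     * Decides whether a node removal event passes the filter.
     * Without the flag the type of the parent is checked (current behaviour
     * as described in the issue); with the flag the removed node's own type is.
     */
    boolean includes(String parentNodeType, String ownNodeType) {
        String candidate = applyOnOwnNode ? ownNodeType : parentNodeType;
        return nodeType.equals(candidate);
    }
}
```

For the removal of an {{nt:file}} node below an {{nt:folder}} parent, a filter on {{nt:file}} only matches when the flag is set.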
[jira] [Updated] (OAK-5019) Support glob patterns through OakEventFilter
[ https://issues.apache.org/jira/browse/OAK-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5019: - Description: (Originally reported as JCR-4044, but moved to Oak as a result of introducing the OakEventFilter in OAK-5013. From the original description: ) In the Sling project, we would like to register JCR listeners based on glob patterns as defined in https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/observation/filter/GlobbingPathFilter.html So basically instead (or in addition) to specifying an absolute path, defining patterns. [Discussion thread|https://lists.apache.org/thread.html/300f84574bbb039cebe35aab1afc21e043560a1c0471e456a2f5c0be@%3Cdev.jackrabbit.apache.org%3E] /cc [~cziegeler] was: (Originally reported as JCR-4044, but moved to Oak as a result of introducing the OakEventFilter in OAK-5013. From the original description:) In the Sling project, we would like to register JCR listeners based on glob patterns as defined in https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/observation/filter/GlobbingPathFilter.html So basically instead (or in addition) to specifying an absolute path, defining patterns. [Discussion thread|https://lists.apache.org/thread.html/300f84574bbb039cebe35aab1afc21e043560a1c0471e456a2f5c0be@%3Cdev.jackrabbit.apache.org%3E] /cc [~cziegeler] > Support glob patterns through OakEventFilter > > > Key: OAK-5019 > URL: https://issues.apache.org/jira/browse/OAK-5019 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.12 >Reporter: Stefan Egli >Assignee: Stefan Egli > > (Originally reported as JCR-4044, but moved to Oak as a result of introducing > the OakEventFilter in OAK-5013. 
From the original description: ) > In the Sling project, we would like to register JCR listeners based on glob > patterns as defined in > https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/observation/filter/GlobbingPathFilter.html > So basically instead (or in addition) to specifying an absolute path, > defining patterns. > [Discussion > thread|https://lists.apache.org/thread.html/300f84574bbb039cebe35aab1afc21e043560a1c0471e456a2f5c0be@%3Cdev.jackrabbit.apache.org%3E] > /cc [~cziegeler] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-5023) add applyOnOwnNode property to OakEventFilter
[ https://issues.apache.org/jira/browse/OAK-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5023: - Description: (Originally reported as JCR-4048, but moved to Oak as a result of introducing the OakEventFilter in OAK-5013. From the original description: ) There seems to be a rather frequent use case of observation around which would like to create a filter on a _child_ rather than on a _parent_: consider the case when you'd like to filter for the removal of a node that has a particular nodeType. This can't be achieved atm as the nodeType is applicable to the parent of the node that changes, not the node itself (ie child). Therefore suggesting the introduction of a flag similar to the following: {code} boolean applyOnOwnNode; {code} was: There seems to be a rather frequent use case of observation around which would like to create a filter on a _child_ rather than on a _parent_: consider the case when you'd like to filter for the removal of a node that has a particular nodeType. This can't be achieved atm as the nodeType is applicable to the parent of the node that changes, not the node itself (ie child). Therefore suggesting the introduction of a flag similar to the following: {code} boolean applyOnOwnNode; {code} > add applyOnOwnNode property to OakEventFilter > - > > Key: OAK-5023 > URL: https://issues.apache.org/jira/browse/OAK-5023 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.12 >Reporter: Stefan Egli >Assignee: Stefan Egli > > (Originally reported as JCR-4048, but moved to Oak as a result of introducing > the OakEventFilter in OAK-5013. From the original description: ) > There seems to be a rather frequent use case of observation around which > would like to create a filter on a _child_ rather than on a _parent_: > consider the case when you'd like to filter for the removal of a node that > has a particular nodeType. 
This can't be achieved atm as the nodeType is > applicable to the parent of the node that changes, not the node itself (ie > child). > Therefore suggesting the introduction of a flag similar to the following: > {code} > boolean applyOnOwnNode; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-5019) Support glob patterns through OakEventFilter
Stefan Egli created OAK-5019: Summary: Support glob patterns through OakEventFilter Key: OAK-5019 URL: https://issues.apache.org/jira/browse/OAK-5019 Project: Jackrabbit Oak Issue Type: Improvement Components: jcr Affects Versions: 1.5.12 Reporter: Stefan Egli Assignee: Stefan Egli (Originally reported as JCR-4044, but moved to Oak as a result of introducing the OakEventFilter in OAK-5013. From the original description:) In the Sling project, we would like to register JCR listeners based on glob patterns as defined in https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/plugins/observation/filter/GlobbingPathFilter.html So basically instead (or in addition) to specifying an absolute path, defining patterns. [Discussion thread|https://lists.apache.org/thread.html/300f84574bbb039cebe35aab1afc21e043560a1c0471e456a2f5c0be@%3Cdev.jackrabbit.apache.org%3E] /cc [~cziegeler] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
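For readers unfamiliar with the glob semantics referenced above, here is a minimal standalone sketch of segment-wise glob matching. It assumes the behaviour described in the GlobbingPathFilter javadoc as I understand it: {{*}} matches exactly one path element and {{**}} matches any number of path elements (including none). The class and method names ({{GlobSketch}}, {{matches}}) are made up for this illustration and are not Oak API.

```java
import java.util.ArrayList;
import java.util.List;

final class GlobSketch {

    /** Splits "/a/b/c" into [a, b, c], dropping empty segments. */
    static List<String> segments(String path) {
        List<String> result = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    /** Returns true if the whole path matches the whole pattern. */
    static boolean matches(String pattern, String path) {
        return matches(segments(pattern), segments(path));
    }

    private static boolean matches(List<String> pattern, List<String> path) {
        if (pattern.isEmpty()) {
            return path.isEmpty();
        }
        String head = pattern.get(0);
        List<String> rest = pattern.subList(1, pattern.size());
        if (head.equals("**")) {
            // Let ** swallow 0, 1, 2, ... leading segments of the path.
            for (int i = 0; i <= path.size(); i++) {
                if (matches(rest, path.subList(i, path.size()))) {
                    return true;
                }
            }
            return false;
        }
        if (path.isEmpty()) {
            return false;
        }
        // * matches any single segment, everything else is compared literally.
        boolean headMatches = head.equals("*") || head.equals(path.get(0));
        return headMatches && matches(rest, path.subList(1, path.size()));
    }

    public static void main(String[] args) {
        System.out.println(matches("/a/*/c", "/a/b/c"));
        System.out.println(matches("**/foo", "/x/y/foo"));
        System.out.println(matches("/a/*/c", "/a/b/d/c"));
    }
}
```

A listener registered with a pattern such as {{**/foo}} would then be interested in any path whose last element is {{foo}}, which is not expressible with a single absolute path.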
[jira] [Commented] (OAK-5013) Introduce observation filter extension mechanism to Oak
[ https://issues.apache.org/jira/browse/OAK-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611199#comment-15611199 ] Stefan Egli commented on OAK-5013: -- agreed, I'll adjust the wording accordingly. "write-through" was meant to be only between OakEventListener and the underlying, not that it modifies the filter after registration. > Introduce observation filter extension mechanism to Oak > --- > > Key: OAK-5013 > URL: https://issues.apache.org/jira/browse/OAK-5013 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.12 >Reporter: Stefan Egli >Assignee: Stefan Egli > Fix For: 1.5.13 > > Attachments: OAK-5013.patch > > > During [discussions|http://markmail.org/thread/fyngo4dwb7fvlqdj] regarding > extending JackrabbitEventFilter an interesting suggestion by [~mduerig] came > up that we could extend the JackrabbitEventFilter in oak explicitly, rather > than modifying it in Jackrabbit each time we add something oak-specific. > We should introduce a builder/interface pair in oak to reflect that. > Users that would like to use such oak-specific extensions would wrap a > JackrabbitEventFilter and get an OakEventFilter that contains enabling these > extensions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-5021) Improve observation of files
Stefan Egli created OAK-5021: Summary: Improve observation of files Key: OAK-5021 URL: https://issues.apache.org/jira/browse/OAK-5021 Project: Jackrabbit Oak Issue Type: Improvement Components: jcr Affects Versions: 1.5.12 Reporter: Stefan Egli Assignee: Stefan Egli (Originally reported as JCR-4046, but moved to Oak as a result of introducing the OakEventFilter in OAK-5013. From the original description: ) A file in JCR is represented by at least two nodes, the nt:file node and a child node named jcr:content holding the contents of the file (and metadata). This has the consequence that if the contents of a file changes, a change event of the jcr:content node is reported - but not of the nt:file node. This makes creating listeners listening for changes in files complicated, as you can't use the file name to filter - especially with glob patterns (see JCR-4044 - now OAK-5019) this becomes troublesome. In addition, whenever you get a change for a jcr:content node, you have to check if the parent is a nt:file node and decide based on the result. It would be great to have a flag on the JackrabbitEventFilter to enable smarter reporting just for nt:files: if a property on jcr:content is changed, a change to the nt:file node is reported. See also SLING-6163 and OAK-4940 /cc [~cziegeler] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
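A minimal sketch of the "smarter reporting" idea described above, not the actual Oak implementation: when the changed node is a {{jcr:content}} child of an {{nt:file}} node, the change is reported against the {{nt:file}} node instead. The class {{FileChangeSketch}}, the method {{reportedNodePath}} and the path-to-primary-type map are assumptions standing in for a real repository lookup.

```java
import java.util.HashMap;
import java.util.Map;

final class FileChangeSketch {

    /**
     * Returns the path a change should be reported against.
     * primaryTypes maps node paths to their primary node type (a stand-in for a repository lookup).
     */
    static String reportedNodePath(String changedNodePath, Map<String, String> primaryTypes) {
        int idx = changedNodePath.lastIndexOf('/');
        String name = changedNodePath.substring(idx + 1);
        if (idx > 0 && name.equals("jcr:content")) {
            String parent = changedNodePath.substring(0, idx);
            if ("nt:file".equals(primaryTypes.get(parent))) {
                // A property below jcr:content changed: report the nt:file node itself.
                return parent;
            }
        }
        return changedNodePath;
    }

    public static void main(String[] args) {
        Map<String, String> types = new HashMap<>();
        types.put("/docs/report.pdf", "nt:file");
        types.put("/pages/home", "nt:unstructured");
        System.out.println(reportedNodePath("/docs/report.pdf/jcr:content", types));
        System.out.println(reportedNodePath("/pages/home/jcr:content", types));
    }
}
```

With such a mapping, a listener could filter on the file name (for example with a glob pattern) without having to inspect the parent of every {{jcr:content}} change itself.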
[jira] [Created] (OAK-5022) add includeSubtreeOnDelete flag to OakEventFilter
Stefan Egli created OAK-5022: Summary: add includeSubtreeOnDelete flag to OakEventFilter Key: OAK-5022 URL: https://issues.apache.org/jira/browse/OAK-5022 Project: Jackrabbit Oak Issue Type: Improvement Components: jcr Affects Versions: 1.5.12 Reporter: Stefan Egli Assignee: Stefan Egli (Originally reported as JCR-4037, but moved to Oak as a result of introducing the OakEventFilter in OAK-5013. From the original description: ) In some cases of observation it would be useful to receive events of child node or properties of a parent/grandparent that was deleted. Currently (in Oak at least) one only receives a deleted event for the node that was deleted and none of the children. Suggesting to (re)introduce a flag, eg as follows to the JackrabbitEventFilter: {code} boolean includeSubtreeOnRemove; {code} (Open for any better name of course) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
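To make the effect of the proposed flag concrete, here is a small sketch under simplifying assumptions: the repository state before the deletion is modelled as a sorted set of node paths, properties are ignored, and the names {{SubtreeRemoveSketch}} and {{removedPaths}} are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class SubtreeRemoveSketch {

    /**
     * Lists the node paths for which a removal event would be generated.
     * Without the flag only the deleted node is reported; with it, all descendants are reported too.
     */
    static List<String> removedPaths(SortedSet<String> nodesBefore, String removedPath,
                                     boolean includeSubtreeOnRemove) {
        List<String> events = new ArrayList<>();
        events.add(removedPath);
        if (includeSubtreeOnRemove) {
            String prefix = removedPath + "/";
            for (String path : nodesBefore) {
                if (path.startsWith(prefix)) {
                    events.add(path);
                }
            }
        }
        return events;
    }

    public static void main(String[] args) {
        SortedSet<String> before = new TreeSet<>(Arrays.asList("/a", "/a/b", "/a/b/c", "/a/bc", "/x"));
        System.out.println(removedPaths(before, "/a/b", false));
        System.out.println(removedPaths(before, "/a/b", true));
    }
}
```

Note that the prefix check uses {{removedPath + "/"}} so that a sibling such as {{/a/bc}} is not mistaken for a descendant of {{/a/b}}.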
[jira] [Created] (OAK-5020) Improved support for node removals
Stefan Egli created OAK-5020: Summary: Improved support for node removals Key: OAK-5020 URL: https://issues.apache.org/jira/browse/OAK-5020 Project: Jackrabbit Oak Issue Type: Improvement Components: jcr Affects Versions: 1.5.12 Reporter: Stefan Egli Assignee: Stefan Egli (Originally reported as JCR-4045, but moved to Oak as a result of introducing the OakEventFilter in OAK-5013. From the original description: ) If a listener is subscribed to removal events of a subtree, e.g. /a/b/c/d, it gets removal events for everything in that tree. However, if /a/b is removed, the listener is not informed at all, which makes the listener state inconsistent/invalid. I suggest adding a new flag to the JackrabbitEventFilter; if that flag is enabled, the listener will get remove events for all the parent nodes - provided the listener is interested in remove events of any kind. /cc [~cziegeler] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
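The decision the proposed flag would add can be sketched as follows. This is only an illustration under the assumption that paths are plain absolute strings; {{AncestorRemoveSketch}}, {{shouldNotify}} and the flag parameter name are hypothetical and not part of any existing API.

```java
final class AncestorRemoveSketch {

    /** True if 'path' equals 'ancestor' or lies somewhere below it. */
    static boolean isAncestorOrSelf(String ancestor, String path) {
        return path.equals(ancestor) || path.startsWith(ancestor + "/");
    }

    /**
     * Decides whether a listener subscribed to removals below subscribedPath
     * should hear about the removal of removedPath.
     */
    static boolean shouldNotify(String subscribedPath, String removedPath,
                                boolean includeAncestorsRemove) {
        if (isAncestorOrSelf(subscribedPath, removedPath)) {
            // Removal inside the subscribed subtree: always reported.
            return true;
        }
        // Removal of a parent of the subscribed path: reported only with the new flag.
        return includeAncestorsRemove && isAncestorOrSelf(removedPath, subscribedPath);
    }

    public static void main(String[] args) {
        System.out.println(shouldNotify("/a/b/c/d", "/a/b", false));
        System.out.println(shouldNotify("/a/b/c/d", "/a/b", true));
        System.out.println(shouldNotify("/a/b/c/d", "/a/x", true));
    }
}
```

With the flag enabled, removing {{/a/b}} reaches a listener on {{/a/b/c/d}}, while removing an unrelated sibling such as {{/a/x}} still does not.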
[jira] [Updated] (OAK-5013) Introduce observation filter extension mechanism to Oak
[ https://issues.apache.org/jira/browse/OAK-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5013: - Fix Version/s: 1.5.13 > Introduce observation filter extension mechanism to Oak > --- > > Key: OAK-5013 > URL: https://issues.apache.org/jira/browse/OAK-5013 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.12 >Reporter: Stefan Egli >Assignee: Stefan Egli > Fix For: 1.5.13 > > Attachments: OAK-5013.patch > > > During [discussions|http://markmail.org/thread/fyngo4dwb7fvlqdj] regarding > extending JackrabbitEventFilter an interesting suggestion by [~mduerig] came > up that we could extend the JackrabbitEventFilter in oak explicitly, rather > than modifying it in Jackrabbit each time we add something oak-specific. > We should introduce a builder/interface pair in oak to reflect that. > Users that would like to use such oak-specific extensions would wrap a > JackrabbitEventFilter and get an OakEventFilter that contains enabling these > extensions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-5013) Introduce observation filter extension mechanism to Oak
[ https://issues.apache.org/jira/browse/OAK-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-5013: - Attachment: OAK-5013.patch Attached [^OAK-5013.patch] shows a static factory and an abstract base class {{OakEventFilter}} which could serve as an API. Later changes to the {{OakEventFilter}} could then be done in a way that existing client code would not have to be touched ("@ProviderType") > Introduce observation filter extension mechanism to Oak > --- > > Key: OAK-5013 > URL: https://issues.apache.org/jira/browse/OAK-5013 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.12 >Reporter: Stefan Egli >Assignee: Stefan Egli > Attachments: OAK-5013.patch > > > During [discussions|http://markmail.org/thread/fyngo4dwb7fvlqdj] regarding > extending JackrabbitEventFilter an interesting suggestion by [~mduerig] came > up that we could extend the JackrabbitEventFilter in oak explicitly, rather > than modifying it in Jackrabbit each time we add something oak-specific. > We should introduce a builder/interface pair in oak to reflect that. > Users that would like to use such oak-specific extensions would wrap a > JackrabbitEventFilter and get an OakEventFilter that contains enabling these > extensions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OAK-5013) Introduce observation filter extension mechanism to Oak
Stefan Egli created OAK-5013: Summary: Introduce observation filter extension mechanism to Oak Key: OAK-5013 URL: https://issues.apache.org/jira/browse/OAK-5013 Project: Jackrabbit Oak Issue Type: Improvement Components: jcr Affects Versions: 1.5.12 Reporter: Stefan Egli Assignee: Stefan Egli During [discussions|http://markmail.org/thread/fyngo4dwb7fvlqdj] regarding extending JackrabbitEventFilter an interesting suggestion by [~mduerig] came up that we could extend the JackrabbitEventFilter in oak explicitly, rather than modifying it in Jackrabbit each time we add something oak-specific. We should introduce a builder/interface pair in oak to reflect that. Users that would like to use such oak-specific extensions would wrap a JackrabbitEventFilter and get an OakEventFilter that contains enabling these extensions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4908: - Fix Version/s: 1.6 > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.6, 1.5.13 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch, OAK-4908.v5.patch > > > This is a subtask as a result of > [discussions|https://issues.apache.org/jira/browse/OAK-4796?focusedCommentId=15550962=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15550962] > around design in OAK-4796: > Based on the ChangeSet provided with OAK-4907 in the CommitContext, the > ChangeProcessor should do a best-effort prefiltering of commits before they > get added to the (BackgroundObserver's) queue. > This consists of the following parts: > * -the support for optionally excluding commits from being added to the queue > in the BackgroundObserver- EDIT: factored that out into OAK-4916 > * -the BackgroundObserver signaling downstream Observers that a change should > be excluded via a {{NOOP_CHANGE}} CommitInfo- EDIT: factored that out into > OAK-4916 > * the ChangeProcessor using OAK-4907's ChangeSet of the CommitContext for > best-effort prefiltering - and handling the {{NOOP_CHANGED}} CommitInfo > introduced in OAK-4916 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-4916) Add support for excluding commits to BackgroundObserver
[ https://issues.apache.org/jira/browse/OAK-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4916: - Fix Version/s: 1.6 > Add support for excluding commits to BackgroundObserver > --- > > Key: OAK-4916 > URL: https://issues.apache.org/jira/browse/OAK-4916 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Fix For: 1.6, 1.5.13 > > Attachments: FilteringObserver.patch, OAK-4916.patch, > OAK-4916.v2.patch > > > As part of pre-filtering commits it would be useful to have support in the > BackgroundObserver (in general) that would allow to exclude certain commits > from being added to the (BackgroundObserver's) queue, thus keeping the queue > smaller. The actual filtering is up to subclasses. > The suggested implementation is as follows: > * a new method {{isExcluded}} is introduced which represents a subclass hook > for filtering > * excluded commits are not added to the queue > * when multiple commits are excluded subsequently, this is collapsed > * the first non-excluded commit (ContentChange) added to the queue is marked > with the last non-excluded root state as the 'previous root' > * downstream Observers are notified of the exclusion of a commit via a > special CommitInfo {{NOOP_CHANGE}}: this instructs it to exclude this change > while at the same time 'fast-forwarding' the root state to the new one. > ** this extra token is one way of solving the problem that > {{Observer.contentChanged}} represents a diff between two states but does not > transport the 'from' state explicitly - that is implicitly taken from the > previous call to {{contentChanged}}. Thus using such a gap token > ({{NOOP_CHANGE}}) seems to be the only way to instruct Observers to skip a > change. > To repeat: whoever extends BackgroundObserver with filtering must be aware of > the new {{NOOP_CHANGE}} token. 
Anyone not doing filtering will not get any > {{NOOP_CHANGE}} tokens though. > NOTE: See [comment further > below|https://issues.apache.org/jira/browse/OAK-4916?focusedCommentId=15572165=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15572165] > with a new suggested approach, which doesn't use NOOP_CHANGED but introduces > a new FilteringAwareObserver instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4796: - Fix Version/s: 1.6 > filter events before adding to ChangeProcessor's queue > -- > > Key: OAK-4796 > URL: https://issues.apache.org/jira/browse/OAK-4796 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: jcr >Affects Versions: 1.5.9 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6, 1.5.13 > > Attachments: OAK-4796.changeSet.patch, OAK-4796.patch > > > Currently the > [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] > is in charge of doing the event diffing and filtering and does so in a > pooled Thread, ie asynchronously, at a later stage independent from the > commit. This has the advantage that the commit is fast, but has the following > potentially negative effects: > # events (in the form of ContentChange Objects) occupy a slot of the queue > even if the listener is not interested in them - any commit lands on any > listener's queue. This reduces the capacity of the queue for 'actual' events > to be delivered. It therefore increases the risk that the queue fills - and > a full queue has various consequences such as losing the CommitInfo etc. > # each event==ContentChange later on must be evaluated, and for that a diff > must be calculated. Depending on runtime behavior that diff might be > expensive if no longer in the cache (documentMk specifically). > As an improvement, this diffing+filtering could be done at an earlier stage > already, nearer to the commit, and in case the filter would ignore the event, > it would not have to be put into the queue at all, thus avoiding occupying a > slot and later potentially slower diffing. 
> The suggestion is to implement this via the following algorithm: > * During the commit, in a {{Validator}} the listener's filters are evaluated > - in an as-efficient-as-possible manner (the reason for doing it in a Validator > is that this doesn't add overhead as oak already goes through all changes for > other Validators). As a result a _list of potentially affected observers_ is > added to the {{CommitInfo}} (false positives are fine). > ** Note that the above adds cost to the commit and must therefore be > carefully done and measured > ** One potential measure could be to only do filtering when listeners' queues > are larger than a certain threshold (eg 10) > * The ChangeProcessor in {{contentChanged}} (in the one created in > [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) > then checks the new commitInfo's _potentially affected observers_ list and > if it's not in the list, adds a {{NOOP}} token at the end of the queue. If > there's already a NOOP there, the two are collapsed (this way when a filter > is not affected it would have a NOOP at the end of the queue). If later on a > non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} > for the newly added {{ContentChange}} obj. > ** To achieve that, the ContentChange obj is extended to not only have the > "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which > currently is implicitly maintained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
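The queue handling proposed above can be sketched roughly as follows. This is a simplified illustration, not the ChangeProcessor code: root states are represented by plain strings, {{NoopCollapsingQueueSketch}} and its members are hypothetical names, and it additionally assumes that a trailing NOOP is dropped from the queue once a real change takes over its {{root}} as {{previousRoot}} (the description does not say whether the NOOP stays in the queue).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class NoopCollapsingQueueSketch {

    /** Queue entry: a real change (previousRoot -> root) or a NOOP marker carrying only the 'to' root. */
    static final class Entry {
        final String previousRoot; // null for NOOP entries
        final String root;
        final boolean noop;

        Entry(String previousRoot, String root, boolean noop) {
            this.previousRoot = previousRoot;
            this.root = root;
            this.noop = noop;
        }

        @Override
        public String toString() {
            return noop ? "NOOP@" + root : previousRoot + "->" + root;
        }
    }

    private final Deque<Entry> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recent commit, whether filtered out or not

    NoopCollapsingQueueSketch(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    /** potentiallyAffected: whether this listener is in the commit's list of potentially affected observers. */
    void contentChanged(String root, boolean potentiallyAffected) {
        Entry tail = queue.peekLast();
        if (tail != null && tail.noop) {
            // Either two NOOPs collapse into one, or the new change takes over the NOOP's slot.
            queue.pollLast();
        }
        if (potentiallyAffected) {
            // lastRoot is the NOOP's root whenever the tail was a NOOP.
            queue.addLast(new Entry(lastRoot, root, false));
        } else {
            queue.addLast(new Entry(null, root, true));
        }
        lastRoot = root;
    }

    List<String> describe() {
        List<String> out = new ArrayList<>();
        for (Entry e : queue) {
            out.add(e.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        NoopCollapsingQueueSketch q = new NoopCollapsingQueueSketch("r0");
        q.contentChanged("r1", true);
        q.contentChanged("r2", false);
        q.contentChanged("r3", false);
        q.contentChanged("r4", true);
        System.out.println(q.describe());
    }
}
```

The point of carrying {{previousRoot}} explicitly is that the later diff for the second real change runs from the last filtered-out state to the new state, so the skipped commits occupy at most one queue slot and are never diffed.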
[jira] [Updated] (OAK-4907) Collect changes (paths, nts, props..) of a commit in a validator
[ https://issues.apache.org/jira/browse/OAK-4907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4907: - Fix Version/s: 1.6 > Collect changes (paths, nts, props..) of a commit in a validator > > > Key: OAK-4907 > URL: https://issues.apache.org/jira/browse/OAK-4907 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Fix For: 1.6, 1.5.13 > > Attachments: OAK-4907.patch, OAK-4907.v2.patch > > > It would be useful to collect a set of changes of a commit (eg in a > validator) that could later be used in an Observer for eg prefiltering. > Such a change collector should collect paths, nodetypes, properties, > node-names (and perhaps more at a later stage) of all changes and store the > result in the CommitInfo's CommitContext. > Note that this is a result of > [discussions|https://issues.apache.org/jira/browse/OAK-4796?focusedCommentId=15550962=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15550962] > around design in OAK-4796 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
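A rough sketch of what such a change collector and a best-effort prefilter on top of it could look like. This is not the attached patch: {{ChangeSetSketch}}, {{propertyChanged}} and {{mayAffect}} are hypothetical names, and the rule that an empty "wanted" set means "no restriction" is an assumption made for the example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ChangeSetSketch {

    private final Set<String> parentPaths = new HashSet<>();
    private final Set<String> parentNodeTypes = new HashSet<>();
    private final Set<String> parentNodeNames = new HashSet<>();
    private final Set<String> propertyNames = new HashSet<>();

    /** Called once per changed property while the commit's changes are being walked. */
    void propertyChanged(String parentPath, String parentNodeType, String propertyName) {
        parentPaths.add(parentPath);
        parentNodeTypes.add(parentNodeType);
        parentNodeNames.add(parentPath.substring(parentPath.lastIndexOf('/') + 1));
        propertyNames.add(propertyName);
    }

    Set<String> parentPaths() {
        return Collections.unmodifiableSet(parentPaths);
    }

    Set<String> parentNodeNames() {
        return Collections.unmodifiableSet(parentNodeNames);
    }

    /**
     * Best-effort check: returns false only if the commit certainly cannot match the listener.
     * False positives are acceptable, the precise filtering still happens later.
     */
    boolean mayAffect(Set<String> wantedNodeTypes, Set<String> wantedPropertyNames) {
        return overlaps(parentNodeTypes, wantedNodeTypes)
                && overlaps(propertyNames, wantedPropertyNames);
    }

    private static boolean overlaps(Set<String> collected, Set<String> wanted) {
        // An empty 'wanted' set is treated as "no restriction" in this sketch.
        return wanted.isEmpty() || !Collections.disjoint(collected, wanted);
    }

    public static void main(String[] args) {
        ChangeSetSketch cs = new ChangeSetSketch();
        cs.propertyChanged("/content/a", "nt:unstructured", "title");
        System.out.println(cs.mayAffect(Collections.singleton("nt:file"), Collections.<String>emptySet()));
    }
}
```

An Observer could consult such a set before queueing a commit and skip commits for which {{mayAffect}} returns false, without computing any diff.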
[jira] [Commented] (OAK-5072) ChangeCollectorProvider should enable metatype support
[ https://issues.apache.org/jira/browse/OAK-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15639557#comment-15639557 ] Stefan Egli commented on OAK-5072: -- +1, thx, looks good (just a very minor typo fixed in 1768208) > ChangeCollectorProvider should enable metatype support > -- > > Key: OAK-5072 > URL: https://issues.apache.org/jira/browse/OAK-5072 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: core >Reporter: Chetan Mehrotra >Assignee: Chetan Mehrotra >Priority: Minor > Fix For: 1.6, 1.5.13 > > > {{ChangeCollectorProvider}} exposes some OSGi config but does not have > metatype support enabled. To allow configuring those config params, it should be > enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5066) Provide a config option to disable lease check at DocumentNodeStoreService level
[ https://issues.apache.org/jira/browse/OAK-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15639611#comment-15639611 ] Stefan Egli commented on OAK-5066: -- [~chetanm], shouldn't {{setLeaseCheck}} be set to true by default? I think you might have to invert that flag there. > Provide a config option to disable lease check at DocumentNodeStoreService > level > > > Key: OAK-5066 > URL: https://issues.apache.org/jira/browse/OAK-5066 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: documentmk >Reporter: Chetan Mehrotra >Assignee: Chetan Mehrotra >Priority: Minor > Fix For: 1.6 > > Attachments: OAK-5066-v1.patch > > > Currently it's not possible to disable the lease check completely at the > DocumentNodeStoreService level. System property > {{oak.documentMK.disableLeaseCheck}} only disables some logic in > ClusterNodeInfo but the lease check wrapper still gets used. > We should modify the logic to also avoid wrapping if this property is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646805#comment-15646805 ] Stefan Egli commented on OAK-4908: -- [~mreutegg], why are you reverting? Was just going to fix OAK-5082 > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Blocker > Labels: review > Fix For: 1.5.13 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch > > > This is a subtask as a result of > [discussions|https://issues.apache.org/jira/browse/OAK-4796?focusedCommentId=15550962=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15550962] > around design in OAK-4796: > Based on the ChangeSet provided with OAK-4907 in the CommitContext, the > ChangeProcessor should do a best-effort prefiltering of commits before they > get added to the (BackgroundObserver's) queue. > This consists of the following parts: > * -the support for optionally excluding commits from being added to the queue > in the BackgroundObserver- EDIT: factored that out into OAK-4916 > * -the BackgroundObserver signaling downstream Observers that a change should > be excluded via a {{NOOP_CHANGE}} CommitInfo- EDIT: factored that out into > OAK-4916 > * the ChangeProcessor using OAK-4907's ChangeSet of the CommitContext for > best-effort prefiltering - and handling the {{NOOP_CHANGED}} CommitInfo > introduced in OAK-4916 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646862#comment-15646862 ] Stefan Egli commented on OAK-4908: -- you mean the failing test of OAK-5082 or something else? One failing test shouldn't cause too much drama if it's going to be fixed quickly > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Blocker > Labels: review > Fix For: 1.5.13 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch > > > This is a subtask as a result of > [discussions|https://issues.apache.org/jira/browse/OAK-4796?focusedCommentId=15550962=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15550962] > around design in OAK-4796: > Based on the ChangeSet provided with OAK-4907 in the CommitContext, the > ChangeProcessor should do a best-effort prefiltering of commits before they > get added to the (BackgroundObserver's) queue. > This consists of the following parts: > * -the support for optionally excluding commits from being added to the queue > in the BackgroundObserver- EDIT: factored that out into OAK-4916 > * -the BackgroundObserver signaling downstream Observers that a change should > be excluded via a {{NOOP_CHANGE}} CommitInfo- EDIT: factored that out into > OAK-4916 > * the ChangeProcessor using OAK-4907's ChangeSet of the CommitContext for > best-effort prefiltering - and handling the {{NOOP_CHANGED}} CommitInfo > introduced in OAK-4916 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5082) Test failure: testDerivedMixin(org.apache.jackrabbit.core.observation.MixinTest)
[ https://issues.apache.org/jira/browse/OAK-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646954#comment-15646954 ] Stefan Egli commented on OAK-5082: -- OAK-4908 is currently reverted, however a fix for OAK-5082 would have been (in ObservationManagerImpl.addEventListener):
{code}
// OAK-5082 : node type filtering should not only be direct but include derived types
// one easy way to solve this is to 'explode' the node types at start by including
// all subtypes of every registered node type
//TODO: what if there are multiple hierarchy levels, does subTypes return those too?
HashSet<String> explodedNodeTypes = newHashSet();
if (validatedNodeTypeNames != null) {
    for (String nt : validatedNodeTypeNames) {
        NodeTypeIterator it = ntMgr.getNodeType(nt).getSubtypes();
        while (it.hasNext()) {
            String subnt = String.valueOf(it.next());
            explodedNodeTypes.add(subnt);
        }
        explodedNodeTypes.add(nt);
    }
}
{code}
> Test failure: > testDerivedMixin(org.apache.jackrabbit.core.observation.MixinTest) > > > Key: OAK-5082 > URL: https://issues.apache.org/jira/browse/OAK-5082 > Project: Jackrabbit Oak > Issue Type: Test > Components: jcr >Reporter: Amit Jain >Assignee: Marcel Reutegger > > Failed on travis - > https://travis-ci.org/apache/jackrabbit-oak/builds/173972640 as well as > locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
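Regarding the TODO about multiple hierarchy levels in the snippet above: as far as I read the JCR 2.0 javadoc, {{NodeType.getSubtypes()}} is documented to return all subtypes in the inheritance hierarchy, whereas {{getDeclaredSubtypes()}} returns only the direct ones. The standalone sketch below shows the same "explode" idea computed explicitly as a transitive closure. It uses a plain map of direct subtypes instead of a NodeTypeManager so that it runs without a repository; {{ExplodeNodeTypesSketch}}, {{explode}} and the sample type names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ExplodeNodeTypesSketch {

    /**
     * Returns the registered node types plus all their direct and indirect subtypes.
     * directSubtypes maps a node type name to the names of its direct subtypes.
     */
    static Set<String> explode(Collection<String> registered,
                               Map<String, List<String>> directSubtypes) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(registered);
        while (!pending.isEmpty()) {
            String nt = pending.pop();
            if (result.add(nt)) {
                // First visit of this type: schedule its direct subtypes as well.
                List<String> subs = directSubtypes.get(nt);
                pending.addAll(subs != null ? subs : Collections.<String>emptyList());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> hierarchy = new HashMap<>();
        hierarchy.put("test:base", Arrays.asList("test:derived"));
        hierarchy.put("test:derived", Arrays.asList("test:derived2"));
        System.out.println(explode(Arrays.asList("test:base"), hierarchy));
    }
}
```

A listener registered for the base type would then also match changes on nodes whose type is only indirectly derived from it, which is the situation exercised by testDerivedMixin.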
[jira] [Updated] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4908: - Attachment: OAK-4908.v5.patch Attaching [^OAK-4908.v5.patch] which includes v4 changes by [~mreutegg], and includes the fix for OAK-5082 which includes exploding node types before passing to the ChangeSetFilterImpl. tests on oak-core/oak-jcr run fine with this - pending -PintegrationTesting > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.6 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch, OAK-4908.v5.patch > > > This is a subtask as a result of > [discussions|https://issues.apache.org/jira/browse/OAK-4796?focusedCommentId=15550962=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15550962] > around design in OAK-4796: > Based on the ChangeSet provided with OAK-4907 in the CommitContext, the > ChangeProcessor should do a best-effort prefiltering of commits before they > get added to the (BackgroundObserver's) queue. > This consists of the following parts: > * -the support for optionally excluding commits from being added to the queue > in the BackgroundObserver- EDIT: factored that out into OAK-4916 > * -the BackgroundObserver signaling downstream Observers that a change should > be excluded via a {{NOOP_CHANGE}} CommitInfo- EDIT: factored that out into > OAK-4916 > * the ChangeProcessor using OAK-4907's ChangeSet of the CommitContext for > best-effort prefiltering - and handling the {{NOOP_CHANGED}} CommitInfo > introduced in OAK-4916 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646903#comment-15646903 ] Stefan Egli commented on OAK-4908: -- unscheduling from 1.5.13 then > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.6 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch > > > This is a subtask as a result of > [discussions|https://issues.apache.org/jira/browse/OAK-4796?focusedCommentId=15550962=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15550962] > around design in OAK-4796: > Based on the ChangeSet provided with OAK-4907 in the CommitContext, the > ChangeProcessor should do a best-effort prefiltering of commits before they > get added to the (BackgroundObserver's) queue. > This consists of the following parts: > * -the support for optionally excluding commits from being added to the queue > in the BackgroundObserver- EDIT: factored that out into OAK-4916 > * -the BackgroundObserver signaling downstream Observers that a change should > be excluded via a {{NOOP_CHANGE}} CommitInfo- EDIT: factored that out into > OAK-4916 > * the ChangeProcessor using OAK-4907's ChangeSet of the CommitContext for > best-effort prefiltering - and handling the {{NOOP_CHANGED}} CommitInfo > introduced in OAK-4916 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-5082) Test failure: testDerivedMixin(org.apache.jackrabbit.core.observation.MixinTest)
[ https://issues.apache.org/jira/browse/OAK-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647174#comment-15647174 ] Stefan Egli commented on OAK-5082: -- bq. Does the ChangeSet also include mixin types? yes: [here|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/observation/ChangeCollectorProvider.java#L204] > Test failure: > testDerivedMixin(org.apache.jackrabbit.core.observation.MixinTest) > > > Key: OAK-5082 > URL: https://issues.apache.org/jira/browse/OAK-5082 > Project: Jackrabbit Oak > Issue Type: Test > Components: jcr >Reporter: Amit Jain >Assignee: Marcel Reutegger > > Failed on travis - > https://travis-ci.org/apache/jackrabbit-oak/builds/173972640 as well as > locally.
[jira] [Updated] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4908: - Priority: Major (was: Blocker) > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.5.13 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch
[jira] [Updated] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4908: - Fix Version/s: (was: 1.5.13) 1.6 > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.6 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch
[jira] [Commented] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647227#comment-15647227 ] Stefan Egli commented on OAK-4908: -- if integration test runs fine locally (just started now) I'd suggest to still include this in today's 1.5.13, [~edivad], wdyt? > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.6 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch, OAK-4908.v5.patch
[jira] [Resolved] (OAK-4796) filter events before adding to ChangeProcessor's queue
[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli resolved OAK-4796.
--
Resolution: Fixed
Fix Version/s: (was: 1.6) 1.5.13

sub-tasks have been finished, closing this one

> filter events before adding to ChangeProcessor's queue
> --
>
> Key: OAK-4796
> URL: https://issues.apache.org/jira/browse/OAK-4796
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: jcr
> Affects Versions: 1.5.9
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.5.13
>
> Attachments: OAK-4796.changeSet.patch, OAK-4796.patch
>
>
> Currently the [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] is in charge of doing the event diffing and filtering and does so in a pooled Thread, i.e. asynchronously, at a later stage independent from the commit. This has the advantage that the commit is fast, but has the following potentially negative effects:
> # events (in the form of ContentChange Objects) occupy a slot of the queue even if the listener is not interested in them - any commit lands on any listener's queue. This reduces the capacity of the queue for 'actual' events to be delivered. It therefore increases the risk that the queue fills - and a full queue has various consequences such as losing the CommitInfo etc.
> # each event==ContentChange later on must be evaluated, and for that a diff must be calculated. Depending on runtime behavior that diff might be expensive if no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage already, nearer to the commit, and in case the filter would ignore the event, it would not have to be put into the queue at all, thus avoiding occupying a slot and later potentially slower diffing.
> The suggestion is to implement this via the following algorithm:
> * During the commit, in a {{Validator}} the listener's filters are evaluated in an as-efficient-as-possible manner (the reason for doing it in a Validator is that this doesn't add overhead, as oak already goes through all changes for other Validators). As a result a _list of potentially affected observers_ is added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be carefully done and measured
> ** One potential measure could be to only do filtering when listener's queues are larger than a certain threshold (eg 10)
> * The ChangeProcessor in {{contentChanged}} (in the one created in [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]) then checks the new commitInfo's _potentially affected observers_ list and if it's not in the list, adds a {{NOOP}} token at the end of the queue. If there's already a NOOP there, the two are collapsed (this way when a filter is not affected it would have a NOOP at the end of the queue). If later on a non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} for the newly added {{ContentChange}} obj.
> ** To achieve that, the ContentChange obj is extended to not only have the "to" {{root}} pointer, but also the "from" {{previousRoot}} pointer which currently is implicitly maintained.
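The NOOP handling proposed in the OAK-4796 description above can be sketched as follows. This is an illustration under stated assumptions, not the committed implementation: `QueuedChange` and `NoopCollapsingQueue` are hypothetical names, root states are represented by plain strings, and the "potentially affected" decision is passed in as a boolean rather than read from a `CommitInfo`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical queue entry carrying both the "from" and the "to" state. */
final class QueuedChange {
    final String previousRoot; // "from" state
    final String root;         // "to" state
    final boolean noop;        // true if the listener is not affected

    QueuedChange(String previousRoot, String root, boolean noop) {
        this.previousRoot = previousRoot;
        this.root = root;
        this.noop = noop;
    }
}

/** Hypothetical per-listener queue that collapses consecutive NOOP entries. */
final class NoopCollapsingQueue {
    private final Deque<QueuedChange> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recently seen commit

    NoopCollapsingQueue(String initialRoot) {
        this.lastRoot = initialRoot;
    }

    void contentChanged(String root, boolean potentiallyAffected) {
        QueuedChange tail = queue.peekLast();
        if (!potentiallyAffected && tail != null && tail.noop) {
            // Collapse: one NOOP now spans from the old NOOP's start to the new root.
            queue.pollLast();
            queue.addLast(new QueuedChange(tail.previousRoot, root, true));
        } else {
            // For a real change that follows a NOOP, lastRoot equals the NOOP's
            // root, so the new entry's previousRoot continues where it ended.
            queue.addLast(new QueuedChange(lastRoot, root, !potentiallyAffected));
        }
        lastRoot = root;
    }

    int size() {
        return queue.size();
    }

    QueuedChange poll() {
        return queue.pollFirst();
    }
}
```

With this shape, any number of consecutive irrelevant commits occupies a single queue slot, and a consumer that skips NOOP entries still sees a gap-free chain of `previousRoot` and `root` pairs.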
[jira] [Resolved] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-4908. -- Resolution: Fixed Committed to trunk in http://svn.apache.org/viewvc?rev=1768558=rev > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli >Priority: Blocker > Labels: review > Fix For: 1.5.13 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch
[jira] [Commented] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647389#comment-15647389 ] Stefan Egli commented on OAK-4908: -- integration tests run fine and I hope it's fine to still commit this, which I've done now in http://svn.apache.org/viewvc?rev=1768673=rev > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.6 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch, OAK-4908.v5.patch
[jira] [Resolved] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli resolved OAK-4908. -- Resolution: Fixed > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.5.13 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch, OAK-4908.v5.patch
[jira] [Updated] (OAK-4908) Best-effort prefiltering in ChangeProcessor based on ChangeSet
[ https://issues.apache.org/jira/browse/OAK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Egli updated OAK-4908: - Fix Version/s: (was: 1.6) 1.5.13 > Best-effort prefiltering in ChangeProcessor based on ChangeSet > -- > > Key: OAK-4908 > URL: https://issues.apache.org/jira/browse/OAK-4908 > Project: Jackrabbit Oak > Issue Type: Technical task > Components: core, jcr >Affects Versions: 1.5.11 >Reporter: Stefan Egli >Assignee: Stefan Egli > Labels: review > Fix For: 1.5.13 > > Attachments: OAK-4908.patch, OAK-4908.v2.patch, OAK-4908.v3.patch, > OAK-4908.v4.patch, OAK-4908.v5.patch