[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251362#comment-16251362 ] Stefan Egli commented on OAK-4581: -- moved this from 1.8 to 1.10 as I don't see as fitting into the 1.8 timeframe - pls adjust if you think otherwise > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.10 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948858#comment-15948858 ] Stefan Egli commented on OAK-4581: -- Speaking for documentMk's journal: this one (also) batches commits/revisions into one entry per backgroundWrite. That means, we no longer have the boundaries of each commit. We could however introducing an additional property on the journal entry, which is the list of all revisions contained in that batch. That way we could (if we wanted, ie just for the EventJournal) dig down into the resolution of single commits without having to greatly modify the journal mechanism. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.8 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948705#comment-15948705 ] Stefan Egli commented on OAK-4581: -- We could make the origin instance part of the EventJournal and support local changes through that. The documentMk's journal actually contains the instance id of which instance produced the revision/commit. So we could use that information to still support local/external filtering. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.8 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948700#comment-15948700 ] Chetan Mehrotra commented on OAK-4581: -- One missing part would be identifying local changes. A pure diff based approach cannot determine if the change is due to local change or external change and purpose of having such big queue was to ensure that local event do not get lost (collapsed and then become external change) > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.8 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948690#comment-15948690 ] Stefan Egli commented on OAK-4581: -- This might have been discussed before, but I'd like to bring it up again: alternatively to designs suggested in the history of this ticket we could also turn this thing around entirely and rather support _persisting_ via implementing [JCR 12.6.'s journaled observation/EventJjournal|https://docs.adobe.com/content/docs/en/spec/jcr/2.0/12_Observation.html]: Having underlying journaled observation would allow to: * switch between a 'live listener' to a 'journaled listener' once the queue is full (ie we could support a hard queue limit, no compaction of events) * this switch could be implemented in a way that guarantees no loss of events nor any duplicate events (via careful use of the {{skipTo}} method - perhaps with the help of a non-standard _revision_ based {{skipTo}} variant) * once the listener reaches 'the end of the (journaled) events' we switch it back to a live listener The advantages are that we don't have to worry about storing anything additionally - don't have to define any file format for these persisted events and maintain them. This is all delegated to the EventJournal. Note that the EventJournal itself can be implemented based on existing data too, such as the existing journal of the DocumentNodeStore. (For SegmentNodeStore it might be different since its journal is less precise, only done every couple seconds). The disadvantages are that the EventJournal is (probably, tbd) not based on _events_ but based on raw revision (commit) information. Thus events would have to yet be generated for the listeners when needed. That means it would have to do the diff again. That means it could be slower as the diff might no longer be in the cache. Also, the EventJournal would have to at least go as much back in time as the slowest listener. So overall perhaps a summary is that using EventJournal for slow listeners could be easier to implement, thus easier in terms of code maintenance, but also easier in terms of operational maintenance with the downside of potentially being less performant. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.8 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3]
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601699#comment-15601699 ] Marcel Reutegger commented on OAK-4581: --- Stefan and I had an offline discussion about different approaches. We came to the conclusion that it may be favourable to implement 'head-of-queue-persisting' first because it is simple and able to deal with one cause of a growing queue: a slow EventListener.onEvent() implementation. We could still implement tail-of-queue-persisting in a next step and compare it with other strategies for the case when the queue fills up. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588993#comment-15588993 ] Stefan Egli commented on OAK-4581: -- h4. Prototype based on a SwitchingObserver [pushed a prototype|https://github.com/stefan-egli/jackrabbit-oak/commit/a521599e89b62ed8d40af5ae0a987cb4a9b546b7] to my github fork which tries to scetch an approach that would use a SwitchingObserer in front of either a BackgroundObserver/ChangeProcessor pair or a PersistingBackgroundObserver/PersistingEventQueue pair. The disadvantage of this approach is that there are quite a few new classes in play. The advantage is that it clearly distinguishes between normal (live) mode - by using the exiting, unchanged BackgroundObserver/ChangeProcessor pair - and a new persistent mode. Switching between these modes is delicate and is thus explicitly handled with extra switching modes. This conceptually works fine - added a test that illustrates the switch. h4. Prototype based on persistence embedded in the ChangeProcessor An alternative to the above would be to embedd the persistence directly in the ChangeProcessor. The contentChanged method there would inspect the queue size and if too large, not deliver events to the listener anymore, but instead go via a persistent queue. The advantage is that it's more enclosed in the ChangeProcessor. The disadvantage is that if a listener would block it could not prevent the queue from growing (but maybe supporting an indefinitely blocking listener is not required). I'll look into how such an approach could be implemented next. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/m
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585048#comment-15585048 ] Stefan Egli commented on OAK-4581: -- An update on filtering and the persistence design: The current filter logic (EventFilter/EventQueue) takes NodeStates as input and generates Events that can be pulled (via an iterator) at discretion. The fact that we want to _persist events_ means that the filter is applied before persisting. Later, at read/delivery time this same filter would not be applicable anymore, as it can't filter Events (it can only filter based on NodeStates). Now when we look at how to support persisting events for a set of listeners then one question arises: does each listener have its own private queue, or is there something like a shared queue. A shared queue would sound advantageous as it would reduce disk space. It could for example be solved by figuring out which listener is interested in a particular event, and enrich an event in the storage format with a set of listener ids. And at delivery time you could then just read those events that contain your listener id. Here's a description of possible private-vs-shared queue scenarios: h5. shared event queue Events would only be stored once, they would contain a set of listener ids indicating which listener is interested in it. Storage space would be kept small. To support an "infinitely large" commit, the events would have to be persisted while traversing through the diff. Therefore the "deduplication" of events must happen during this traversal too. Thus it seems reasonable to solve this with an "Umbrella EventFilter" that has all the actual filters as sub-filter and which marks each events with the set of ids that include the change. While this sounds complicated, it might be doable. However, there's one additional obstacle here: the NamePathMapper: in theory each listener can have a different NamePathMapper - besides also having different base paths (which have their own Generator, thus could have a different order). In short, I believe this is not easily solvable in the current design. h5. semi-shared event queue Events would still be _shareable_, however it would be based on best-effort and no longer, unlike above, be precise. So, storage space would be bigger in some cases. The way this could be done is to have the same underlying persistence logic: each event can have a set of listener ids for which it applies. The difference is in how the filtering would happen: for each listener an EventFilter/EventQueue would be created and Events would be generated individually. After a certain number of Events are generated for each listener, the resulting set of Events is deduplicated. So the deduplication doesn't happen in an Umbrella-EventFilter but after the Event generation. The reason this is best-effort is that the events are worked off in batches: to avoid OOME and still support infinitely large commits that's probably the way to go. And it's possible that an Event would show up in different batches for different listeners. So the deduplication is only happening in one batch, not in multiple - and that's where the best-effort lies in. An additional disadvantage of this approach is more read I/O when deduplication isn't perfect. h5. individual event queues This model is the simplest: each listener has its own, individual event queue. The persistence doesn't have to keep track of listener ids. Of course this means more storage is required, but the code would be simpler. h4. Conclusion While the shared queue approach seems favorable in terms of storage space, it is more complex code and perhaps more prone to bugs. Also, I'm not sure how much disk space would be gained as listeners should be very precise thus ideally somewhat disjunct. So perhaps we should go for the simpler, individual queue model as a first iteration. [~chetanm], [~tmueller], [~mreutegg], et al, wdyt? > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to han
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550982#comment-15550982 ] Chetan Mehrotra commented on OAK-4581: -- Well I am still bit sceptical around serializing the whole event stuff but lets try. I think feature like this might need multiple implementations as poc, done in branch. And then we run certain scenarios or benchmark test to see the effectiveness of the approach. So its fine to start with one approach in a branch and measure the impact > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549104#comment-15549104 ] Stefan Egli commented on OAK-4581: -- Summarizing those numerous, recent threaded comments I believe we have consensus that the goal is to get rid of external use of BackgroundObserver (not scope of this ticket, but a consequence) and that we do the *persistence on the oak-jcr/ChangeProcessor level*. This makes the persistence independent from cache and GC problems, works fine with filter changes, but has the downside that it means larger temporary storage required (as events are 'exploded'). The actual file format of the stored events is somewhat of an orthogonal/detail question, but can be something like a flat file. Unless I hear objections I'm looking at following up on these assumption in the next days. /cc [~chetanm], [~mduerig], [~mreutegg], [~reschke], [~catholicon] > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532190#comment-15532190 ] Stefan Egli commented on OAK-4581: -- [~cziegeler], so IIUC then support for these changed/added/removed properties is the reason why these maps have to first be filled in the JcrResourceListener, and this is causing memory issue if we're talking about a huge change. If we no longer propagate property changes in the JRL, then we could avoid these maps. Do I understand you correct that you're suggesting to remove this support then? This would allow us indeed to get rid of the OakResourceListener. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528553#comment-15528553 ] Carsten Ziegeler commented on OAK-4581: --- I didn't follow the whole discussion here (way too long), so I just responded to the question why we moved to an Oak observer. As said above, having the resource type in the event send by Sling is no longer an issue as we don't support that anymore. Personally I think that we should never had support for the names of the changed/added/removed properties in the events. Use cases for this are very rare and the new RCL in Sling explicitely marks these things as optional, so listeners can't rely on this fact. So from a Sling pov I think we don't need them. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528524#comment-15528524 ] Carsten Ziegeler commented on OAK-4581: --- We don't require the functionality in Sling anymore. In general it is too resource consuming (we have to think about all possible resource provider implementations not just Oak) and there are only very few listeners using this > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526714#comment-15526714 ] Stefan Egli commented on OAK-4581: -- [~cziegeler], re bq. Now, an additional reason at least for Sling was that the event bridge we have was reading all added/changed nodes to get the resource type property. I can't find this dependency in the code, do you know where that's coded in JcrResourceListener? IIUC then the reason for having these addedEvents/changedEvents/removedEvents maps in onEvent is to be able to build a JcrResourceChanged obj that contains all changed properties (unrelated to resource type - but perhaps that's hidden somewhere down deep..) > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526047#comment-15526047 ] Michael Dürig commented on OAK-4581: bq. knowing that the events are coming in in a breadth first manner I wouldn't rely on such implementation details but rather expose collecting information about child items through a proper API as proposed with OAK-4853. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525978#comment-15525978 ] Stefan Egli commented on OAK-4581: -- Created SLING-6070 for invoking reportChanges after child traversal finished > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525942#comment-15525942 ] Stefan Egli commented on OAK-4581: -- * re 'maps of added/removed/changed items' : I think we might be able to remove these, or lets say drastically reduce its size: by knowing that the events are coming in in a breadth first manner, the JcrResourceListener could build up and report changes whenever a child traversal finished - no need to wait with calling reportChanges until the very end (as the OakResourceListener sends out events also after child traversal finished). * re 'osgiEventQueue' : that has gone with SLING-5163 / [here|https://github.com/apache/sling/commit/9c424dfca6a802a6b66b4b7981a313c5856a0e1f] * Re 'Access control' : that's probably still an open issue indeed, however it is the case for both OakResourceListener *and* JcrResourceListener, so that's not an argument against JcrResourceListener > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525601#comment-15525601 ] Michael Dürig commented on OAK-4581: This is one aspect yes. The other is the maps of added, removed and changed items kept at {{JcrResourceListener#onEvent}}. Those are used to keep all items affected by a single event. For big change sets this pushed us OOM. Further {{JcrResourceListener#osgiEventQueue}} is yet another expensive queue of maps, which also was causing a lot of pressure on the heap. Access control was never taken into account. This was always being evaluated in the context of an admin session. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525549#comment-15525549 ] Marcel Reutegger commented on OAK-4581: --- So, IIUC that means the overhead of reading those properties over the JCR API was too high? There are two reasons I can think of why this would be the case. 1) an event listener first needs to read the path from the event and then get the item, which means internally the path must be resolved resulting in reads of multiple items, depending on the length of the path. 2) access control must be evaluated, whereas an Observer gives you access to everything without access control checks. Is this a reasonable assessment? Created OAK-4853 for the 'proper' API proposal. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525357#comment-15525357 ] Michael Dürig commented on OAK-4581: bq. reading all added/changed nodes This was the main driver for this change. Sling's previous implementation did not scale properly. +1 though to provide the required functionality though a "proper" API. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525207#comment-15525207 ] Carsten Ziegeler commented on OAK-4581: --- I don't remember the details, but I think this was a recommendation from the Oak team back then Now, an additional reason at least for Sling was that the event bridge we have was reading all added/changed nodes to get the resource type property. This is less expensive as the observer approach can already provide that value > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523201#comment-15523201 ] Vikas Saurabh commented on OAK-4581: Hmm... I probably put too much weight into C2 in description- bq. The journal should survive restarts > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523087#comment-15523087 ] Stefan Egli commented on OAK-4581: -- bq. 3. those observers get events post reboot Not sure that's really the case. I would argue that after a restart these persisted events get deleted. IMO what 'persist' in the context of this ticket emphasizes is only additional memory at runtime, not that it survives restarts. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523022#comment-15523022 ] Vikas Saurabh commented on OAK-4581: What I meant was that this issue is saying that for JCR observation, we'd be more resilient for: # event generated for node1 # node1 rebounces before event is dispatched to all observers # those observers get events post reboot That opens up the case for at-least-once-delivery for events that got generated for a given cluster node. My argument being that if we say that we just handle a sub-set of cases, then observation client still has to be resilient against loss of events - and if that observation client code is resilient what's the point of us trying to be more resilient. Afaict, the issue got created because it's tricky/hard(/impossible??) for observation client to be resilient to loss of observation events (barring manual intervention). But, yes if we do this (just do simple stuff and not to give pedantic guarantee of delivery), then we can have tooling to help with that manual intervention. Again, I'm not saying we need to do this(pedantic delivery) - just something that we should keep in mind about what the observation client might expect post this implementation. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522996#comment-15522996 ] Stefan Egli commented on OAK-4581: -- that sounds like a new class of listener that would want 'at-least-once' delivery in a cluster. Something probably useful, but I'm not sure if that fits into the observation umbrella of the JCR API. I think that's orthogonal to this ticket (somewhat) and could probably be handled separately? > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522974#comment-15522974 ] Stefan Egli commented on OAK-4581: -- The latter (GC-decoupling). We could indeed still have a limit to protect from disk overflow, but yes, it would be independent. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522963#comment-15522963 ] Vikas Saurabh commented on OAK-4581: Btw, just to note (I'm unable to concretely argue if we really want to do this or not) - no matter how we proceed, the storage would be cluster-id specific. So, do we want to cater the case where some cluster node gets obliterated (with pending stored events) - would/should some other node pick that up? Adding the disclaimer again that it's a case that would hit us once we start to advertise "stronger guarantee delivery of observation events" - not that we must fix it (we can as document it that the guarantee is a best-effort thing) > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522946#comment-15522946 ] Vikas Saurabh commented on OAK-4581: yeah, if we get around to single way of observation - then, I'd also like 1-B-1 + 3-C (same as what Marcel preferred). > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522942#comment-15522942 ] Vikas Saurabh commented on OAK-4581: bq. The advantage of persisting events would have been that it wouldn't have needed such a cut-off.. Umm... do you mean that we'd be ok with infinite storage with persisted events? Or, that it de-couples storage concern with GC and hence the cut-off metric could be storage size (as against must-have time boundaries if we remain coupled) > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522741#comment-15522741 ] Marcel Reutegger commented on OAK-4581: --- In my view it would be best to review the usage of Observers outside of Oak again and use JCR/Jackrabbit EventListeners whenever possible. The Oak observation package is not a public API and the export version is set to zero. I think that was a good decision, because we repeatedly run into situations where Oak observer (or BackgroundObserver) usage outside of Oak lags behind current development. Examples include missing MBean support, default values for queue length or new callbacks on BackgroundObserver. IMO we should stop promoting Oak Observer when we actually consider it internal. SLING-3279 was created to leverage filters not available in JCR. However, the most useful filter with multiple paths is already available since Jackrabbit 2.7.5 (JCR-3745) and gathering names of added/removed/changed properties is also possible with plain JCR Events. [~mduerig] & [~cziegeler], do you remember the main reasons why Oak Observers were considered superior over JCR EventListeners? If there are features missing, I would rather add them to the Jackrabbit API and make it available to other users as well and change the JCR Resource implementation to only rely on the Jackrabbit API. Once we have a clear separation of Oak internal Observer usage and client code using JCR EventListeners, we can more easily optimize those two parts individually. The former is the clear responsibility of the repository and would produce events as efficiently and fast as possible, while the latter gets those events at its own pace. The current situation is IMO problematic because the two aspects are mixed. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2]
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522461#comment-15522461 ] Stefan Egli commented on OAK-4581: -- [~catholicon], thx for your feedback! Re bq. I thought this issue was about persisting change pointers Agreed. And when looking at persisting NodeState(+CommitInfo) then the revGC issue comes up (you must later be able to do the diff, and revGC must not clean up things before the persited observation queues haven't been processed). And from this resulted the idea to not persist NodeState but the actual, calculated Event (even though that would bloat the storage, as it would become much simpler). However, this now again conflicts with support for any type of BackgroundObserver, not only ChangeProcessor. So I think the latter question becomes central now, and if we want to support any BackgroundObserver we need to persist NodeState and prevent revGC from cleaning up too early. bq. Afaics, we still want remain wary of infinite storage of pointers bq. Sure if we're saying that revGC deleted node states Exactly. There's a dilemma: we want to prevent revGC to being 'paused' for too long just because of observation - but if it isn't paused then such a slow or overwhelmed listener would loose events. We have to make a choice, it's a binary thing. Perhaps we have to cut off events after a certain time (eg after exactly 24hours to fit into the segment-tar's default generation cycle)? (The advantage of persisting events would have been that it wouldn't have needed such a cut-off..) > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elas
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513400#comment-15513400 ] Vikas Saurabh commented on OAK-4581: I got slightly confused about what are we targeting here (sorry, haven't followed all the comments - just the description and the last explanation). My initial thought about the issue was that we're solving quicker diff elsewhere (segment is fast anyway. document now relies on journal, afaik can recover from full queue too, and can work on ranges). Given those improvements, I thought this issue was about persisting change pointers (rev or root-node-id as the need be and commit info for local changes) so that listeners work fine post rebounce too. Afaics, we still want remain wary of infinite storage of pointers (whatever be the case - slow listeners or too many changes.. in effect rate_listener < rate_incoming). Assuming (accurate??) diff processing is a different concern, I'm not sure why GC (assuming revGC) need to care about node states (thinking about 1A side of things) - I don't know about segment... but for document we can still get root node state. Sure if we're saying that revGC deleted node states (ie the even hadn't been consume in 24 hours) then is that really a case worth solving by retaining node states?? Anyway, assuming my original interpretation of the issue, I like 1-A-1 + 3-C... as you pointed out that we need to take care of OakResourceListener. I think we've bore a lot of pain with us improving at least diagnosing ChangeProcessor (jmx, configurable queue) while sling bridge kept reeling under pain with the default of 1k. I'd rather have improvements that work uniformly. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509914#comment-15509914 ] Stefan Egli commented on OAK-4581: -- One additional note: should we choose to go the I-B route (queue in ChangeProcessor), then this improvement will not become usable by Sling's ResourceChangeListener - as that has switched to using an OakResourceListener (based on Oak's recommendation to do so, see SLING-3279) and directly bases on BackgroundObserver... > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509752#comment-15509752 ] Marcel Reutegger commented on OAK-4581: --- My preferred approach is I-B-1 with something similar to III-C, structured like a log (Kafka is probably overkill). In other issues we so far looked at how to improve the producer side of the observation system, but even if the producer side is perfect, the queue could still fill up when the listener is too slow. And once the listener is too slow, it may have a ripple effect on the producer side. The node state comparisons may not be served from cache anymore and require I/O interfering/competing with other read operations in the repository. I think it's best to create the JCR events as soon as possible and then push them into a queue that is consumed by the listener. The contents of this queue will obviously have bigger space requirements than the current queue with just the root states, but it is simpler to handle. The entries are independent of the repository (unrelated to GC) and read and write patterns are simple linear operations. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509287#comment-15509287 ] Stefan Egli commented on OAK-4581: -- /cc [~mreutegg] > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506514#comment-15506514 ] Stefan Egli commented on OAK-4581: -- I'd like to move this ticket forward and believe we need a few decisions on the approach: h4. I - Who to persist for There are different possibilities as to where the persisted queue should sit: h5. A - BackgroundObserver In this class the BackgroundObserver's queue is persisted - and that can logically only be based on {{NodeState}}. This will thus support any type of downstream Observer including NodeObserver etc. Being based on Observer it requires GC-prevention. Here's a list of concrete subvariants: h6. 1 - store serialized root state This seemlessly serializes and stores {{NodeState}} objects. Later on they are read and used for diffing. Which means the data must still be available to do the actual diff. This can be achieved by increasing the GC retention period one way or another. What's also important here is that the caches aren't poluted with these late diffs - ie they should probably not be stored in the cache in this late-delivery case. h6. 2 - store serialized diff (and root state) Besides serializing the {{NodeState}} this variant (also) stores the diff. This speeds up later diffing as the diff is then already there (it probably must be stored in 'a cache' temporarily, but only temporarily as it will only be used for one event, likely). This variant is still dependent on preventing GC though, as we're still on the Observer level, which works on {{NodeState}}. h6. 3 - base it on the journal Alternatively the journal is equipped with more diff-like information (perhaps with the full, but perhaps only partially), also see OAK-4586. Otherwise this has same characteristics as I-A-1 and I-A-2: GC must still be prevented, we're still on the Observation/NodeState level. It will be implementation dependent, as Segment doesn't have the same type of journal as Document has. h5. B - ChangeProcessor In this class the queue is handled on the ChangeProcessor level (not in BackgroundObserver), thus no longer based on NodeState, but now independent, the format just must be suitable for calculation and later delivery via onEvent. Being independent of NodeState allows to become independent of GC and cache-hotness issues. However, it's important to note that this class of solutions targets concrete EventListeners, not Observers in general! h6. 1 - store serialized events The ChangeProcessor calculates events as if for delivery to onEvent, but just persists the events as is. This will bloat the amount of data stored and increase I/O. However, later delivery is trivial as all the events are already there, they just have to be read and onEvent called. h6. 2 - store serialized diff The ChangeProcessor stores the serialized diff in a form that it can later be processed by the EventFilter and result in events for delivery to EventListener.onEvent. (This would then be independent from the NodeState) h6. 3 - base it on the journal If the journal contains the complete diff such that ChangeProcessor can evaluate the filters and deliver, then the journal could be enough (however that might be tricky to achieve). Also, this will be implementation dependent, as Segment doesn't have the same type of journal as Document has. In any case, additionally the CommitInfo must be stored somewhere, either also in the Journal or per ChangeProcessor. h4. II Serializing CommitInfo Not sure if we have many options here, I think it's just something we have to do. And if Oak code prevents serialization, then we can fix it. If it's upper/application-layer code that causes problems, we can't do much other than issue a warn. h4. III - Storage Layer This depends a bit on the actual solution chosen. If we base it eg on journal, then a lot comes from there already. If we persist flat events, then surely an extra storage is needed. h5. A - Use a SegmentNodeStore * would be straight forward but has issues as mentioned by Michael. h5. B - Use internas of SegmentNodeStore, eg SegmentWriter * might be much more optimal, but adds dependencies on internas of tarmk. h5. C - store as JSON in a flat file [~mduerig], [~chetanm], [~catholicon], [~tmueller], which variant should we implement? > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > draw
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464465#comment-15464465 ] Stefan Egli commented on OAK-4581: -- Waiting with further implementations until we agree on how to proceed > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464458#comment-15464458 ] Stefan Egli commented on OAK-4581: -- I wouldn't rule out that there are better fits, for sure. The advantage of using tarMk is that we'd use our own stuff for another use case too. * _there will be a performance impact as the lists grows_ : we can store in sub-folders named with a time pattern, similar as sling jobs does * _garbage collections is difficult and expensive_ : gc would be done in a generational approach: rewrite the queue in a new tarMk once certain thresholds are reached (ideally this could be done when the queue is empty, so no rewrite would be necessary). But gc on that tarMk incarnation would be turned off completely. * _adding and removing elements from the list will cause rewrites of buckets instead of just a single append, remove_ : the idea is to store in batches, so I was assuming this shouldn't be that bad Overall I'd find it simpler, but if ppl agree that we shouldn't use tarMk for this, then sure, let's switch. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1546#comment-1546 ] Stefan Egli commented on OAK-4581: -- So could we adjust the retention time dynamically (is there an API) when the queue grows? > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464441#comment-15464441 ] Stefan Egli commented on OAK-4581: -- yes, this has been seen esp on documentMk with earlier oak versions (although it might also happen on newer oak versions, there's nothing preventing it from happening, but newer oak versions have several improvements in this area) > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459629#comment-15459629 ] Michael Dürig commented on OAK-4581: bq. I'm probably not getting the entirety of this point. My point was the the TarMK is not a particular good storage solution for linear lists. What we *could* do though is reuse some of its internals to implement a store that is. We would be reinventing Kafka on segment though... > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459618#comment-15459618 ] Michael Dürig commented on OAK-4581: bq. One approach that we currently target is to checkpoint the oldest entry, such that we prevent gc from removing it (assuming checkpoints are respected). No it doesn't on {{oak-segment-tar}. We moved from reachability to retention time for gc as the former didn't work. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459608#comment-15459608 ] Michael Dürig commented on OAK-4581: bq. is for such rare special cases only. That is what I was aiming at: do we need to cover rare special cases? Did we actually observe them and was it really an observation queue pushing the system OOM? Do rare spacial cases really warrant this amount of extra complexity? > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459586#comment-15459586 ] Michael Dürig commented on OAK-4581: But storing the events themselves will tremendously increase the sizes of the queues. Each commit will occupy roughly as many slots in the queue as there where changes in the commit. Each event would need to store a path, a userId, an info map etc. Not exactly cheap. And on top of that there is a queue per listener. In effect this would multiply the requirements for IO bandwidth by the number of observation listeners (each commit is first stored to disk and then distributed to N listener queues and stored again in the form of events). Agreed, early filtering will reduce the number of events somewhat and above picture is just an upper bound. But storing the commit (handle) will occupy just a single slot in the queue while storing events will occupy at least as many slots unless a commit does not generate a single event. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458826#comment-15458826 ] Stefan Egli commented on OAK-4581: -- One more comment re bq. I expect unbounded queues to have adverse effects on the hotness of the various caches. As pointed out, I think we should do pre-filtering. But we should probably also do pre-calculation of the actual events we want to deliver. Assuming that the listener's filters are "good" it would be efficient to go through the whole filter and basically store pre-manufactured events as a result only. This would mean that once an event is persisted it is no longer dependent on _any_ cache. But it also means that what will be persisted will _actually_ be consumed later on - as the raw and final event itself would basically be stored, no later filtering whatsoever. Doing the filtering as early as possible would also add load to the system at commit time, thus result in a natural throttling. As basically the computation of all the events as they will be delivered to the listeners then becomes part of the commit. (_two birds with one stone_) > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457991#comment-15457991 ] Stefan Egli commented on OAK-4581: -- [~mduerig], thanks for the comments! bq. Do we know that we need to go off-heap with that queue? Agreed, the entries are normally cheap, but generally speaking it's the open map mechanism that can make them unbounded, thus larger. But even assuming they are cheap on average, you can have a situation where you have such a high traffic burst that you're overwhelming even a highly optimized listener logic. In which case the queues grow and you'll get an OutOfMemoryError. The benefit of persisting the queues (when they become big that is) is for such rare special cases only. And you'd have to construct such a rare case where basically you'd force an OutOfMemoryError versus with this patch not. bq. I expect unbounded queues to have adverse effects on the hotness of the various caches. Right. There's some ideas left that aren't much mentioned or fleshed out yet: on the one hand we should do _pre-filtering_ of events, such that only events end up on queues that are indeed meant to go to a listener. The listener shouldn't have to filter afterwards anymore. Currently we're putting events on each listener's queues and only filter after hte fact. If queues become large, then this very fact becomes an issue exactly due to cache inefficiencies in this case. Ie a lot of computation is then lost purely to figure out if a listener needs an entry or not (as it can't find it in the cache anymore). So with prefiltering this would not be an issue anymore. What would be left though is the cache-inefficiency for actual events that listeners _want_. There we might optimize by including a bit more info into what we persist, perhaps the actual diff if it's not too big etc etc. bq. Any thoughts on how unbounded queues should interact with gc? One approach that we currently target is to checkpoint the oldest entry, such that we prevent gc from removing it (assuming checkpoints are respected). bq. However I dislike having to cope with serialising the open CommitInfo class. At least we should rely on a general purpose library here. Open for alternatives for sure! I was assuming that we need to store the CommitInfo obj, as that's what persisting is mostly about. And if something in there is not serializable, then we're lost and have to skip it (we can warn loudly though). What exactly were you thinking of as alternatives? bq. I don't think PersistedBlockingQueue should use a node store as its back-end. I'm probably not getting the entirety of this point. I guess one argument to reuse the tarMk is that it's something we have and know we can use it - we can surely use something else, for sure. Regarding GC the idea was to _not_ rely on GCing that observation-tarMk but to use generations of tarMk similar to how that's done in persistent cache: so we'd throw away a whole tarMk set once we switched to a new one. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The d
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453237#comment-15453237 ] Michael Dürig commented on OAK-4581: A couple of comments/thoughts without diving too deep into the implementation: * Do we know that we need to go off-heap with that queue? After all the entries are relatively cheap. At least we should run a comparison regarding heap utilisation with unbounded queues in memory and off-heap (now that we have that possibility). * I expect unbounded queues to have adverse effects on the hotness of the various caches. In the worst case leading to some sort of thrashing when old observation handlers need to load old data replacing new entries required to server requests. * Any thoughts on how unbounded queues should interact with gc? On {{oak-segment-tar}} revisions get cleared after 2 gc cycles (48h). * On the implementation side I like abstracting over the queue implementation. However I dislike having to cope with serialising the open {{CommitInfo}} class. At least we should rely on a general purpose library here. Better even we find a way so we don't need to do it. * I don't think {{PersistedBlockingQueue}} should use a node store as its back-end. A hierarchical store is not a good fit for storing linear lists: there will be a performance impact as the lists grows, garbage collections is difficult and expensive, adding and removing elements from the list will cause rewrites of buckets instead of just a single append, remove. If we want to reuse the TarMK here, it might be better to implement a linked list on top of a segment writer. There are mechanisms which would allow you to partition along individual observation handlers. Adding and removing elements and gc would be quite simple. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > Attachments: OAK-4581.v0.patch > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.h
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448502#comment-15448502 ] Chetan Mehrotra commented on OAK-4581: -- To start with we can go for #4. Just have a new interface for that which is implemented by NodeStore. This would be similar to Observable which is implemented by both NodeStore impl currently. Later we need to ensure via checkpoint that oldest entry in queue is backed by checkpoint lifetime. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15445633#comment-15445633 ] Stefan Egli commented on OAK-4581: -- Re {{A.1 - Serialized Root States}} I see the following options: # Store entire diff in a json format # Assume that incoming NodeState in {{contentChanged(NodeState,CommitInfo)}} is still the current root, then create a checkpoint and store the checkpoint Id. # a combination of 1. and 2. above: for small diffs store the diff, for large diffs, create a checkpoint # or introduce a more lightweight checkpoint mechanism which allows to _serialize NodeState_ (compared to {{checkpoint}} which is more heavy weight as it stores an id->nodeState mapping including properties). [~chetanm], wdyt, I think having the option to do 4. would be good, but afaics would require a change in either the {{NodeState}} or the {{NodeStore}} APIs. I fear that the assumption done in 2. would perhaps not always hold and is a bit a weak link - which would mean that the only other option would be 1. which is expensive. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra >Assignee: Stefan Egli > Labels: observation > Fix For: 1.6 > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430918#comment-15430918 ] Stefan Egli commented on OAK-4581: -- bq. 2. The journal should survive restarts For pure _infinite observation queue_ support this wouldn't really be necessary I guess? But having this could probably be used also to support JCR's _Journaled Observation_? > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra > Labels: observation > Fix For: 1.6 > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387518#comment-15387518 ] Chetan Mehrotra commented on OAK-4581: -- Ack there would still be the issue of "events lagging to far behind". So either we prevent RevisionGC for all that duration. Or persist the events to be generated by just running the event iterator producer logic for such Listeners i.e. where what event gets generated can be determined from filter. But this might be too costly also! > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra > Fix For: 1.6 > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387397#comment-15387397 ] Michael Dürig commented on OAK-4581: bq. With persistent journal observers which are fast would be able to get prompt delivery as they would be pulling directly and not impacted by other slow listeners. But this is the case already: the listeners operate off independent threads. The only dependency comes in from thread pool size, JVM scheduling and IO bandwidth. The same issues that you have to deal with in your proposal. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra > Fix For: 1.6 > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387264#comment-15387264 ] Ian Boston commented on OAK-4581: - Only consider my suggestions for a WAL if the existing Journal implementation is unsuitable. It seems the existing journal is suitable, so ignore WAL references. --- Assuming the persisted journal capacity is only bounded by available disk space, the need to reference a remote journal to conserve queue capacity in exchange for additional computation or network is probably unnecessary. I assume "remote" means remote to the VM where the JVM is running and not some other interpretation of remote. --- Events lagging too far behind (data latency) is: 1. something that will (I assume) be caused by the commit traffic an application is subjecting Oak to, rather than caused by Oak internal commits. 2. something an application will care about. 3. given sufficient and appropriate feedback (queue latency metric) something an application developer will be keen to act on during dev and QE before production. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra > Fix For: 1.6 > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387105#comment-15387105 ] Chetan Mehrotra commented on OAK-4581: -- bq. However, I doubt this will solve OAK-2683. Instead it will most likely just shift the problem to "events lagging to far behind". Creating all sorts of problems in applications that rely on somewhat prompt event delivery. With persistent journal observers which are fast would be able to get prompt delivery as they would be pulling directly and not impacted by other slow listeners. bq. Also interaction with GC will be tricky as here the assumption is contrariwise and revisions are purged after a certain time. Yes that would need to be accounted for in implementation say via checkpoint. In DocumentNodeStore as part of OAK-4528 its ensured that we have a valid checkpoint based on what diff need to be processed ([~mreutegg] can confirm). Further we can then checkout repository state at any intermediate time interval. However for SegmentNodeStore I am not sure of the impact of compaction on such intermediate root. For e.g. its ensured that one can fetch the repository state at given valid checkpoint reference but its not clear if same holds true for any intermediate "root" > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra > Fix For: 1.6 > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation
[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386566#comment-15386566 ] Michael Dürig commented on OAK-4581: In the {{SegmentNodeStore}} case you only need to persist the record ids of the root node states. I.e. conceptionally all you need is a persisted queue of {{BackgroundObserver.ContentChange}} instances. In fact I think it should be the same way for the {{DocumentNodeStore}}, maybe with a local cache to speed things up. However, I doubt this will solve OAK-2683. Instead it will most likely just shift the problem to "events lagging to far behind". Creating all sorts of problems in applications that rely on somewhat prompt event delivery. Also interaction with GC will be tricky as here the assumption is contrariwise and revisions are purged after a certain time. > Persistent local journal for more reliable event generation > --- > > Key: OAK-4581 > URL: https://issues.apache.org/jira/browse/OAK-4581 > Project: Jackrabbit Oak > Issue Type: New Feature > Components: core >Reporter: Chetan Mehrotra > Fix For: 1.6 > > > As discussed in OAK-2683 "hitting the observation queue limit" has multiple > drawbacks. Quite a bit of work is done to make diff generation faster. > However there are still chances of event queue getting filled up. > This issue is meant to implement a persistent event journal. Idea here being > # NodeStore would push the diff into a persistent store via a synchronous > observer > # Observors which are meant to handle such events in async way (by virtue of > being wrapped in BackgroundObserver) would instead pull the events from this > persisted journal > h3. A - What is persisted > h4. 1 - Serialized Root States and CommitInfo > In this approach we just persist the root states in serialized form. > * DocumentNodeStore - This means storing the root revision vector > * SegmentNodeStore - {color:red}Q1 - What does serialized form of > SegmentNodeStore root state looks like{color} - Possible the RecordId of > "root" state > Note that with OAK-4528 DocumentNodeStore can rely on persisted remote > journal to determine the affected paths. Which reduces the need for > persisting complete diff locally. > Event generation logic would then "deserialize" the persisted root states and > then generate the diff as currently done via NodeState comparison > h4. 2 - Serialized commit diff and CommitInfo > In this approach we can save the diff in JSOP form. The diff only contains > information about affected path. Similar to what is current being stored in > DocumentNodeStore journal > h4. CommitInfo > The commit info would also need to be serialized. So it needs to be ensure > whatever is stored there can be serialized or re calculated > h3. B - How it is persisted > h4. 1 - Use a secondary segment NodeStore > OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. > [~mreutegg] suggested that for persisted local journal we can also utilize a > SegmentNodeStore instance. Care needs to be taken for compaction. Either via > generation approach or relying on online compaction > h4. 2- Make use of write ahead log implementations > [~ianeboston] suggested that we can make use of some write ahead log > implementation like [1], [2] or [3] > h3. C - How changes get pulled > Some points to consider for event generation logic > # Would need a way to keep pointers to journal entry on per listener basis. > This would allow each Listener to "pull" content changes and generate diff as > per its speed and keeping in memory overhead low > # The journal should survive restarts > [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html > [2] > https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal > [3] > https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog -- This message was sent by Atlassian JIRA (v6.3.4#6332)