[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2017-11-14 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251362#comment-16251362
 ] 

Stefan Egli commented on OAK-4581:
--

moved this from 1.8 to 1.10 as I don't see as fitting into the 1.8 timeframe - 
pls adjust if you think otherwise

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.10
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2017-03-30 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948858#comment-15948858
 ] 

Stefan Egli commented on OAK-4581:
--

Speaking for documentMk's journal: this one (also) batches commits/revisions 
into one entry per backgroundWrite. That means, we no longer have the 
boundaries of each commit. We could however introducing an additional property 
on the journal entry, which is the list of all revisions contained in that 
batch. That way we could (if we wanted, ie just for the EventJournal) dig down 
into the resolution of single commits without having to greatly modify the 
journal mechanism.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.8
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2017-03-30 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948705#comment-15948705
 ] 

Stefan Egli commented on OAK-4581:
--

We could make the origin instance part of the EventJournal and support local 
changes through that. The documentMk's journal actually contains the instance 
id of which instance produced the revision/commit. So we could use that 
information to still support local/external filtering.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.8
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2017-03-30 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948700#comment-15948700
 ] 

Chetan Mehrotra commented on OAK-4581:
--

One missing part would be identifying local changes. A pure diff based approach 
cannot determine if the change is due to local change or external change and 
purpose of having such big queue was to ensure that local event do not get lost 
(collapsed and then become external change)

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.8
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2017-03-30 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948690#comment-15948690
 ] 

Stefan Egli commented on OAK-4581:
--

This might have been discussed before, but I'd like to bring it up again: 
alternatively to designs suggested in the history of this ticket we could also 
turn this thing around entirely and rather support _persisting_ via 
implementing [JCR 12.6.'s journaled 
observation/EventJjournal|https://docs.adobe.com/content/docs/en/spec/jcr/2.0/12_Observation.html]:
Having underlying journaled observation would allow to:
* switch between a 'live listener' to a 'journaled listener' once the queue is 
full (ie we could support a hard queue limit, no compaction of events)
* this switch could be implemented in a way that guarantees no loss of events 
nor any duplicate events (via careful use of the {{skipTo}} method - perhaps 
with the help of a non-standard _revision_ based {{skipTo}} variant)
* once the listener reaches 'the end of the (journaled) events' we switch it 
back to a live listener

The advantages are that we don't have to worry about storing anything 
additionally - don't have to define any file format for these persisted events 
and maintain them. This is all delegated to the EventJournal. 
Note that the EventJournal itself can be implemented based on existing data 
too, such as the existing journal of the DocumentNodeStore. (For 
SegmentNodeStore it might be different since its journal is less precise, only 
done every couple seconds).

The disadvantages are that the EventJournal is (probably, tbd) not based on 
_events_ but based on raw revision (commit) information. Thus events would have 
to yet be generated for the listeners when needed. That means it would have to 
do the diff again. That means it could be slower as the diff might no longer be 
in the cache.
Also, the EventJournal would have to at least go as much back in time as the 
slowest listener.

So overall perhaps a summary is that using EventJournal for slow listeners 
could be easier to implement, thus easier in terms of code maintenance, but 
also easier in terms of operational maintenance with the downside of 
potentially being less performant.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.8
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-10-24 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601699#comment-15601699
 ] 

Marcel Reutegger commented on OAK-4581:
---

Stefan and I had an offline discussion about different approaches. We came to 
the conclusion that it may be favourable to implement 
'head-of-queue-persisting' first because it is simple and able to deal with one 
cause of a growing queue: a slow EventListener.onEvent() implementation. We 
could still implement tail-of-queue-persisting in a next step and compare it 
with other strategies for the case when the queue fills up.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-10-19 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588993#comment-15588993
 ] 

Stefan Egli commented on OAK-4581:
--

h4. Prototype based on a SwitchingObserver
[pushed a 
prototype|https://github.com/stefan-egli/jackrabbit-oak/commit/a521599e89b62ed8d40af5ae0a987cb4a9b546b7]
 to my github fork which tries to scetch an approach that would use a 
SwitchingObserer in front of either a BackgroundObserver/ChangeProcessor pair 
or a PersistingBackgroundObserver/PersistingEventQueue pair. The disadvantage 
of this approach is that there are quite a few new classes in play. The 
advantage is that it clearly distinguishes between normal (live) mode - by 
using the exiting, unchanged BackgroundObserver/ChangeProcessor pair - and a 
new persistent mode. Switching between these modes is delicate and is thus 
explicitly handled with extra switching modes. This conceptually works fine - 
added a test that illustrates the switch.

h4. Prototype based on persistence embedded in the ChangeProcessor
An alternative to the above would be to embedd the persistence directly in the 
ChangeProcessor. The contentChanged method there would inspect the queue size 
and if too large, not deliver events to the listener anymore, but instead go 
via a persistent queue. The advantage is that it's more enclosed in the 
ChangeProcessor. The disadvantage is that if a listener would block it could 
not prevent the queue from growing (but maybe supporting an indefinitely 
blocking listener is not required). I'll look into how such an approach could 
be implemented next.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/m

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-10-18 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585048#comment-15585048
 ] 

Stefan Egli commented on OAK-4581:
--

An update on filtering and the persistence design: The current filter logic 
(EventFilter/EventQueue) takes NodeStates as input and generates Events that 
can be pulled (via an iterator) at discretion. The fact that we want to 
_persist events_ means that the filter is applied before persisting. Later, at 
read/delivery time this same filter would not be applicable anymore, as it 
can't filter Events (it can only filter based on NodeStates).

Now when we look at how to support persisting events for a set of listeners 
then one question arises: does each listener have its own private queue, or is 
there something like a shared queue. A shared queue would sound advantageous as 
it would reduce disk space. It could for example be solved by figuring out 
which listener is interested in a particular event, and enrich an event in the 
storage format with a set of listener ids. And at delivery time you could then 
just read those events that contain your listener id.

Here's a description of possible private-vs-shared queue scenarios:
h5. shared event queue
Events would only be stored once, they would contain a set of listener ids 
indicating which listener is interested in it. Storage space would be kept 
small.
To support an "infinitely large" commit, the events would have to be persisted 
while traversing through the diff. Therefore the "deduplication" of events must 
happen during this traversal too. Thus it seems reasonable to solve this with 
an "Umbrella EventFilter" that has all the actual filters as sub-filter and 
which marks each events with the set of ids that include the change. While this 
sounds complicated, it might be doable.

However, there's one additional obstacle here: the NamePathMapper: in theory 
each listener can have a different NamePathMapper - besides also having 
different base paths (which have their own Generator, thus could have a 
different order).

In short, I believe this is not easily solvable in the current design.
h5. semi-shared event queue
Events would still be _shareable_, however it would be based on best-effort and 
no longer, unlike above, be precise. So, storage space would be bigger in some 
cases.
The way this could be done is to have the same underlying persistence logic: 
each event can have a set of listener ids for which it applies. The difference 
is in how the filtering would happen: for each listener an 
EventFilter/EventQueue would be created and Events would be generated 
individually. After a certain number of Events are generated for each listener, 
the resulting set of Events is deduplicated. So the deduplication doesn't 
happen in an Umbrella-EventFilter but after the Event generation.

The reason this is best-effort is that the events are worked off in batches: to 
avoid OOME and still support infinitely large commits that's probably the way 
to go. And it's possible that an Event would show up in different batches for 
different listeners. So the deduplication is only happening in one batch, not 
in multiple - and that's where the best-effort lies in.

An additional disadvantage of this approach is more read I/O when deduplication 
isn't perfect.
h5. individual event queues
This model is the simplest: each listener has its own, individual event queue. 
The persistence doesn't have to keep track of listener ids. Of course this 
means more storage is required, but the code would be simpler.

h4. Conclusion
While the shared queue approach seems favorable in terms of storage space, it 
is more complex code and perhaps more prone to bugs. Also, I'm not sure how 
much disk space would be gained as listeners should be very precise thus 
ideally somewhat disjunct. So perhaps we should go for the simpler, individual 
queue model as a first iteration. 
[~chetanm], [~tmueller], [~mreutegg], et al, wdyt?

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to han

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-10-05 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550982#comment-15550982
 ] 

Chetan Mehrotra commented on OAK-4581:
--

Well I am still bit sceptical around serializing the whole event stuff but lets 
try. I think feature like this might need multiple implementations as poc, done 
in branch. And then we run certain scenarios or benchmark test to see the 
effectiveness of the approach. 

So its fine to start with one approach in a branch and measure the impact

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-10-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549104#comment-15549104
 ] 

Stefan Egli commented on OAK-4581:
--

Summarizing those numerous, recent threaded comments I believe we have 
consensus that the goal is to get rid of external use of BackgroundObserver 
(not scope of this ticket, but a consequence) and that we do the *persistence 
on the oak-jcr/ChangeProcessor level*. This makes the persistence independent 
from cache and GC problems, works fine with filter changes, but has the 
downside that it means larger temporary storage required (as events are 
'exploded'). The actual file format of the stored events is somewhat of an 
orthogonal/detail question, but can be something like a flat file.
Unless I hear objections I'm looking at following up on these assumption in the 
next days.
/cc [~chetanm], [~mduerig], [~mreutegg], [~reschke], [~catholicon]

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-29 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15532190#comment-15532190
 ] 

Stefan Egli commented on OAK-4581:
--

[~cziegeler], so IIUC then support for these changed/added/removed properties 
is the reason why these maps have to first be filled in the 
JcrResourceListener, and this is causing memory issue if we're talking about a 
huge change. If we no longer propagate property changes in the JRL, then we 
could avoid these maps. Do I understand you correct that you're suggesting to 
remove this support then? This would allow us indeed to get rid of the 
OakResourceListener.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread Carsten Ziegeler (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528553#comment-15528553
 ] 

Carsten Ziegeler commented on OAK-4581:
---

I didn't follow the whole discussion here (way too long), so I just responded 
to the question why we moved to an Oak observer. As said above, having the 
resource type in the event send by Sling is no longer an issue as we don't 
support that anymore.
Personally I think that we should never had support for the names of the 
changed/added/removed properties in the events. Use cases for this are very 
rare and the new RCL in Sling explicitely marks these things as optional, so 
listeners can't rely on this fact. So from a Sling pov I think we don't need 
them.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread Carsten Ziegeler (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528524#comment-15528524
 ] 

Carsten Ziegeler commented on OAK-4581:
---

We don't require the functionality in Sling anymore. In general it is too 
resource consuming (we have to think about all possible resource provider 
implementations not just Oak) and there are only very few listeners using this

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526714#comment-15526714
 ] 

Stefan Egli commented on OAK-4581:
--

[~cziegeler], re
bq. Now, an additional reason at least for Sling was that the event bridge we 
have was reading all added/changed nodes to get the resource type property.
I can't find this dependency in the code, do you know where that's coded in 
JcrResourceListener? IIUC then the reason for having these 
addedEvents/changedEvents/removedEvents maps in onEvent is to be able to build 
a JcrResourceChanged obj that contains all changed properties (unrelated to 
resource type - but perhaps that's hidden somewhere down deep..)

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15526047#comment-15526047
 ] 

Michael Dürig commented on OAK-4581:


bq. knowing that the events are coming in in a breadth first manner

I wouldn't rely on such implementation details but rather expose collecting 
information about child items through a proper API as proposed with OAK-4853.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525978#comment-15525978
 ] 

Stefan Egli commented on OAK-4581:
--

Created SLING-6070 for invoking reportChanges after child traversal finished

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525942#comment-15525942
 ] 

Stefan Egli commented on OAK-4581:
--

* re 'maps of added/removed/changed items' : I think we might be able to remove 
these, or lets say drastically reduce its size: by knowing that the events are 
coming in in a breadth first manner, the JcrResourceListener could build up and 
report changes whenever a child traversal finished - no need to wait with 
calling reportChanges until the very end (as the OakResourceListener sends out 
events also after child traversal finished).
* re 'osgiEventQueue' : that has gone with SLING-5163 / 
[here|https://github.com/apache/sling/commit/9c424dfca6a802a6b66b4b7981a313c5856a0e1f]
* Re 'Access control' : that's probably still an open issue indeed, however it 
is the case for both OakResourceListener *and* JcrResourceListener, so that's 
not an argument against JcrResourceListener

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525601#comment-15525601
 ] 

Michael Dürig commented on OAK-4581:


This is one aspect yes. The other is the maps of added, removed and changed 
items kept at {{JcrResourceListener#onEvent}}. Those are used to keep all items 
affected by a single event. For big change sets this pushed us OOM. Further 
{{JcrResourceListener#osgiEventQueue}} is yet another expensive queue of maps, 
which also was causing a lot of pressure on the heap. 

Access control was never taken into account. This was always being evaluated in 
the context of an admin session. 

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525549#comment-15525549
 ] 

Marcel Reutegger commented on OAK-4581:
---

So, IIUC that means the overhead of reading those properties over the JCR API 
was too high? There are two reasons I can think of why this would be the case. 
1) an event listener first needs to read the path from the event and then get 
the item, which means internally the path must be resolved resulting in reads 
of multiple items, depending on the length of the path. 2) access control must 
be evaluated, whereas an Observer gives you access to everything without access 
control checks. Is this a reasonable assessment?

 Created OAK-4853 for the 'proper' API proposal.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525357#comment-15525357
 ] 

Michael Dürig commented on OAK-4581:


bq. reading all added/changed nodes

This was the main driver for this change. Sling's previous implementation did 
not scale properly. 

+1 though to provide the required functionality though a "proper" API. 

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Carsten Ziegeler (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525207#comment-15525207
 ] 

Carsten Ziegeler commented on OAK-4581:
---

I don't remember the details, but I think this was a recommendation from the 
Oak team back then
Now, an additional reason at least for Sling was that the event bridge we have 
was reading all added/changed nodes to get the resource type property. This is 
less expensive as the observer approach can already provide that value

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523201#comment-15523201
 ] 

Vikas Saurabh commented on OAK-4581:


Hmm... I probably put too much weight into C2 in description-
bq. The journal should survive restarts

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523087#comment-15523087
 ] 

Stefan Egli commented on OAK-4581:
--

bq. 3. those observers get events post reboot
Not sure that's really the case. I would argue that after a restart these 
persisted events get deleted. IMO what 'persist' in the context of this ticket 
emphasizes is only additional memory at runtime, not that it survives restarts.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523022#comment-15523022
 ] 

Vikas Saurabh commented on OAK-4581:


What I meant was that this issue is saying that for JCR observation, we'd be 
more resilient for:
# event generated for node1
# node1 rebounces before event is dispatched to all observers
# those observers get events post reboot

That opens up the case for at-least-once-delivery for events that got generated 
for a given cluster node. My argument being that if we say that we just handle 
a sub-set of cases, then observation client still has to be resilient against 
loss of events - and if that observation client code is resilient what's the 
point of us trying to be more resilient.

Afaict, the issue got created because it's tricky/hard(/impossible??) for 
observation client to be resilient to loss of observation events (barring 
manual intervention). But, yes if we do this (just do simple stuff and not to 
give pedantic guarantee of delivery), then we can have tooling to help with 
that manual intervention.

Again, I'm not saying we need to do this(pedantic delivery) - just something 
that we should keep in mind about what the observation client might expect post 
this implementation.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522996#comment-15522996
 ] 

Stefan Egli commented on OAK-4581:
--

that sounds like a new class of listener that would want 'at-least-once' 
delivery in a cluster. Something probably useful, but I'm not sure if that fits 
into the observation umbrella of the JCR API. I think that's orthogonal to this 
ticket (somewhat) and could probably be handled separately?

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522974#comment-15522974
 ] 

Stefan Egli commented on OAK-4581:
--

The latter (GC-decoupling). We could indeed still have a limit to protect from 
disk overflow, but yes, it would be independent.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522963#comment-15522963
 ] 

Vikas Saurabh commented on OAK-4581:


Btw, just to note (I'm unable to concretely argue if we really want to do this 
or not) - no matter how we proceed, the storage would be cluster-id specific. 
So, do we want to cater the case where some cluster node gets obliterated (with 
pending stored events) - would/should some other node pick that up? Adding the 
disclaimer again that it's a case that would hit us once we start to advertise 
"stronger guarantee delivery of observation events" - not that we must fix it 
(we can as document it that the guarantee is a best-effort thing)

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522946#comment-15522946
 ] 

Vikas Saurabh commented on OAK-4581:


yeah, if we get around to single way of observation - then, I'd also like 1-B-1 
+ 3-C (same as what Marcel preferred).

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522942#comment-15522942
 ] 

Vikas Saurabh commented on OAK-4581:


bq. The advantage of persisting events would have been that it wouldn't have 
needed such a cut-off..
Umm... do you mean that we'd be ok with infinite storage with persisted events? 
Or, that it de-couples storage concern with GC and hence the cut-off metric 
could be storage size (as against must-have time boundaries if we remain 
coupled)

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522741#comment-15522741
 ] 

Marcel Reutegger commented on OAK-4581:
---

In my view it would be best to review the usage of Observers outside of Oak 
again and use JCR/Jackrabbit EventListeners whenever possible. The Oak 
observation package is not a public API and the export version is set to zero. 
I think that was a good decision, because we repeatedly run into situations 
where Oak observer (or BackgroundObserver) usage outside of Oak lags behind 
current development. Examples include missing MBean support, default values for 
queue length or new callbacks on BackgroundObserver. IMO we should stop 
promoting Oak Observer when we actually consider it internal.

SLING-3279 was created to leverage filters not available in JCR. However, the 
most useful filter with multiple paths is already available since Jackrabbit 
2.7.5 (JCR-3745) and gathering names of added/removed/changed properties is 
also possible with plain JCR Events.

[~mduerig] & [~cziegeler], do you remember the main reasons why Oak Observers 
were considered superior over JCR EventListeners?

If there are features missing, I would rather add them to the Jackrabbit API 
and make it available to other users as well and change the JCR Resource 
implementation to only rely on the Jackrabbit API.

Once we have a clear separation of Oak internal Observer usage and client code 
using JCR EventListeners, we can more easily optimize those two parts 
individually. The former is the clear responsibility of the repository and 
would produce events as efficiently and fast as possible, while the latter gets 
those events at its own pace. The current situation is IMO problematic because 
the two aspects are mixed.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2]

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-26 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522461#comment-15522461
 ] 

Stefan Egli commented on OAK-4581:
--

[~catholicon], thx for your feedback! Re
bq. I thought this issue was about persisting change pointers
Agreed. And when looking at persisting NodeState(+CommitInfo) then the revGC 
issue comes up (you must later be able to do the diff, and revGC must not clean 
up things before the persited observation queues haven't been processed). And 
from this resulted the idea to not persist NodeState but the actual, calculated 
Event (even though that would bloat the storage, as it would become much 
simpler). However, this now again conflicts with support for any type of 
BackgroundObserver, not only ChangeProcessor. So I think the latter question 
becomes central now, and if we want to support any BackgroundObserver we need 
to persist NodeState and prevent revGC from cleaning up too early.
bq. Afaics, we still want remain wary of infinite storage of pointers
bq. Sure if we're saying that revGC deleted node states 
Exactly. There's a dilemma: we want to prevent revGC to being 'paused' for too 
long just because of observation - but if it isn't paused then such a slow or 
overwhelmed listener would loose events. We have to make a choice, it's a 
binary thing. Perhaps we have to cut off events after a certain time (eg after 
exactly 24hours to fit into the segment-tar's default generation cycle)? (The 
advantage of persisting events would have been that it wouldn't have needed 
such a cut-off..)

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elas

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-22 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513400#comment-15513400
 ] 

Vikas Saurabh commented on OAK-4581:


I got slightly confused about what are we targeting here (sorry, haven't 
followed all the comments - just the description and the last explanation). My 
initial thought about the issue was that we're solving quicker diff elsewhere 
(segment is fast anyway. document now relies on journal, afaik can recover from 
full queue too, and can work on ranges). Given those improvements, I thought 
this issue was about persisting change pointers (rev or root-node-id as the 
need be and commit info for local changes) so that listeners work fine post 
rebounce too.

Afaics, we still want remain wary of infinite storage of pointers (whatever be 
the case - slow listeners or too many changes.. in effect rate_listener < 
rate_incoming).

Assuming (accurate??) diff processing is a different concern, I'm not sure why 
GC (assuming revGC) need to care about node states (thinking about 1A side of 
things) - I don't know about segment... but for document we can still get root 
node state. Sure if we're saying that revGC deleted node states (ie the even 
hadn't been consume in 24 hours) then is that really a case worth solving by 
retaining node states??

Anyway, assuming my original interpretation of the issue, I like 1-A-1 + 3-C... 
as you pointed out that we need to take care of OakResourceListener. I think 
we've bore a lot of pain with us improving at least diagnosing ChangeProcessor 
(jmx, configurable queue) while sling bridge kept reeling under pain with the 
default of 1k. I'd rather have improvements that work uniformly.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-21 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509914#comment-15509914
 ] 

Stefan Egli commented on OAK-4581:
--

One additional note: should we choose to go the I-B route (queue in 
ChangeProcessor), then this improvement will not become usable by Sling's 
ResourceChangeListener - as that has switched to using an OakResourceListener 
(based on Oak's recommendation to do so, see SLING-3279) and directly bases on 
BackgroundObserver...

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-21 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509752#comment-15509752
 ] 

Marcel Reutegger commented on OAK-4581:
---

My preferred approach is I-B-1 with something similar to III-C, structured like 
a log (Kafka is probably overkill).

In other issues we so far looked at how to improve the producer side of the 
observation system, but even if the producer side is perfect, the queue could 
still fill up when the listener is too slow. And once the listener is too slow, 
it may have a ripple effect on the producer side. The node state comparisons 
may not be served from cache anymore and require I/O interfering/competing with 
other read operations in the repository. I think it's best to create the JCR 
events as soon as possible and then push them into a queue that is consumed by 
the listener. The contents of this queue will obviously have bigger space 
requirements than the current queue with just the root states, but it is 
simpler to handle. The entries are independent of the repository (unrelated to 
GC) and read and write patterns are simple linear operations.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-21 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509287#comment-15509287
 ] 

Stefan Egli commented on OAK-4581:
--

/cc [~mreutegg]

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-20 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506514#comment-15506514
 ] 

Stefan Egli commented on OAK-4581:
--

I'd like to move this ticket forward and believe we need a few decisions on the 
approach:

h4. I - Who to persist for
There are different possibilities as to where the persisted queue should sit:
h5. A - BackgroundObserver
In this class the BackgroundObserver's queue is persisted - and that can 
logically only be based on {{NodeState}}. This will thus support any type of 
downstream Observer including NodeObserver etc. Being based on Observer it 
requires GC-prevention.
Here's a list of concrete subvariants:
h6. 1 - store serialized root state
This seemlessly serializes and stores {{NodeState}} objects. Later on they are 
read and used for diffing. Which means the data must still be available to do 
the actual diff. This can be achieved by increasing the GC retention period one 
way or another. What's also important here is that the caches aren't poluted 
with these late diffs - ie they should probably not be stored in the cache in 
this late-delivery case.
h6. 2 - store serialized diff (and root state)
Besides serializing the {{NodeState}} this variant (also) stores the diff. This 
speeds up later diffing as the diff is then already there (it probably must be 
stored in 'a cache' temporarily, but only temporarily as it will only be used 
for one event, likely). This variant is still dependent on preventing GC 
though, as we're still on the Observer level, which works on {{NodeState}}.
h6. 3 - base it on the journal
Alternatively the journal is equipped with more diff-like information (perhaps 
with the full, but perhaps only partially), also see OAK-4586. Otherwise this 
has same characteristics as I-A-1 and I-A-2: GC must still be prevented, we're 
still on the Observation/NodeState level. It will be implementation dependent, 
as Segment doesn't have the same type of journal as Document has.
h5. B - ChangeProcessor
In this class the queue is handled on the ChangeProcessor level (not in 
BackgroundObserver), thus no longer based on NodeState, but now independent, 
the format just must be suitable for calculation and later delivery via 
onEvent. Being independent of NodeState allows to become independent of GC and 
cache-hotness issues. However, it's important to note that this class of 
solutions targets concrete EventListeners, not Observers in general!
h6. 1 - store serialized events
The ChangeProcessor calculates events as if for delivery to onEvent, but just 
persists the events as is. This will bloat the amount of data stored and 
increase I/O. However, later delivery is trivial as all the events are already 
there, they just have to be read and onEvent called.
h6. 2 - store serialized diff
The ChangeProcessor stores the serialized diff in a form that it can later be 
processed by the EventFilter and result in events for delivery to 
EventListener.onEvent. (This would then be independent from the NodeState)
h6. 3 - base it on the journal
If the journal contains the complete diff such that ChangeProcessor can 
evaluate the filters and deliver, then the journal could be enough (however 
that might be tricky to achieve). Also, this will be implementation dependent, 
as Segment doesn't have the same type of journal as Document has.
In any case, additionally the CommitInfo must be stored somewhere, either also 
in the Journal or per ChangeProcessor.
h4. II Serializing CommitInfo
Not sure if we have many options here, I think it's just something we have to 
do. And if Oak code prevents serialization, then we can fix it. If it's 
upper/application-layer code that causes problems, we can't do much other than 
issue a warn.
h4. III - Storage Layer
This depends a bit on the actual solution chosen. If we base it eg on journal, 
then a lot comes from there already. If we persist flat events, then surely an 
extra storage is needed.
h5. A - Use a SegmentNodeStore
* would be straight forward but has issues as mentioned by Michael.

h5. B - Use internas of SegmentNodeStore, eg SegmentWriter
* might be much more optimal, but adds dependencies on internas of tarmk.

h5. C - store as JSON in a flat file

[~mduerig], [~chetanm], [~catholicon], [~tmueller], which variant should we 
implement?

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> draw

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464465#comment-15464465
 ] 

Stefan Egli commented on OAK-4581:
--

Waiting with further implementations until we agree on how to proceed

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464458#comment-15464458
 ] 

Stefan Egli commented on OAK-4581:
--

I wouldn't rule out that there are better fits, for sure. The advantage of 
using tarMk is that we'd use our own stuff for another use case too.
* _there will be a performance impact as the lists grows_ : we can store in 
sub-folders named with a time pattern, similar as sling jobs does
* _garbage collections is difficult and expensive_ : gc would be done in a 
generational approach: rewrite the queue in a new tarMk once certain thresholds 
are reached (ideally this could be done when the queue is empty, so no rewrite 
would be necessary). But gc on that tarMk incarnation would be turned off 
completely.
* _adding and removing elements from the list will cause rewrites of buckets 
instead of just a single append, remove_ : the idea is to store in batches, so 
I was assuming this shouldn't be that bad

Overall I'd find it simpler, but if ppl agree that we shouldn't use tarMk for 
this, then sure, let's switch.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1546#comment-1546
 ] 

Stefan Egli commented on OAK-4581:
--

So could we adjust the retention time dynamically (is there an API) when the 
queue grows?

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464441#comment-15464441
 ] 

Stefan Egli commented on OAK-4581:
--

yes, this has been seen esp on documentMk with earlier oak versions (although 
it might also happen on newer oak versions, there's nothing preventing it from 
happening, but newer oak versions have several improvements in this area)

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459629#comment-15459629
 ] 

Michael Dürig commented on OAK-4581:


bq. I'm probably not getting the entirety of this point. 

My point was the the TarMK is not a particular good storage solution for linear 
lists. What we *could* do though is reuse some of its internals to implement a 
store that is. We would be reinventing Kafka on segment though...

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459618#comment-15459618
 ] 

Michael Dürig commented on OAK-4581:


bq. One approach that we currently target is to checkpoint the oldest entry, 
such that we prevent gc from removing it (assuming checkpoints are respected).

No it doesn't on {{oak-segment-tar}. We moved from reachability to retention 
time for gc as the former didn't work. 

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459608#comment-15459608
 ] 

Michael Dürig commented on OAK-4581:


bq. is for such rare special cases only.

That is what I was aiming at: do we need to cover rare special cases? Did we 
actually observe them and was it really an observation queue pushing the system 
OOM? Do rare spacial cases really warrant this amount of extra complexity? 

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459586#comment-15459586
 ] 

Michael Dürig commented on OAK-4581:


But storing the events themselves will tremendously increase the sizes of the 
queues. Each commit will occupy roughly as many slots in the queue as there 
where changes in the commit. Each event would need to store a path, a userId, 
an info map etc. Not exactly cheap. And on top of that there is a queue per 
listener. In effect this would multiply the requirements for IO bandwidth by 
the number of observation listeners (each commit is first stored to disk and 
then distributed to N listener queues and stored again in the form of events). 

Agreed, early filtering will reduce the number of events somewhat and above 
picture is just an upper bound. But storing the commit (handle) will occupy 
just a single slot in the queue while storing events will occupy at least as 
many slots unless a commit does not generate a single event. 

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-02 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458826#comment-15458826
 ] 

Stefan Egli commented on OAK-4581:
--

One more comment re
bq. I expect unbounded queues to have adverse effects on the hotness of the 
various caches.
As pointed out, I think we should do pre-filtering. But we should probably also 
do pre-calculation of the actual events we want to deliver. Assuming that the 
listener's filters are "good" it would be efficient to go through the whole 
filter and basically store pre-manufactured events as a result only. This would 
mean that once an event is persisted it is no longer dependent on _any_ cache. 
But it also means that what will be persisted will _actually_ be consumed later 
on - as the raw and final event itself would basically be stored, no later 
filtering whatsoever. 
Doing the filtering as early as possible would also add load to the system at 
commit time, thus result in a natural throttling. As basically the computation 
of all the events as they will be delivered to the listeners then becomes part 
of the commit.
(_two birds with one stone_)

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-09-02 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457991#comment-15457991
 ] 

Stefan Egli commented on OAK-4581:
--

[~mduerig], thanks for the comments!
bq. Do we know that we need to go off-heap with that queue?
Agreed, the entries are normally cheap, but generally speaking it's the open 
map mechanism that can make them unbounded, thus larger. But even assuming they 
are cheap on average, you can have a situation where you have such a high 
traffic burst that you're overwhelming even a highly optimized listener logic. 
In which case the queues grow and you'll get an OutOfMemoryError. The benefit 
of persisting the queues (when they become big that is) is for such rare 
special cases only. And you'd have to construct such a rare case where 
basically you'd force an OutOfMemoryError versus with this patch not.
bq. I expect unbounded queues to have adverse effects on the hotness of the 
various caches.
Right. There's some ideas left that aren't much mentioned or fleshed out yet: 
on the one hand we should do _pre-filtering_ of events, such that only events 
end up on queues that are indeed meant to go to a listener. The listener 
shouldn't have to filter afterwards anymore. Currently we're putting events on 
each listener's queues and only filter after hte fact. If queues become large, 
then this very fact becomes an issue exactly due to cache inefficiencies in 
this case. Ie a lot of computation is then lost purely to figure out if a 
listener needs an entry or not (as it can't find it in the cache anymore). So 
with prefiltering this would not be an issue anymore. 
What would be left though is the cache-inefficiency for actual events that 
listeners _want_. There we might optimize by including a bit more info into 
what we persist, perhaps the actual diff if it's not too big etc etc.
bq. Any thoughts on how unbounded queues should interact with gc?
One approach that we currently target is to checkpoint the oldest entry, such 
that we prevent gc from removing it (assuming checkpoints are respected).
bq. However I dislike having to cope with serialising the open CommitInfo 
class. At least we should rely on a general purpose library here. 
Open for alternatives for sure! I was assuming that we need to store the 
CommitInfo obj, as that's what persisting is mostly about. And if something in 
there is not serializable, then we're lost and have to skip it (we can warn 
loudly though). What exactly were you thinking of as alternatives?
bq. I don't think PersistedBlockingQueue should use a node store as its 
back-end.
I'm probably not getting the entirety of this point. I guess one argument to 
reuse the tarMk is that it's something we have and know we can use it - we can 
surely use something else, for sure. Regarding GC the idea was to _not_ rely on 
GCing that observation-tarMk but to use generations of tarMk similar to how 
that's done in persistent cache: so we'd throw away a whole tarMk set once we 
switched to a new one.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The d

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453237#comment-15453237
 ] 

Michael Dürig commented on OAK-4581:


A couple of comments/thoughts without diving too deep into the implementation:

* Do we know that we need to go off-heap with that queue? After all the entries 
are relatively cheap. At least we should run a comparison regarding heap 
utilisation with unbounded queues in memory and off-heap (now that we have that 
possibility).

* I expect unbounded queues to have adverse effects on the hotness of the 
various caches. In the worst case leading to some sort of thrashing when old 
observation handlers need to load old data replacing new entries required to 
server requests. 

* Any thoughts on how unbounded queues should interact with gc? On 
{{oak-segment-tar}} revisions get cleared after 2 gc cycles (48h). 

* On the implementation side I like abstracting over the queue implementation. 
However I dislike having to cope with serialising the open {{CommitInfo}} 
class. At least we should rely on a general purpose library here. Better even 
we find a way so we don't need to do it. 

* I don't think {{PersistedBlockingQueue}} should use a node store as its 
back-end. A hierarchical store is not a good fit for storing linear lists: 
there will be a performance impact as the lists grows, garbage collections is 
difficult and expensive, adding and removing elements from the list will cause 
rewrites of buckets instead of just a single append, remove. If we want to 
reuse the TarMK here, it might be better to implement a linked list on top of a 
segment writer. There are mechanisms which would allow you to partition along 
individual observation handlers. Adding and removing elements and gc would be 
quite simple. 


> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.h

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-08-30 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448502#comment-15448502
 ] 

Chetan Mehrotra commented on OAK-4581:
--

To start with we can go for #4. Just have a new interface for that which is 
implemented by NodeStore. This would be similar to Observable which is 
implemented by both NodeStore impl currently. 

Later we need to ensure via checkpoint that oldest entry in queue is backed by 
checkpoint lifetime.


> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-08-29 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15445633#comment-15445633
 ] 

Stefan Egli commented on OAK-4581:
--

Re {{A.1 - Serialized Root States}} I see the following options:
# Store entire diff in a json format
# Assume that incoming NodeState in {{contentChanged(NodeState,CommitInfo)}} is 
still the current root, then create a checkpoint and store the checkpoint Id.
# a combination of 1. and 2. above: for small diffs store the diff, for large 
diffs, create a checkpoint
# or introduce a more lightweight checkpoint mechanism which allows to 
_serialize NodeState_ (compared to {{checkpoint}} which is more heavy weight as 
it stores an id->nodeState mapping including properties).

[~chetanm], wdyt, I think having the option to do 4. would be good, but afaics 
would require a change in either the {{NodeState}} or the {{NodeStore}} APIs. I 
fear that the assumption done in 2. would perhaps not always hold and is a bit 
a weak link - which would mean that the only other option would be 1. which is 
expensive.

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Stefan Egli
>  Labels: observation
> Fix For: 1.6
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-08-22 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430918#comment-15430918
 ] 

Stefan Egli commented on OAK-4581:
--

bq. 2. The journal should survive restarts
For pure _infinite observation queue_ support this wouldn't really be necessary 
I guess? But having this could probably be used also to support JCR's 
_Journaled Observation_?

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
>  Labels: observation
> Fix For: 1.6
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-07-21 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387518#comment-15387518
 ] 

Chetan Mehrotra commented on OAK-4581:
--

Ack there would still be the issue of "events lagging to far behind". So either 
we prevent RevisionGC for all that duration. Or persist the events to be 
generated by just running the event iterator producer logic for such Listeners 
i.e. where what event gets generated can be determined from filter. But this 
might be too costly also!

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
> Fix For: 1.6
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-07-21 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387397#comment-15387397
 ] 

Michael Dürig commented on OAK-4581:


bq. With persistent journal observers which are fast would be able to get 
prompt delivery as they would be pulling directly and not impacted by other 
slow listeners. 

But this is the case already: the listeners operate off independent threads. 
The only dependency comes in from thread pool size, JVM scheduling and IO 
bandwidth. The same issues that you have to deal with in your proposal. 

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
> Fix For: 1.6
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-07-20 Thread Ian Boston (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387264#comment-15387264
 ] 

Ian Boston commented on OAK-4581:
-

Only consider my suggestions for a WAL if the existing Journal implementation 
is unsuitable. It seems the existing journal is suitable, so ignore WAL 
references.

---

Assuming the persisted journal capacity is only bounded by available disk 
space, the need to reference a remote journal to conserve queue capacity in 
exchange for additional computation or network is probably unnecessary. I 
assume "remote" means remote to the VM where the JVM is running and not some 
other interpretation of remote.

---

Events lagging too far behind (data latency) is:
1. something that will (I assume) be caused by the commit traffic an 
application is subjecting Oak to, rather than caused by Oak internal commits.
2. something an application will care about.
3. given sufficient and appropriate feedback (queue latency metric) something 
an application developer will be keen to act on during dev and QE before 
production.



> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
> Fix For: 1.6
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-07-20 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387105#comment-15387105
 ] 

Chetan Mehrotra commented on OAK-4581:
--

bq. However, I doubt this will solve OAK-2683. Instead it will most likely just 
shift the problem to "events lagging to far behind". Creating all sorts of 
problems in applications that rely on somewhat prompt event delivery. 

With persistent journal observers which are fast would be able to get prompt 
delivery as they would be pulling directly and not impacted by other slow 
listeners.  

bq. Also interaction with GC will be tricky as here the assumption is 
contrariwise and revisions are purged after a certain time. 

Yes that would need to be accounted for in implementation say via checkpoint.

 In DocumentNodeStore as part of OAK-4528 its ensured that we have a valid 
checkpoint based on what diff need to be processed ([~mreutegg] can confirm). 
Further we can then checkout repository state at any intermediate time interval.

However for SegmentNodeStore I am not sure of the impact of compaction on such 
intermediate root. For e.g. its ensured that one can fetch the repository state 
at given valid checkpoint reference but its not clear if same holds true for 
any intermediate "root"

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
> Fix For: 1.6
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

2016-07-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386566#comment-15386566
 ] 

Michael Dürig commented on OAK-4581:


In the {{SegmentNodeStore}} case you only need to persist the record ids of the 
root node states. I.e. conceptionally all you need is a persisted queue of 
{{BackgroundObserver.ContentChange}} instances. In fact I think it should be 
the same way for the {{DocumentNodeStore}}, maybe with a local cache to speed 
things up. 

However, I doubt this will solve OAK-2683. Instead it will most likely just 
shift the problem to "events lagging to far behind". Creating all sorts of 
problems in applications that rely on somewhat prompt event delivery. Also 
interaction with GC will be tricky as here the assumption is contrariwise and 
revisions are purged after a certain time. 

> Persistent local journal for more reliable event generation
> ---
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: core
>Reporter: Chetan Mehrotra
> Fix For: 1.6
>
>
> As discussed in OAK-2683 "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work is done to make diff generation faster. 
> However there are still chances of event queue getting filled up. 
> This issue is meant to implement a persistent event journal. Idea here being
> # NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observors which are meant to handle such events in async way (by virtue of 
> being wrapped in BackgroundObserver) would instead pull the events from this 
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does serialized form of 
> SegmentNodeStore root state looks like{color} - Possible the RecordId of 
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on persisted remote 
> journal to determine the affected paths. Which reduces the need for 
> persisting complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensure 
> whatever is stored there can be serialized or re calculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for persisted local journal we can also utilize a 
> SegmentNodeStore instance. Care needs to be taken for compaction. Either via 
> generation approach or relying on online compaction
> h4. 2- Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of some write ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to journal entry on per listener basis. 
> This would allow each Listener to "pull" content changes and generate diff as 
> per its speed and keeping in memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)