[ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522741#comment-15522741 ]
Marcel Reutegger commented on OAK-4581:
---------------------------------------
In my view it would be best to review the usage of Observers outside of Oak
again and use JCR/Jackrabbit EventListeners whenever possible. The Oak
observation package is not a public API and the export version is set to zero.
I think that was a good decision, because we repeatedly run into situations
where Oak Observer (or BackgroundObserver) usage outside of Oak lags behind
current development. Examples include missing MBean support, default values for
queue length, and new callbacks on BackgroundObserver. IMO we should stop
promoting the Oak Observer when we actually consider it internal.
SLING-3279 was created to leverage filters not available in JCR. However, the
most useful filter, one that matches multiple paths, has been available since
Jackrabbit 2.7.5 (JCR-3745), and gathering the names of added/removed/changed
properties is also possible with plain JCR Events.
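
To make "filter with multiple paths" concrete, here is a minimal, self-contained sketch of the matching rule such a filter applies. The class name MultiPathFilter and its method are hypothetical and exist only for illustration; the sketch does not use or reproduce the Jackrabbit API, and it ignores the deep/shallow distinction a real JCR filter makes.

```java
import java.util.List;

/** Hypothetical illustration only; not part of the JCR or Jackrabbit API. */
final class MultiPathFilter {
    private final List<String> paths;

    MultiPathFilter(String... paths) {
        this.paths = List.of(paths);
    }

    /** Returns true if eventPath is one of the configured paths or lies below one of them. */
    boolean matches(String eventPath) {
        for (String p : paths) {
            if (eventPath.equals(p)) {
                return true;
            }
            // Append a separator so that "/content/a" does not match "/content/ab".
            String prefix = p.endsWith("/") ? p : p + "/";
            if (eventPath.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A single listener registered with such a filter replaces one listener per path, which is the use case the multi-path filter addresses.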
[~mduerig] & [~cziegeler], do you remember the main reasons why Oak Observers
were considered superior to JCR EventListeners?
If there are features missing, I would rather add them to the Jackrabbit API,
making them available to other users as well, and change the JCR Resource
implementation to rely only on the Jackrabbit API.
Once we have a clear separation of Oak internal Observer usage and client code
using JCR EventListeners, we can more easily optimize those two parts
individually. The former is the clear responsibility of the repository and
would produce events as efficiently and fast as possible, while the latter gets
those events at its own pace. The current situation is IMO problematic because
the two aspects are mixed.
> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
> Key: OAK-4581
> URL: https://issues.apache.org/jira/browse/OAK-4581
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Labels: observation
> Fix For: 1.6
>
> Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683, "hitting the observation queue limit" has multiple
> drawbacks. Quite a bit of work has been done to make diff generation faster.
> However, there is still a chance of the event queue filling up.
> This issue is meant to implement a persistent event journal. The idea is:
> # NodeStore would push the diff into a persistent store via a synchronous
> observer
> # Observers that are meant to handle such events asynchronously (by virtue of
> being wrapped in BackgroundObserver) would instead pull the events from this
> persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form.
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does the serialized form of a
> SegmentNodeStore root state look like?{color} - Possibly the RecordId of the
> "root" state
> Note that with OAK-4528 DocumentNodeStore can rely on the persisted remote
> journal to determine the affected paths, which reduces the need for
> persisting the complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains
> information about the affected paths, similar to what is currently stored in
> the DocumentNodeStore journal.
> h4. CommitInfo
> The commit info would also need to be serialized. So it needs to be ensured
> that whatever is stored there can be serialized or recalculated.
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching.
> [~mreutegg] suggested that for the persisted local journal we can also utilize
> a SegmentNodeStore instance. Care needs to be taken with compaction, either
> via a generation approach or by relying on online compaction.
> h4. 2 - Make use of write-ahead log implementations
> [~ianeboston] suggested that we can make use of a write-ahead log
> implementation such as [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for the event generation logic:
> # Would need a way to keep pointers to journal entries on a per-listener
> basis. This would allow each listener to "pull" content changes and generate
> the diff at its own speed while keeping in-memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2]
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3]
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog
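
The points under "C - How changes get pulled" (one pointer per listener, a journal that survives restarts) can be sketched with plain files. This is a minimal illustration under stated assumptions, not Oak code: the class name LocalJournal, its methods, and the file layout (one journal.log plus one .cursor file per listener) are hypothetical, entries are opaque single-line strings, and the journal is never truncated.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and file layout are illustrative assumptions, not Oak code. */
final class LocalJournal {
    private final Path dir;
    private final Path log;

    LocalJournal(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
        this.log = dir.resolve("journal.log");
    }

    /** Push side: append one entry (must not contain line breaks) to the journal file. */
    synchronized void append(String entry) throws IOException {
        Files.write(log, List.of(entry), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Pull side: return up to max entries the listener has not acknowledged yet. */
    synchronized List<String> pull(String listenerId, int max) throws IOException {
        if (!Files.exists(log)) {
            return new ArrayList<>();
        }
        // Re-reading the whole file keeps the sketch short; it is not efficient.
        List<String> all = Files.readAllLines(log, StandardCharsets.UTF_8);
        int from = (int) Math.min(position(listenerId), all.size());
        int to = Math.min(all.size(), from + max);
        return new ArrayList<>(all.subList(from, to));
    }

    /** Advance and persist the listener's pointer after it has processed count entries. */
    synchronized void ack(String listenerId, int count) throws IOException {
        long next = position(listenerId) + count;
        Files.write(cursor(listenerId), Long.toString(next).getBytes(StandardCharsets.UTF_8));
    }

    private long position(String listenerId) throws IOException {
        Path c = cursor(listenerId);
        if (!Files.exists(c)) {
            return 0L; // a listener without a cursor file starts at the beginning
        }
        return Long.parseLong(new String(Files.readAllBytes(c), StandardCharsets.UTF_8).trim());
    }

    private Path cursor(String listenerId) {
        return dir.resolve(listenerId + ".cursor");
    }
}
```

Because both the entries and the pointers live on disk, a slow listener only holds the batch it is currently processing in memory, and a new LocalJournal instance opened on the same directory after a restart resumes each listener where it left off. A real implementation would also need to discard entries that every listener has acknowledged.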