Stefan Egli commented on OAK-4581:

An update on filtering and the persistence design: the current filter logic 
(EventFilter/EventQueue) takes NodeStates as input and generates Events that 
can be pulled (via an iterator) at the consumer's discretion. The fact that we 
want to _persist events_ means the filter must be applied before persisting. 
Later, at read/delivery time, that same filter is no longer applicable, as it 
can't filter Events (it can only filter based on NodeStates).
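
To make the ordering concrete, here is a minimal sketch of the write and read 
paths (all types and names below are hypothetical placeholders, not the actual 
Oak observation API):
{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: filtering happens while diffing the NodeStates, before the
// events are written to the journal. At read time the NodeStates are
// gone, so the persisted events cannot be re-filtered with the same
// NodeState-based filter.
class FilterBeforePersistSketch {

    // Placeholder for a fully resolved, serializable observation event.
    record PersistedEvent(String path, String type) {}

    // Stand-in for the NodeState-based filter decision.
    interface NodeStateFilter {
        boolean includes(String changedPath);
    }

    // Write path: diff traversal -> filter -> persist.
    static List<PersistedEvent> persistFiltered(Iterator<String> changedPaths,
                                                NodeStateFilter filter) {
        List<PersistedEvent> journal = new ArrayList<>();
        while (changedPaths.hasNext()) {
            String path = changedPaths.next();
            if (filter.includes(path)) {              // filter applied here...
                journal.add(new PersistedEvent(path, "CHANGED"));
            }
        }
        return journal;                               // ...only the result is stored
    }

    // Read/delivery path: events are replayed as-is; there is no
    // NodeState anymore to apply the filter to.
    static void deliver(List<PersistedEvent> journal) {
        journal.forEach(e -> System.out.println(e.path() + " " + e.type()));
    }
}
{code}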

Now, when we look at how to support persisting events for a set of listeners, 
one question arises: does each listener have its own private queue, or is 
there something like a shared queue? A shared queue sounds advantageous as it 
would reduce disk space. It could, for example, be realized by figuring out 
which listeners are interested in a particular event and enriching the event 
in the storage format with a set of listener ids. At delivery time a listener 
would then read only those events that contain its listener id.
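
Such a storage entry could look roughly like this (a sketch only; field names 
are invented for illustration):
{code:java}
import java.util.Set;

// Hypothetical shape of a shared-queue entry: the event is stored once,
// together with the ids of all listeners whose filters matched it. At
// delivery time a listener reads only the entries containing its own id.
record SharedJournalEntry(long sequence,       // global order in the journal
                          String path,         // affected path
                          String type,         // e.g. "NODE_ADDED"
                          Set<String> listenerIds) {

    boolean isFor(String listenerId) {
        return listenerIds.contains(listenerId);
    }
}
{code}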

Here's a description of possible private-vs-shared queue scenarios:
h5. shared event queue
Events would only be stored once; each would contain a set of listener ids 
indicating which listeners are interested in it. Storage space would be kept 
to a minimum. To support an "infinitely large" commit, the events would have 
to be persisted while traversing through the diff, so the "deduplication" of 
events must happen during this traversal too. Thus it seems reasonable to 
solve this with an "umbrella EventFilter" that has all the actual filters as 
sub-filters and marks each event with the set of listener ids whose filters 
include the change. While this sounds complicated, it might be doable; a rough 
sketch follows below.
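
(Illustrative names only, not the actual EventFilter API, and deliberately 
ignoring the NamePathMapper problem described next:)
{code:java}
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch of an "umbrella" filter: during the single diff traversal it
// consults every listener's sub-filter and records, per change, the set
// of listener ids that want it. An empty result means the event does
// not need to be persisted at all.
class UmbrellaFilterSketch {

    interface SubFilter {
        boolean includes(String path);
    }

    private final Map<String, SubFilter> filtersByListenerId;

    UmbrellaFilterSketch(Map<String, SubFilter> filtersByListenerId) {
        this.filtersByListenerId = filtersByListenerId;
    }

    Set<String> interestedListeners(String path) {
        Set<String> ids = new LinkedHashSet<>();
        for (Map.Entry<String, SubFilter> e : filtersByListenerId.entrySet()) {
            if (e.getValue().includes(path)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}
{code}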

However, there's one additional obstacle here: the NamePathMapper. In theory 
each listener can have a different NamePathMapper - for example, two sessions 
can map the same namespace to different prefixes, so the same change would 
have to be reported under different JCR paths - besides also having different 
base paths (which have their own Generator and could thus produce events in a 
different order).

In short, I believe this is not easily solvable in the current design.
h5. semi-shared event queue
Events would still be _shareable_, however on a best-effort basis and, unlike 
above, no longer precise. So storage space would be bigger in some cases.
The way this could be done is to keep the same underlying persistence logic: 
each event can have a set of listener ids for which it applies. The difference 
lies in how the filtering happens: an EventFilter/EventQueue would be created 
for each listener and Events would be generated individually. After a certain 
number of Events has been generated for each listener, the resulting set of 
Events is deduplicated. So the deduplication doesn't happen in an umbrella 
EventFilter but after the Event generation, as sketched below.
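
(Again a sketch with made-up types:)
{code:java}
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: events are generated per listener, then deduplicated within
// one batch by their identity (here simply path + type). An event that
// lands in different batches for different listeners is stored more
// than once - hence best-effort.
class BatchDedupSketch {

    record Event(String listenerId, String path, String type) {}
    record Entry(String path, String type, Set<String> listenerIds) {}

    static List<Entry> deduplicate(List<Event> batch) {
        Map<String, Entry> byIdentity = new LinkedHashMap<>();
        for (Event e : batch) {
            String key = e.path() + '|' + e.type();
            byIdentity.computeIfAbsent(key,
                    k -> new Entry(e.path(), e.type(), new LinkedHashSet<>()))
                    .listenerIds().add(e.listenerId());
        }
        return List.copyOf(byIdentity.values());
    }
}
{code}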

The reason this is best-effort is that the events are worked off in batches: 
to avoid an OOME and still support infinitely large commits, that's probably 
the way to go. And it's possible that an Event shows up in different batches 
for different listeners. So the deduplication only happens within one batch, 
not across batches - and that's where the best-effort nature lies.

An additional disadvantage of this approach is more read I/O when deduplication 
isn't perfect.
h5. individual event queues
This model is the simplest: each listener has its own individual event queue. 
The persistence layer doesn't have to keep track of listener ids. Of course 
this means more storage is required, but the code would be simpler.
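
In code terms this model degenerates to something very simple (names 
illustrative; the real persistence would of course write to disk rather than 
keep in-memory queues):
{code:java}
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the individual-queue model: one queue per listener, no
// listener ids inside the entries. The same event may be stored once
// per interested listener, trading disk space for simplicity.
class IndividualQueuesSketch {

    record PersistedEvent(String path, String type) {}

    private final Map<String, Queue<PersistedEvent>> queues = new ConcurrentHashMap<>();

    void append(String listenerId, PersistedEvent event) {
        queues.computeIfAbsent(listenerId, id -> new ConcurrentLinkedQueue<>()).add(event);
    }

    PersistedEvent poll(String listenerId) {
        Queue<PersistedEvent> q = queues.get(listenerId);
        return q == null ? null : q.poll();
    }
}
{code}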

h4. Conclusion
While the shared queue approach seems favorable in terms of storage space, it 
involves more complex code and is perhaps more prone to bugs. Also, I'm not 
sure how much disk space would actually be gained, as listener filters should 
be very precise and thus ideally mostly disjoint. So perhaps we should go for 
the simpler, individual queue model as a first iteration. 
[~chetanm], [~tmueller], [~mreutegg], et al, wdyt?

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>                 Key: OAK-4581
>                 URL: https://issues.apache.org/jira/browse/OAK-4581
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core
>            Reporter: Chetan Mehrotra
>            Assignee: Stefan Egli
>              Labels: observation
>             Fix For: 1.6
>         Attachments: OAK-4581.v0.patch
> As discussed in OAK-2683, "hitting the observation queue limit" has multiple 
> drawbacks. Quite a bit of work has been done to make diff generation faster. 
> However, there is still a chance of the event queue filling up. 
> This issue is meant to implement a persistent event journal. The idea here is:
> # The NodeStore would push the diff into a persistent store via a synchronous 
> observer
> # Observers which are meant to handle such events in an async way (by virtue 
> of being wrapped in a BackgroundObserver) would instead pull the events from 
> this persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does the serialized form of a 
> SegmentNodeStore root state look like?{color} - Possibly the RecordId of the 
> "root" state
> Note that with OAK-4528 the DocumentNodeStore can rely on the persisted 
> remote journal to determine the affected paths, which reduces the need for 
> persisting the complete diff locally.
> Event generation logic would then "deserialize" the persisted root states and 
> then generate the diff as currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains 
> information about affected path. Similar to what is current being stored in 
> DocumentNodeStore journal
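> For illustration, such a persisted diff entry might look roughly like the 
> following JSOP fragment (paths invented; only the affected paths are 
> recorded, not the actual content):
> {code}
> +"/content/newPage" : {}
> ^"/content/page" : {}
> -"/content/oldPage"
> {code}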
> h4. CommitInfo
> The commit info would also need to be serialized, so it needs to be ensured 
> that whatever is stored there can be serialized or recalculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of a SegmentNodeStore as a secondary store for caching. 
> [~mreutegg] suggested that for the persisted local journal we can also 
> utilize a SegmentNodeStore instance. Care needs to be taken for compaction, 
> either via a generation-based approach or by relying on online compaction
> h4. 2 - Make use of write-ahead log implementations
> [~ianeboston] suggested that we can make use of a write-ahead log 
> implementation like [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for event generation logic
> # Would need a way to keep pointers to the journal entries on a per-listener 
> basis. This would allow each listener to "pull" content changes and generate 
> the diff at its own pace while keeping the in-memory overhead low (see the 
> sketch after this list)
> # The journal should survive restarts
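> A minimal sketch of such a per-listener pointer (names hypothetical):
> {code:java}
> // Each listener keeps only an offset into the journal and pulls from
> // there at its own pace; the offset must itself be persisted so that
> // delivery can resume after a restart.
> class ListenerCheckpoint {
>     final String listenerId;
>     volatile long journalOffset; // last journal entry this listener consumed
>
>     ListenerCheckpoint(String listenerId, long journalOffset) {
>         this.listenerId = listenerId;
>         this.journalOffset = journalOffset;
>     }
> }
> {code}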
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] 
> https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] 
> https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog
