ContentSessionImpl.java)

Michael Dürig Wed, 20 Jun 2012 04:27:18 -0700


On 20.6.12 11:35, Jukka Zitting wrote:

Hi,

On Tue, Jun 19, 2012 at 11:45 PM, Michael Dürig<[email protected]>  wrote:

But this is no different with polling not backed by the Microkernel journal:
when a client takes a long time to digest changes this might cause the next
poll to be deferred so much that the relevant revisions are not available
any more.


That's still a problem, but a somewhat different one (see the earlier
discussion about revision lifetimes and leases). The changeset
approach requires that *all* revisions since the last observed one are
still available, whereas the polling approach just requires *two*
revisions to be available: the last observed one and the very latest
one.

Yes, but... with or without such a lease mechanism my approach is more
general and doesn't hurt anything: if older revisions are available, my
approach generates a more fine-grained set of events. If older revisions
are not available any more, it just gracefully degenerates to your
approach. In the extreme, if only two revisions (last observed and
latest) are available, it is the same as your approach.
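
A minimal sketch of that degradation, under stated assumptions: revisions
are modelled as plain long numbers, the last observed revision is still
available, and the names RevisionWalkSketch and diffSteps are hypothetical
(they are not Oak or Microkernel API). It only decides which pairs of
revisions get compared; the comparison itself is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch; not part of the Oak or Microkernel API. */
class RevisionWalkSketch {

    /**
     * Returns the [from, to] revision pairs to diff, walking every revision
     * after lastObserved that is still available. Missing intermediate
     * revisions simply make the steps coarser.
     */
    public static List<long[]> diffSteps(NavigableSet<Long> available, long lastObserved) {
        if (!available.contains(lastObserved)) {
            // Assumption: both approaches need the last observed revision.
            throw new IllegalArgumentException("last observed revision is gone");
        }
        List<long[]> steps = new ArrayList<>();
        long from = lastObserved;
        for (long to : available.tailSet(lastObserved, false)) {
            steps.add(new long[] {from, to});
            from = to;
        }
        return steps;
    }

    public static void main(String[] args) {
        NavigableSet<Long> available = new TreeSet<>();
        available.add(1L);
        available.add(2L);
        available.add(4L); // revision 3 has already been discarded
        for (long[] step : diffSteps(available, 1L)) {
            System.out.println(step[0] + " -> " + step[1]);
        }
    }
}
```

With every revision available this yields one step per revision; with only
the last observed and the latest revision it yields a single step, which is
the polling case.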


In a write-heavy deployment we could easily see hundreds of revisions
per second. A system that wants to preserve the entire journal for
even just an hour could face the need to keep track of something like
a million revisions, potentially much more. I don't think that's a
feasible approach at least as a general solution.
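
As a rough check of that estimate, assuming a steady 300 revisions per
second (an illustrative figure within "hundreds", not a measurement) and
a one hour retention window; the class and method names are hypothetical:

```java
/** Hypothetical back-of-the-envelope helper; not part of any Oak API. */
class JournalSizeSketch {

    /** Number of revisions that must be kept for the given retention window. */
    public static long retainedRevisions(long revisionsPerSecond, long retentionSeconds) {
        return Math.multiplyExact(revisionsPerSecond, retentionSeconds);
    }

    public static void main(String[] args) {
        // 300 revisions/s for 3600 s
        System.out.println(retainedRevisions(300, 3600)); // prints 1080000
    }
}
```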

There may well be deployments where we *do* want to keep detailed
audit logs of everything anyone has done, but I'd rather handle that
as an optional extension than a core part of the API.

There is a linear order of the events on each cluster node. The order is
just not the same for all of them. As I said, a cluster sync is just viewed
as changes applied by any other session. So it's all in the journal.


If you do view the cluster sync as a change applied by another
session, then how do you handle user data and other event details from
the potentially many changes that got applied by perhaps multiple
different sessions on the other cluster node?

Just compare the states from before the sync and after the sync to
calculate the events. I wouldn't forward any user data from remote
sessions since I'd like to view cluster sync as changes applied by a
"sync" session which looks just like any other session to the user.

In other words: the observable results should be the same as if a user
"sync" created a session and manually merged the differences between the
cluster nodes.


For example, consider the following scenario with cluster nodes A and B:

A: set property P from X to Y at time 1 with user data K
B: set property P from X to Y at time 2 with user data L
B: set property P from Y to Z at time 3 with user data M
B: set property Q with user data N

When A syncs with B, the resulting property P would presumably be set
to Z (the only sane way of merging such changes), but which events and
what user data will an observer on A see? Will user data L ever be
seen by an observer on A? Will M? If yes, what is the sequence of
property P changes seen by an observer on A: X -> Y -> Z or X -> Y,
X -> Y -> Z?

The (imaginary) sync session for A would set P from Y to Z and set
property Q. A would observe X -> Y (its own change), then Y -> Z and the
setting of Q (changes by the sync session).
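
A minimal sketch of this for the scenario above, assuming node states are
flat property maps and that the sync is reported under a user id "sync"
with no user data. All names here (SyncSessionSketch, Event, syncEvents)
and the value "q1" for Q are hypothetical; the scenario does not say what
Q is set to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

/** Hypothetical sketch of the "sync session" idea; not Oak API. */
class SyncSessionSketch {

    /** A property change as seen by an observer. */
    public static final class Event {
        public final String property, before, after, userId, userData;

        Event(String property, String before, String after, String userId, String userData) {
            this.property = property;
            this.before = before;
            this.after = after;
            this.userId = userId;
            this.userData = userData;
        }

        @Override
        public String toString() {
            return userId + ": " + property + " " + before + " -> " + after
                    + " (user data: " + userData + ")";
        }
    }

    /**
     * Compares the local state before the sync with the merged state after it.
     * Every difference becomes an event attributed to "sync"; user data from
     * the remote sessions is deliberately not forwarded.
     */
    public static List<Event> syncEvents(Map<String, String> before, Map<String, String> after) {
        List<Event> events = new ArrayList<>();
        TreeSet<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        for (String name : names) {
            String b = before.get(name);
            String a = after.get(name);
            if (!Objects.equals(b, a)) {
                events.add(new Event(name, b, a, "sync", null));
            }
        }
        return events;
    }

    public static void main(String[] args) {
        Map<String, String> beforeSync = new HashMap<>();
        beforeSync.put("P", "Y"); // A has already applied its own X -> Y

        Map<String, String> afterSync = new HashMap<>();
        afterSync.put("P", "Z");
        afterSync.put("Q", "q1"); // hypothetical value for Q

        for (Event e : syncEvents(beforeSync, afterSync)) {
            System.out.println(e);
        }
    }
}
```

In this sketch B's intermediate X -> Y change and the user data L, M and N
never reach an observer on A; only the net difference does.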


Michael


BR,

Jukka Zitting

Re: Observation design (Was: svn commit: r1351414 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak: api/ChangeSet.java api/ContentSession.java core/ContentSessionImpl.java)
