ContentSessionImpl.java)

Michael Dürig Tue, 19 Jun 2012 06:46:07 -0700


On 18.6.12 23:16, Jukka Zitting wrote:

- ChangeSet is just a container carrying the trees as they were after and
before the change. So this is very close to the diffing approach you
describe only a bit more explicit. Also ChangeSet is the place where
additional information like change set meta data could live. I'm close to
certain that we will need something along these lines (i.e. userData,
timestamps, user who initiated that change, session id of the originating
session).


The reason why I worry about the ChangeSet concept is that it implies
that each commit() produces a separate ChangeSet that then gets
delivered to each observation listener for processing. This is
troublesome for two key reasons:

1) Performance: Consider a large cluster that supports lots of
concurrent writes hitting all cluster nodes. We should be able to
support at least hundreds or thousands of commits per second on such
systems, and ideally the only limit here would be the amount of
available hardware. With the ChangeSet concept each of those commits
would result in a separate waitForChanges() return value, which would
cause event queues to start growing indefinitely if any one of the
listeners can't keep up with the stream of incoming changes. The
poll+diff approach avoids that problem since a listener only sees the
combined set of changes across the polling interval.

There is nothing which implies "creating" a ChangeSet instance for each
commit. The change sets are just implicitly there and can be retrieved
with instances of ChangeSet created on the fly by calling
waitForChanges(). So no queues. Consumers which use the blocking feature
would just process a backlog which they'd consume at their own pace. The
backlog is *not* represented by a queue but by a position (the previous
parameter). Just like polling, except that the call would block if there
is no next change set yet.
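
To make the "position, not queue" point concrete, here is a rough sketch. It is not the actual Oak API: RevisionLog, ChangeSetSketch and the int position are names made up for illustration, and a revision is reduced to a plain string. The only thing it is meant to show is that the store keeps no per-listener state and that a change set is built only when a consumer asks for it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a change set: the revision positions before and after. */
final class ChangeSetSketch {
    final int before;
    final int after;

    ChangeSetSketch(int before, int after) {
        this.before = before;
        this.after = after;
    }
}

/** Hypothetical store: one shared list of revisions, no per-listener queue. */
final class RevisionLog {
    private final List<String> revisions = new ArrayList<>();

    RevisionLog(String initialState) {
        revisions.add(initialState);
    }

    /** Records a new revision and wakes up any consumer blocked in waitForChanges. */
    synchronized int commit(String state) {
        revisions.add(state);
        notifyAll();
        return revisions.size() - 1;
    }

    /**
     * Blocks until a revision following 'previous' exists, then creates the
     * change set on the fly. The consumer's backlog is just its position.
     */
    synchronized ChangeSetSketch waitForChanges(int previous) {
        try {
            while (previous + 1 >= revisions.size()) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for changes", e);
        }
        return new ChangeSetSketch(previous, previous + 1);
    }

    /** Returns the state recorded at the given revision position. */
    synchronized String state(int revision) {
        return revisions.get(revision);
    }
}
```

A slow consumer simply calls waitForChanges(last.after) whenever it is ready for more; nothing accumulates on its behalf in the meantime.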

2) Linearity: Our overall design explicitly allows concurrent commits
that are only later merged together. This makes the concept of a
"previous" or "following" ChangeSet somewhat troublesome. You could
avoid that trouble by interpreting all concurrent commits from another
cluster node as a single merge ChangeSet, but then you already lose
per-commit metadata. Again the poll+diff approach avoids this problem
since it doesn't care how and from where changes entered the latest
visible state of the tree.

I see. In your scenario you would return all changes (i.e. the diff of
the trees) between the last poll and this poll. In my scenario polling
would just follow the entries in the Microkernel journal and return the
changes (again as diff of the trees) of the revisions therein.
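
The difference between the two scenarios can be sketched as follows. This is a toy model and not the Microkernel: JournalSketch is a hypothetical class, a "tree" is flattened to a map from path to value, and removals are ignored to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical journal: every commit appends a full snapshot of a flat "tree". */
final class JournalSketch {
    private final List<Map<String, String>> revisions = new ArrayList<>();

    JournalSketch() {
        revisions.add(new TreeMap<>()); // revision 0 is the empty tree
    }

    /** Sets one path to a value and returns the number of the new revision. */
    int commit(String path, String value) {
        Map<String, String> next = new TreeMap<>(revisions.get(head()));
        next.put(path, value);
        revisions.add(next);
        return head();
    }

    int head() {
        return revisions.size() - 1;
    }

    /** Paths added or modified between two revisions, mapped to their new value. */
    Map<String, String> diff(int from, int to) {
        Map<String, String> before = revisions.get(from);
        Map<String, String> changed = new TreeMap<>();
        for (Map.Entry<String, String> e : revisions.get(to).entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    /** Poll+diff: a single combined diff from the last seen revision to the head. */
    Map<String, String> pollDiff(int lastSeen) {
        return diff(lastSeen, head());
    }

    /** Journal following: one diff per journal entry since the last seen revision. */
    List<Map<String, String>> followJournal(int lastSeen) {
        List<Map<String, String>> perRevision = new ArrayList<>();
        for (int r = lastSeen; r < head(); r++) {
            perRevision.add(diff(r, r + 1));
        }
        return perRevision;
    }
}
```

With three commits that set /a twice and /b once, pollDiff(0) reports only the final values of /a and /b, while followJournal(0) yields three separate diffs, which is where per-commit information such as userData could be attached.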

My reasoning for this is - as I said earlier - that it allows us to
implement JCR journals (which have the concept of change sets through the
persist event) and also allows us to thread through userData and related
information.

In the clustered case I'd handle changes from a cluster sync like
changes from any other session. So to the end user changes occurring due
to cluster synchronisation do not look any different than changes made
by another session on the same instance.

Different cluster nodes would see a different linear order of events and
even different events. The end result however would be the same for all
of them.


Michael

- The approach aligns neatly with the JCR features: implement observation
using blocking calls and implement journalling by using non-blocking calls.


There's no concept of blocking calls for observation in JCR.

BR,

Jukka Zitting

Re: Observation design (Was: svn commit: r1351414 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak: api/ChangeSet.java api/ContentSession.java core/ContentSessionImpl.java)
