Hi,

Thank you for the detailed explanation. I can now see how this works with a consistent root document: the slow node effectively waits until its clock is ahead of the last root commit, at which point it is clear to commit. This ensures that all commits are sequential based on the revision timestamp.

Presumably, a cluster node whose clock runs behind real time will have lower throughput, which makes it critical to run NTP on all cluster nodes to eliminate as much clock drift as possible? Also, does the current revision model work with an eventually consistent storage mechanism, or does Oak require the underlying storage to be immediately consistent?

Best Regards
Ian

On 16 February 2016 at 10:36, Marcel Reutegger <[email protected]> wrote:

> Hi,
>
> On 16/02/16 09:56, "[email protected] on
> behalf of Ian Boston" wrote:
> So, IIUC, (based on Revision.compareTo(Revision) used by
> StableRevisionComparator.
>
> yes.
>
> If one instance within a cluster has a clock that is lagging the others,
> and all instances are making changes at the same time, then the changes
> that the other instances make will be used, even though the lagging instance
> makes changes after (in real synchronised time) the others ?
>
> no, either cluster node has equal chances of getting its
> change in, but the other cluster node's change will be rejected.
>
> Let's assume we have two cluster nodes A and B and cluster node
> A's clock is lagging 5 seconds. Now both cluster nodes try
> to set a property P on document D. One of the cluster nodes will be
> first to update document D. No matter which cluster node is first,
> the second cluster node will see the previous change when it attempts
> the commit and will consider the change as not yet visible and
> in conflict with its own changes. The change of the second cluster
> node will therefore be rolled back.
>
> The behaviour of the cluster nodes will be different when external
> changes are pulled in from external cluster nodes. The background
> read operation of the DocumentNodeStore reads the most recent
> root document and compares the _lastRev entries of the other cluster
> nodes with its own clock (the _lastRev entries are the most recent
> commits visible to other cluster nodes).
> Here we have two cases:
>
> a) Cluster node A successfully committed its change on P
>
> Cluster node A wrote a _lastRev on the root document for this
> change: r75-0-a. Cluster node B picks up that change and compares
> the revision with its own clock, which corresponds to r80-0-b
> (for readability, assuming for now the timestamp is a decimal
> and in seconds instead of milliseconds). Cluster node B will
> consider r75-0-a as visible from now on, because the timestamp
> of r80-0-b is newer than r75-0-a. From this point on, cluster
> node B can overwrite P again because it is able to see the most
> recent value set by A with r75-0-a.
>
> b) Cluster node B successfully committed its change on P
>
> Cluster node B wrote a _lastRev on the root document for this
> change: r80-0-b. Cluster node A picks up that change and compares
> the revision with its own clock, which corresponds to r75-0-a.
> Cluster node A will still not consider r80-0-b as visible,
> because its own clock is considered behind. It will wait until
> its clock has passed r80-0-a. This ensures that a new change by A,
> overwriting B's previous value of P, will have a newer timestamp
> than the previously made visible change of B.
>
> This means:
>
> 1) all changes considered visible can be compared with the
> StableRevisionComparator without the need to take clock
> differences into account.
>
> 2) a change will conflict if it is not the most recent
> revision (using StableRevisionComparator) or the other
> change is not yet visible but already committed.
>
>
> I can see that this won't matter for the majority of nodes, as collisions
> are rare, but won't the lagging instance be always overridden in the root
> document _revisions list ?
>
> Depending on usage, collisions are actually not that rare ;)
>
> The _revisions map on the root document contains just
> the commit entry. A cluster node cannot overwrite the
> entry of another cluster node, because they use unique
> revisions for commits.
> Each cluster node generates revisions
> with a unique clusterId suffix.
>
> Are there any plans to maintain a clock difference vector for the cluster ?
>
> Oak 1.0.x and 1.2.x still have something like this. See
> RevisionComparator. However, it only maintains the clock
> differences for the past 60 minutes.
>
> Oak 1.4 introduced a RevisionVector, which is inspired by
> version vectors [0].
>
> Regards
> Marcel
>
> [0]
> https://issues.apache.org/jira/browse/OAK-3646?focusedCommentId=15028698&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15028698
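
For readers following cases a) and b) in the quoted explanation, the visibility rule can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `RevisionSketch`, `Rev`, `STABLE_ORDER`, `isVisible` and `secondsUntilVisible` are hypothetical names and not Oak's actual classes or API, timestamps are in seconds as in the example above, and the ordering (timestamp, then counter, then clusterId) is an assumption about how a stable comparison might break ties.

```java
import java.util.Comparator;

// Illustrative only: a simplified stand-in, NOT Oak's actual Revision class.
final class RevisionSketch {

    /** Hypothetical revision "r<timestamp>-<counter>-<clusterId>", timestamp in seconds. */
    static final class Rev {
        final long timestamp;
        final int counter;
        final int clusterId;

        Rev(long timestamp, int counter, int clusterId) {
            this.timestamp = timestamp;
            this.counter = counter;
            this.clusterId = clusterId;
        }
    }

    /** Assumed ordering: timestamp first, then counter, then clusterId as tie-breaker. */
    static final Comparator<Rev> STABLE_ORDER =
            Comparator.<Rev>comparingLong(r -> r.timestamp)
                    .thenComparingInt(r -> r.counter)
                    .thenComparingInt(r -> r.clusterId);

    /** An external change counts as visible once the local clock has passed its timestamp. */
    static boolean isVisible(Rev external, long localClock) {
        return localClock > external.timestamp;
    }

    /** Seconds the local node still has to wait before the external change becomes visible. */
    static long secondsUntilVisible(Rev external, long localClock) {
        return Math.max(0L, external.timestamp - localClock + 1);
    }

    public static void main(String[] args) {
        Rev fromA = new Rev(75, 0, 1); // r75-0-a, with clusterId 1 standing in for "a"
        Rev fromB = new Rev(80, 0, 2); // r80-0-b, with clusterId 2 standing in for "b"

        // Case a): B's clock is at 80 and reads A's _lastRev.
        System.out.println("B sees A's change: " + isVisible(fromA, 80));
        // Case b): A's clock is at 75 and reads B's _lastRev.
        System.out.println("A sees B's change: " + isVisible(fromB, 75));
        System.out.println("A must wait (s): " + secondsUntilVisible(fromB, 75));
    }
}
```

Under these assumptions, B can see r75-0-a immediately, while A has to let its clock advance past 80 before r80-0-b becomes visible, so any later change by A necessarily carries a newer timestamp than B's.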
