[ https://issues.apache.org/jira/browse/OAK-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747144#comment-14747144 ]
Marcel Reutegger commented on OAK-3388:
---------------------------------------
I think there is a more fundamental problem with the current use of the
RevisionComparator and how revisions are compared across cluster nodes.
Changes from other cluster nodes are made visible in a bulk operation by adding
the _lastRev of another cluster node, as recorded on the root document, to the
RevisionComparator at a given 'seenAt' revision. This 'seenAt' revision is the
same for all new _lastRevs seen from other cluster nodes. The problem arises
when two revisions from other cluster nodes that share the same 'seenAt'
revision are compared: the RevisionComparator simply decides which revision
comes first based on the clusterId. Assume we have three cluster nodes A, B and
C with clusterIds 1, 2 and 3 respectively. Cluster node A does not write, but
only observes changes done by B and C. Consider the following sequence of
events:
- C adds a node N to the repository
- C runs background operations
- B runs background operations and sees N
- B removes N
- B runs background operations
- A runs background operations
To A, the changes done by B and C have the same 'seenAt' revision, so the
RevisionComparator decides based on the clusterId. The comparator says the
change done by B (removing the node) happened before the change done by C
(adding the node), because B has a lower clusterId than C.
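A minimal Java sketch of the tie-break described above, under a heavily simplified model: the hypothetical `Rev` and `Seen` records stand in for Oak's revisions and the seenAt bookkeeping, and `BY_SEEN_AT` is not the actual RevisionComparator API. It shows that when both external revisions share one seenAt value, the clusterId alone decides the order, even though C's revision has the older timestamp.

```java
import java.util.Comparator;

public class SeenAtTieBreak {

    // Hypothetical stand-in for a revision: timestamp plus clusterId (no counter).
    record Rev(long timestamp, int clusterId) {}

    // Hypothetical pairing of an external revision with the local 'seenAt'
    // value at which a background read made it visible.
    record Seen(Rev rev, long seenAt) {}

    // Simplified ordering: seenAt first, then clusterId as the only tie-break.
    // The revision timestamps are never consulted.
    static final Comparator<Seen> BY_SEEN_AT =
            Comparator.comparingLong(Seen::seenAt)
                      .thenComparingInt(s -> s.rev().clusterId());

    public static void main(String[] args) {
        long seenAt = 1_000L;                                    // one background read on A
        Seen addByC = new Seen(new Rev(100L, 3), seenAt);        // C adds N first
        Seen removeByB = new Seen(new Rev(200L, 2), seenAt);     // B removes N later
        int cmp = BY_SEEN_AT.compare(removeByB, addByC);
        // clusterId 2 < 3, so the removal sorts before the addition.
        System.out.println(cmp < 0
                ? "removal ordered before addition"
                : "addition ordered before removal");
    }
}
```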
This issue has only had a minor impact so far because
NodeDocument.getNodeAtRevision() uses a StableRevisionComparator when it goes
through the revisions of a property. It only uses the RevisionComparator to
decide whether a change is visible for a given read revision.
In my view there are multiple issues to solve:
- The RevisionComparator mechanism to make multiple external changes visible at
a given seenAt revision is fragile because the comparison is only based on the
clusterId. Taken to the extreme, this means an old revision made visible via a
background read or on startup may be considered newer than a current revision,
just because its clusterId is higher.
- The RevisionComparator allows time shifting. That is, the implementation
tries to accommodate clock differences between cluster nodes and puts revisions
in the sequence in which they were made visible. However, it only remembers this
sequence for a one hour time frame. Older revisions are compared only based on
the revision timestamp. This makes the comparison unstable over time: while
both revisions are within the one hour time frame, the comparator may say a
revision R1 happened before R2, but later it may say the opposite.
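The instability of the time-shifting window can be reproduced with a small model. In this sketch (hypothetical names; the real RevisionComparator keeps more state than a single map), seenAt values are remembered for one hour and, once purged, the comparison falls back to revision timestamps. Cluster node C's clock is assumed to lag, so its later revision R2 carries a lower timestamp than R1.

```java
import java.util.HashMap;
import java.util.Map;

public class TimeShiftSketch {

    static final long ONE_HOUR_MS = 60L * 60L * 1000L;

    // Hypothetical revision: timestamp from the writing node's clock, plus clusterId.
    record Rev(long timestamp, int clusterId) {}

    // Local time at which each external revision was made visible (hypothetical).
    private final Map<Rev, Long> seenAt = new HashMap<>();

    void markSeen(Rev r, long localNow) {
        seenAt.put(r, localNow);
    }

    // Forget seenAt entries that are older than one hour.
    void purge(long localNow) {
        seenAt.values().removeIf(t -> localNow - t > ONE_HOUR_MS);
    }

    int compare(Rev a, Rev b) {
        Long sa = seenAt.get(a);
        Long sb = seenAt.get(b);
        if (sa != null && sb != null) {
            // Inside the window: order by when the revisions became visible.
            int c = Long.compare(sa, sb);
            return c != 0 ? c : Integer.compare(a.clusterId(), b.clusterId());
        }
        // Outside the window: revision timestamps only.
        int c = Long.compare(a.timestamp(), b.timestamp());
        return c != 0 ? c : Integer.compare(a.clusterId(), b.clusterId());
    }

    public static void main(String[] args) {
        TimeShiftSketch cmp = new TimeShiftSketch();
        Rev r1 = new Rev(5_000L, 2); // written first by B
        Rev r2 = new Rev(1_000L, 3); // written later by C, whose clock lags
        cmp.markSeen(r1, 10_000L);
        cmp.markSeen(r2, 20_000L);
        System.out.println(cmp.compare(r1, r2) < 0); // prints true: R1 before R2
        cmp.purge(20_000L + ONE_HOUR_MS + 1);
        System.out.println(cmp.compare(r1, r2) < 0); // prints false: order flipped
    }
}
```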
I see the following solutions to these problems:
- Prevent changes that happen in the past (judged by the revision timestamp
alone). This is what the ignored tests added in this issue do. Due to clock
differences, a subsequent change may have a lower revision timestamp than the
change it follows. This is currently allowed because the RevisionComparator
accommodates the difference, but it only works reliably within a one hour time
frame.
- Detect clock differences in the background read operation. The background
read operation must only make external changes visible that have a lower
timestamp than the local clock. This is similar to the first item.
- The RevisionComparator always uses the revision timestamp to compare
revisions, unless a revision is not yet visible.
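The three proposals could look roughly like this. This is a sketch under simplifying assumptions, not the fix committed for OAK-3388: the hypothetical `newRevision` rejects a local revision that would lie in the past (a real implementation might wait for the clock to catch up instead), `backgroundRead` defers external revisions whose timestamp is not lower than the local clock, and `compareVisible` orders visible revisions by timestamp with clusterId only as a tie-break.

```java
import java.util.ArrayList;
import java.util.List;

public class ClockCheckSketch {

    // Hypothetical revision: timestamp plus clusterId (no counter).
    record Rev(long timestamp, int clusterId) {}

    // Highest revision timestamp known to this (hypothetical) cluster node.
    private long newestSeen = Long.MIN_VALUE;

    // (1) Refuse to create a revision that is not after every change already seen.
    Rev newRevision(long localClock, int clusterId) {
        if (localClock <= newestSeen) {
            throw new IllegalStateException(
                    "local clock " + localClock + " is not after seen revision " + newestSeen);
        }
        newestSeen = localClock;
        return new Rev(localClock, clusterId);
    }

    // (2) Background read: only external revisions with a timestamp lower than
    // the local clock become visible; the caller keeps the rest for a later read.
    List<Rev> backgroundRead(List<Rev> external, long localClock) {
        List<Rev> visible = new ArrayList<>();
        for (Rev r : external) {
            if (r.timestamp() < localClock) {
                visible.add(r);
                newestSeen = Math.max(newestSeen, r.timestamp());
            }
        }
        return visible;
    }

    // (3) Visible revisions are ordered by timestamp; clusterId only breaks ties.
    static int compareVisible(Rev a, Rev b) {
        int c = Long.compare(a.timestamp(), b.timestamp());
        return c != 0 ? c : Integer.compare(a.clusterId(), b.clusterId());
    }

    public static void main(String[] args) {
        ClockCheckSketch node = new ClockCheckSketch();
        List<Rev> external = List.of(new Rev(900L, 2), new Rev(1_100L, 3));
        // Local clock reads 1000, so the revision from cluster node 3 is deferred.
        System.out.println(node.backgroundRead(external, 1_000L)); // only the revision at 900
        System.out.println(compareVisible(external.get(0), external.get(1)) < 0); // prints true
        try {
            node.newRevision(800L, 1); // would be older than a revision already seen
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```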
> Inconsistent read in cluster with clock differences
> ---------------------------------------------------
>
> Key: OAK-3388
> URL: https://issues.apache.org/jira/browse/OAK-3388
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: core, mongomk
> Affects Versions: 1.0, 1.2
> Reporter: Marcel Reutegger
> Assignee: Marcel Reutegger
> Fix For: 1.3.7
>
>
> This issue is similar to OAK-2929 but related to how the DocumentNodeStore
> reads a node state when there is a clock difference between multiple cluster
> nodes. The node state read from a NodeDocument may not be correct when there
> is a clock difference.