Hi,
There is org.apache.jackrabbit.oak.spi.commit.PartialConflictHandler, along
with a couple of implementations, already. Maybe it could be leveraged
here by somehow connecting it to the mixins you propose.
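To illustrate the chaining idea, here is a minimal, self-contained Java model. It is deliberately not the real Oak interfaces (their exact method signatures are not reproduced here); it only mirrors what I understand to be the PartialConflictHandler contract: a handler either returns a resolution or declines with null, so handlers can be chained, and a handler bound to a marked subtree simply declines everywhere else. PartialHandler, forSubtree, isUnder and resolve are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained model; NOT the Oak API. It only mimics the idea
// behind PartialConflictHandler: a handler either resolves a conflict or
// declines by returning null, so several handlers can be chained.
final class PartialHandlerChain {

    enum Resolution { OURS, THEIRS }

    // Hypothetical handler shape, reduced to "both sides changed the same property".
    interface PartialHandler {
        // Returns a resolution, or null if this handler does not deal with the conflict.
        Resolution changeChangedProperty(String path, String ours, String theirs);
    }

    // Asks each handler in turn; the first non-null answer wins.
    static Resolution resolve(List<PartialHandler> chain, String path, String ours, String theirs) {
        for (PartialHandler handler : chain) {
            Resolution r = handler.changeChangedProperty(path, ours, theirs);
            if (r != null) {
                return r;
            }
        }
        return null; // unresolved: the caller would have to fail the commit
    }

    // True if 'path' is 'root' itself or lies below it (absolute paths assumed).
    static boolean isUnder(String path, String root) {
        return root.equals("/") || path.equals(root) || path.startsWith(root + "/");
    }

    // Hypothetical handler bound to a marked subtree; it declines everywhere else.
    static PartialHandler forSubtree(String root, Resolution answer) {
        return (path, ours, theirs) -> isUnder(path, root) ? answer : null;
    }

    public static void main(String[] args) {
        List<PartialHandler> chain = Arrays.asList(forSubtree("/content", Resolution.THEIRS));
        System.out.println(resolve(chain, "/content/a/title", "x", "y")); // THEIRS
        System.out.println(resolve(chain, "/var/counter", "1", "2"));     // null
    }
}
```

Declining with null rather than throwing keeps each handler narrow: a subtree-specific handler never has to know what the rest of the chain does.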
Michael
On 21.3.16 9:03, Stefan Egli wrote:
Hi oak-devs,
tl;dr: the suggestion is to introduce a new property (or mixin) that enables
async merging for a subtree in the cluster case and at the same time
predefines the conflict resolution, since conflicts are what currently
prevents trouble-free async merging.
If this has been discussed or suggested before, please point me to the
discussion; if not, here's the suggestion:
When it comes to handling conflicts, we either deal with them
synchronously (we throw a CommitFailedException right away) or have no
feasible/implemented solution for handling them asynchronously. (We could
leave :conflict markers persisted, which would in theory allow asynchronous
merges, but so far nothing has been built on top of that.)
In any case, for cluster scalability it's critical that we avoid
'synchronous' checks and instead switch to asynchronous merging wherever
possible. While some parts of the content (e.g. '/var') always need
synchronous checks, the assumption is that other areas (e.g. '/content')
can well live with asynchronous merging: normally no conflicts occur there,
and if one does, it's fine for a predefined resolution scheme to kick in.
One way to tackle this would be to mark nodes (and thus implicitly their
subtrees) in a way that says "from here on down it's OK to do asynchronous
conflict resolution of type X". This could be done by introducing an
explicit marker, e.g. a mixin or a property 'oak:asyncConflictResolution'
(which could either refer to a globally defined resolution or further
detail what that resolution should look like). If a transaction involves
both normal and async conflict resolution, not much is gained, as you'd
still have to do conflict checks at least for the normal/sync part. But if
the expectation is that some transactions touch only async-marked areas,
then the synchronous checks can be avoided for those.
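A minimal sketch of the marker lookup, assuming the marker is a plain node property named 'oak:asyncConflictResolution' as proposed above and that a node inherits the marker of its nearest marked ancestor. A map from absolute path to marker value stands in for the repository, and AsyncMarkers, mark, effectiveResolution and canMergeAsync are hypothetical names. It also encodes the rule that a transaction only avoids the synchronous check if every path it changes is async-marked.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a map from absolute node path to the value of the proposed
// 'oak:asyncConflictResolution' property stands in for the repository.
final class AsyncMarkers {

    private final Map<String, String> markers = new HashMap<>();

    void mark(String path, String resolution) {
        markers.put(path, resolution);
    }

    // Walks up from 'path' (absolute, e.g. "/content/a") to the root and
    // returns the first marker found, or null if no ancestor is marked.
    String effectiveResolution(String path) {
        String p = path;
        while (true) {
            String r = markers.get(p);
            if (r != null) {
                return r;
            }
            if (p.equals("/")) {
                return null;
            }
            int slash = p.lastIndexOf('/');
            p = slash == 0 ? "/" : p.substring(0, slash);
        }
    }

    // A transaction may skip the synchronous conflict check only if every path it
    // touches lies in an async-marked subtree; one unmarked path forces the sync check.
    boolean canMergeAsync(Collection<String> changedPaths) {
        for (String path : changedPaths) {
            if (effectiveResolution(path) == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AsyncMarkers m = new AsyncMarkers();
        m.mark("/content", "delete-wins-then-latest-change-wins");
        System.out.println(m.canMergeAsync(Arrays.asList("/content/a", "/content/b/c"))); // true
        System.out.println(m.canMergeAsync(Arrays.asList("/content/a", "/var/x")));       // false
    }
}
```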
Examples of such predefined resolutions are: 'delete-wins, then
latest-change-wins' (which might be the easiest); 'latest-change-wins'
(which might be trickier, as it would mean the 'changeDeleted' cases
would magically resurrect deleted data: possible, but perhaps too magic);
'strict' (which would correspond to the JCR semantics that are the default
today); or 'no-resolution-but-persist-conflict-marker', etc.
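The resolutions listed above could be sketched as follows for the simplest case: one property that two instances changed concurrently, where each change is either a new value or a delete. This is a hypothetical model, not Oak code; the class and enum names are made up, and it assumes each change carries a timestamp that is comparable across instances, which is itself a nontrivial assumption in a cluster.

```java
// Hypothetical model of the proposed policies for one property that two cluster
// instances changed concurrently. Policy names follow the mail; nothing here is Oak API.
final class ResolutionPolicies {

    enum Policy { DELETE_WINS_THEN_LATEST, LATEST_CHANGE_WINS, STRICT, PERSIST_CONFLICT_MARKER }

    // One side of the conflict; value == null means that side deleted the property.
    static final class Change {
        final String value;
        final long timestamp; // assumption: timestamps from different instances are comparable
        Change(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
        boolean isDelete() {
            return value == null;
        }
    }

    // Result: the surviving value (null = property deleted), or "left unresolved, marker written".
    static final class Outcome {
        final String value;
        final boolean markerPersisted;
        Outcome(String value, boolean markerPersisted) {
            this.value = value;
            this.markerPersisted = markerPersisted;
        }
    }

    static Outcome resolve(Policy policy, Change a, Change b) {
        // Ties go to 'a'; a real system would need a deterministic tie-breaker (e.g. cluster id).
        Change latest = a.timestamp >= b.timestamp ? a : b;
        switch (policy) {
            case DELETE_WINS_THEN_LATEST:
                if (a.isDelete() || b.isDelete()) {
                    return new Outcome(null, false);
                }
                return new Outcome(latest.value, false);
            case LATEST_CHANGE_WINS:
                // A later change beats an earlier delete, i.e. deleted data comes back.
                return new Outcome(latest.value, false);
            case STRICT:
                // Stands in for today's synchronous failure (a CommitFailedException in Oak).
                throw new IllegalStateException("conflict on concurrent change");
            case PERSIST_CONFLICT_MARKER:
                // Nothing is resolved; the conflict is only recorded for later handling.
                return new Outcome(null, true);
            default:
                throw new AssertionError(policy);
        }
    }

    public static void main(String[] args) {
        Change changed = new Change("v2", 200); // instance 1 modified the property later...
        Change deleted = new Change(null, 100); // ...than instance 2 deleted it
        System.out.println(resolve(Policy.DELETE_WINS_THEN_LATEST, changed, deleted).value); // null
        System.out.println(resolve(Policy.LATEST_CHANGE_WINS, changed, deleted).value);      // v2
    }
}
```

With a later change and an earlier delete, 'delete-wins, then latest-change-wins' ends with the property deleted, while 'latest-change-wins' brings 'v2' back: exactly the resurrection effect described above.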
Having such predefined conflict resolution, while at the same time clearly
indicating that asynchronous conflict checking is OK, would allow
different instances to write to the NodeStore truly in parallel.
Wdyt?
Cheers,
Stefan