Hi,
I was looking into how to enforce referential integrity for
referenceable nodes (https://issues.apache.org/jira/browse/OAK-685,
https://issues.apache.org/jira/browse/OAK-101).
Currently references are implemented through a (unique) query index on
the uuid property. Resolving a reference, or finding the references to a
referenceable node, thus involves running a query. If we want to enforce
referential integrity in this design, we'd need access to an up-to-date
query index from within the respective commit hook, either through the
query engine or through some other means of accessing the uuid index
directly.
Instead, we could change the design such that no query index is needed
to track references. In such a design each referenced node would contain
back references to all nodes referring to it. A commit hook could be
employed to keep the back references up to date. Furthermore, that
commit hook could enforce referential integrity simply by checking
whether the set of back references is empty on remove.
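
To make the back reference idea a bit more concrete, here is a rough
sketch in plain Java. It is not Oak code: BackReferenceTracker and all of
its methods are made-up names, and the map simply stands in for the back
reference set that each referenced node would carry. The assumption is
that a commit hook calls these methods while it walks the changes of a
commit.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model, not part of the Oak API.
class BackReferenceTracker {
    // uuid of the referenced node -> paths of the nodes referring to it
    private final Map<String, Set<String>> backReferences = new HashMap<>();

    // The hook saw a reference property being added.
    void referenceAdded(String referrerPath, String targetUuid) {
        backReferences
                .computeIfAbsent(targetUuid, k -> new HashSet<>())
                .add(referrerPath);
    }

    // The hook saw a reference property being removed.
    void referenceRemoved(String referrerPath, String targetUuid) {
        Set<String> referrers = backReferences.get(targetUuid);
        if (referrers != null) {
            referrers.remove(referrerPath);
            if (referrers.isEmpty()) {
                backReferences.remove(targetUuid);
            }
        }
    }

    // Finding referring nodes needs no query, just a read of the set.
    Set<String> referrers(String targetUuid) {
        return Collections.unmodifiableSet(
                backReferences.getOrDefault(targetUuid, Collections.emptySet()));
    }

    // The hook saw a referenceable node being removed: fail the commit
    // if anything still refers to it.
    void checkRemove(String targetUuid) {
        Set<String> referrers = referrers(targetUuid);
        if (!referrers.isEmpty()) {
            throw new IllegalStateException(
                    "Node " + targetUuid + " is still referenced by " + referrers);
        }
    }
}
```

The point of the sketch is that both finding referrers and the integrity
check on remove become a lookup of a set, with no index involved.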
However, back references alone are not enough to ensure uniqueness of
uuids or to look up nodes by uuid. For this we still need some kind of
index structure. We could either roll our own or reuse query indexes. In
the latter case the commit hook again needs access to the query index in
order to do its job of updating back references, since it has to resolve
the uuid stored in a reference property to the node it points to.
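
As an illustration of how small a home-grown index structure could be,
here is a minimal sketch, again in plain Java and again with made-up
names (UuidIndex, add, remove, lookup). It only shows the two jobs the
index has to do, uniqueness and lookup, and ignores persistence and
concurrent commits entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ad-hoc uuid index, not part of the Oak API.
class UuidIndex {
    // uuid -> path of the node carrying that uuid
    private final Map<String, String> pathByUuid = new HashMap<>();

    // Register a referenceable node; reject a uuid that is already
    // taken by a node at a different path.
    void add(String uuid, String path) {
        String existing = pathByUuid.putIfAbsent(uuid, path);
        if (existing != null && !existing.equals(path)) {
            throw new IllegalStateException(
                    "uuid " + uuid + " is already used by " + existing);
        }
    }

    // Unregister a node that was removed.
    void remove(String uuid) {
        pathByUuid.remove(uuid);
    }

    // Resolve a uuid to a path, or null if the uuid is unknown.
    String lookup(String uuid) {
        return pathByUuid.get(uuid);
    }
}
```

A commit hook could use lookup() to resolve the uuid in a reference
property before updating the target's back references.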
In summary the options are:
a) Build our own ad-hoc index structure for uuid uniqueness and lookup.
Use back references to find referring nodes and to enforce referential
integrity.
b) Use query indexes for everything: uuid uniqueness and lookup,
enforcing referential integrity in a commit hook, and finding referring
nodes.
c) Use query indexes for uuid uniqueness and lookup and for enforcing
referential integrity in a commit hook. Use back references to find
referring nodes. In this scenario the commit hook still needs access to
the query index in order to be able to properly update the back references.
I'm not in favour of c) since it adds complexity from both worlds and I
don't see much added value.
For b), it would be best if we had a way to access query indexes without
having to go through an actual query.
Finally, a) duplicates some of the indexing logic we already have for
query indexes, but can do so in a way that is optimal for handling
references.
Implementation-wise, b) would be the least effort and a) is probably the
leanest, cleanest and meanest solution.
WDYT?
Michael