Hi,

I was looking into how to enforce referential integrity for referenceable nodes (https://issues.apache.org/jira/browse/OAK-685,
https://issues.apache.org/jira/browse/OAK-101).

Currently references are implemented through a (unique) query index on the uuid property. Resolving a reference, or finding the references to a referenceable node, thus involves running a query. If we want to enforce referential integrity in this design, we'd need access to an up-to-date query index from within the respective commit hook. This could be either through the query engine or through some other means of accessing the uuid index directly.

Alternatively, we could change the design so that no query index is needed to track references. In such a design each referenced node would contain back references to all the nodes referring to it. A commit hook could keep the back references up to date. Furthermore, that commit hook could enforce referential integrity simply by checking, when a node is removed, whether its set of back references is empty.
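
To make the back reference idea a bit more concrete, here is a minimal in-memory sketch. It is not Oak code: the class name BackReferenceTracker and its methods are hypothetical, and a real commit hook would work on the node states before and after the commit rather than on a map. It only illustrates the bookkeeping (add and drop back references as reference properties change) and the emptiness check on remove.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; this is not Oak's commit hook API.
class BackReferenceTracker {
    // uuid of the referenced node -> paths of the nodes currently referring to it
    private final Map<String, Set<String>> backRefs = new HashMap<>();

    // A reference property pointing at targetUuid was added on sourcePath.
    void referenceAdded(String sourcePath, String targetUuid) {
        backRefs.computeIfAbsent(targetUuid, k -> new HashSet<>()).add(sourcePath);
    }

    // A reference property pointing at targetUuid was removed from sourcePath.
    void referenceRemoved(String sourcePath, String targetUuid) {
        Set<String> sources = backRefs.get(targetUuid);
        if (sources != null) {
            sources.remove(sourcePath);
            if (sources.isEmpty()) {
                backRefs.remove(targetUuid);
            }
        }
    }

    // The referenceable node is being removed: fail if anything still refers to it.
    void checkRemove(String targetUuid) {
        Set<String> sources = backRefs.get(targetUuid);
        if (sources != null && !sources.isEmpty()) {
            throw new IllegalStateException(
                    "Node " + targetUuid + " is still referenced from " + sources);
        }
    }
}
```

In this sketch, removing a node is rejected for as long as at least one back reference remains, and succeeds once the last referring property is gone.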

However, this design alone is not enough to ensure uniqueness of uuids or to look up nodes by uuid. For that we still need some kind of index structure. We could either roll our own or reuse query indexes. In the latter case the commit hook again needs access to the query index in order to do its job of updating back references.
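
As a rough idea of what "rolling our own" would have to provide, the following sketch shows the two operations needed: rejecting duplicate uuids and resolving a uuid to a node. The UuidIndex class and the use of a path string as the lookup result are assumptions made for illustration. An actual implementation would have to persist this structure in the repository rather than keep it in a map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ad-hoc uuid index; illustration only, not an existing Oak class.
class UuidIndex {
    // uuid -> path of the referenceable node carrying that uuid
    private final Map<String, String> pathByUuid = new HashMap<>();

    // Registers a uuid; fails if a different node already uses it (uniqueness).
    void add(String uuid, String path) {
        String existing = pathByUuid.putIfAbsent(uuid, path);
        if (existing != null && !existing.equals(path)) {
            throw new IllegalStateException(
                    "uuid " + uuid + " is already used by " + existing);
        }
    }

    // Drops the entry when the referenceable node is removed.
    void remove(String uuid) {
        pathByUuid.remove(uuid);
    }

    // Resolves a uuid to the node's path, or null if no such node exists.
    String lookup(String uuid) {
        return pathByUuid.get(uuid);
    }
}
```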

In summary the options are:

a) Build our own ad-hoc index structure for uuid uniqueness and lookup. Use back references to find referring nodes and to enforce referential integrity.

b) Use query indexes for uuid uniqueness and lookup, for finding referring nodes, and for enforcing referential integrity in a commit hook.

c) Use query indexes for uuid uniqueness and lookup and for enforcing referential integrity in a commit hook. Use back references to find referring nodes. In this scenario the commit hook still needs access to the query index in order to update the back references properly.

I'm not in favour of c) since it adds complexity from both worlds and I don't see much added value.

For b), it would be best if we had a way to access query indexes without having to go through an actual query.

Finally, a) duplicates some of the indexing logic we already have for query indexes, but can do so in a way that is optimal for handling references.

Implementation-wise, b) would be the least effort, while a) is probably the leanest, cleanest and meanest solution.

WDYT?

Michael


