I've used an approach similar to a) in the past (on other projects), and it was efficient and successful at both bidirectional RI and optimised queries. I also found approaches similar to b) and c) problematic, both from a write-throughput point of view and from a query point of view.

I don't know enough about the low-level details of Oak to really be of any help, other than to say that requiring a query index that is up to date to the millisecond, at scale, and transactional is quite hard to achieve without adding overhead to writes.

Ian

On 11 April 2013 20:01, Michael Dürig <[email protected]> wrote:

> Hi,
>
> Here is a summary of a quick f2f discussion Jukka, Angela, Tom and I had
> today: since there is no index for finding all references to a node, using
> a query is troublesome here. We should thus update the code such that
> referenced nodes maintain back references to their referrers and use a
> commit hook to keep the set of back references up to date. These back
> references would then be used to enforce referential integrity of
> referenceable nodes and to implement Node.getReferences() (instead of the
> inefficient query-based implementation we have today).
>
> Michael
>
> On 4.4.13 13:34, Michael Dürig wrote:
>
>> Hi,
>>
>> I was looking into how to enforce referential integrity for
>> referenceable nodes
>> (https://issues.apache.org/jira/browse/OAK-685,
>> https://issues.apache.org/jira/browse/OAK-101).
>>
>> Currently references are implemented through a (unique) query index on
>> the uuid property. Resolving references and finding references to a
>> referenceable node thus involves doing a query. If we want to enforce
>> referential integrity in this design, we'd need access to an up-to-date
>> query index from within the respective commit hook. This could be either
>> through a query engine or some other means to access the uuid index
>> directly.
>>
>> Instead of this we could however change the design such that no query
>> index is needed to track references. In such a design referenced nodes
>> would contain back references to all their referrers. A commit hook could
>> be employed to keep the back references up to date. Furthermore that
>> commit hook could simply enforce referential integrity by checking
>> whether the set of back references is empty on remove.
>>
>> However, this design is not enough to ensure uniqueness of uuids and to
>> look up nodes by uuid. For this we still need some kind of an index
>> structure. So we could roll our own here or reuse query indexes. In the
>> latter case the commit hook again needs access to the query index in
>> order to do its job of updating back references.
>>
>> In summary the options are:
>>
>> a) Build our own ad-hoc index structure for uuid uniqueness and lookup.
>> Use back references to find referring nodes and to enforce referential
>> integrity.
>>
>> b) Use query indexes for uuid uniqueness and lookup and for enforcing
>> referential integrity in a commit hook and for finding referring nodes.
>>
>> c) Use query indexes for uuid uniqueness and lookup and for enforcing
>> referential integrity in a commit hook. Use back references to find
>> referring nodes. In this scenario the commit hook still needs access to
>> the query index in order to be able to properly update the back
>> references.
>>
>> I'm not in favour of c) since it adds complexity from both worlds and I
>> don't see much added value.
>>
>> For b), it would be best if we had a way to access query indexes without
>> having to go through an actual query.
>>
>> Finally a) duplicates some of the indexing logic we have already for
>> query indexes, but can do that in a way which is optimal for handling
>> references.
>>
>> Implementation-wise b) would be least effort and a) is probably the
>> leanest, cleanest and meanest solution.
>>
>> WDYT?
>>
>> Michael
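
For illustration only, option a) from the quoted summary could look roughly like the sketch below. It is a plain in-memory model, not Oak code: the class BackReferenceIndex and all of its method names are hypothetical, and nothing here uses Oak's commit hook or NodeState APIs. It only shows the bookkeeping: a dedicated uuid map for uniqueness and lookup, plus a set of back references per target that is consulted before a remove.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of option a); none of these names are Oak APIs. */
class BackReferenceIndex {
    // Ad-hoc uuid index: uuid -> path of the referenceable node.
    private final Map<String, String> pathByUuid = new HashMap<>();
    // Back references: uuid of the target -> paths of the nodes referring to it.
    private final Map<String, Set<String>> referrersByUuid = new HashMap<>();

    /** Registers a referenceable node and rejects duplicate uuids. */
    void addReferenceable(String uuid, String path) {
        String existing = pathByUuid.putIfAbsent(uuid, path);
        if (existing != null) {
            throw new IllegalStateException("Duplicate uuid " + uuid + " at " + existing);
        }
    }

    /** Looks up a node path by uuid without running a query; null if unknown. */
    String lookup(String uuid) {
        return pathByUuid.get(uuid);
    }

    /** Records a back reference on the target; rejects dangling references. */
    void addReference(String referrerPath, String targetUuid) {
        if (!pathByUuid.containsKey(targetUuid)) {
            throw new IllegalStateException("No referenceable node with uuid " + targetUuid);
        }
        referrersByUuid.computeIfAbsent(targetUuid, k -> new HashSet<>()).add(referrerPath);
    }

    /** Drops a back reference, for example when the referring property is removed. */
    void removeReference(String referrerPath, String targetUuid) {
        Set<String> referrers = referrersByUuid.get(targetUuid);
        if (referrers != null) {
            referrers.remove(referrerPath);
            if (referrers.isEmpty()) {
                referrersByUuid.remove(targetUuid);
            }
        }
    }

    /** Returns the referring paths for a target, read straight from the back references. */
    Set<String> getReferences(String uuid) {
        return Collections.unmodifiableSet(
                referrersByUuid.getOrDefault(uuid, Collections.emptySet()));
    }

    /** Removes a referenceable node only if its set of back references is empty. */
    void removeReferenceable(String uuid) {
        if (!getReferences(uuid).isEmpty()) {
            throw new IllegalStateException("Node " + uuid + " is still referenced");
        }
        pathByUuid.remove(uuid);
    }
}
```

In a real repository the two maps would presumably be persisted as part of the content tree and updated by a commit hook from the diff of each commit, so that the checks run in the same transaction as the write. That part is deliberately left out of this sketch.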
