Hi all,
Last year David Nuescheler provided some very useful data modelling rules,
one of which related to the use of "reference" properties.
We had various reasonable use cases for which we felt references were
appropriate, e.g. to ensure referential integrity. While semantically they've
worked well for us, as our repositories grow in size we're now seeing how
expensive references can be where you have thousands of items referencing the
same node. Monitoring our SQL logs (on MySQL) we can see some pretty huge
database operations which are getting incrementally slower as more and more
items reference a node.
Ignoring the data modelling semantics of using "reference" properties for the
moment, is there anything that can be done in order to improve the
performance of references or do you always have to design with this
limitation in mind (hindsight being very useful)?
We're facing a tricky remodelling/migration exercise to ensure further
scalability.
Regards,
Shaun
--- Begin Message ---
Explanation
---
References imply referential integrity. I find it important to
understand that references do not just add additional cost for the
repository managing the referential integrity, but they also are
costly from a content flexibility perspective.
Personally I make sure I only ever use references when I really cannot
deal with a dangling reference and otherwise use a path, a name or a
string UUID to refer to another node.
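
To make the cost of referential integrity concrete, here is a minimal sketch in
plain Java. It is a toy in-memory model, not the JCR API and not how any
particular repository implements it; the class and method names
(HardReferenceSketch, addReference, removeNode) are made up for illustration.
The point is only that every hard reference forces the store to keep a
referrer index up to date and to consult it before a node may be removed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy in-memory model (hypothetical, not the JCR API) of hard references. */
public class HardReferenceSketch {
    private final Set<String> nodes = new HashSet<>();
    // target id -> ids of the nodes that hold a hard reference to it
    private final Map<String, Set<String>> referrers = new HashMap<>();

    public void addNode(String id) {
        nodes.add(id);
    }

    public boolean exists(String id) {
        return nodes.contains(id);
    }

    /** A hard reference needs an existing target and an index update. */
    public void addReference(String fromId, String toId) {
        if (!nodes.contains(fromId) || !nodes.contains(toId)) {
            throw new IllegalArgumentException("both nodes must exist");
        }
        referrers.computeIfAbsent(toId, k -> new HashSet<>()).add(fromId);
    }

    public int referrerCount(String id) {
        return referrers.getOrDefault(id, Collections.emptySet()).size();
    }

    /** Removal is refused while any other node still references the target. */
    public void removeNode(String id) {
        int count = referrerCount(id);
        if (count > 0) {
            throw new IllegalStateException(id + " is still referenced by " + count + " node(s)");
        }
        nodes.remove(id);
        referrers.remove(id);
        // Drop the references that the removed node itself was holding.
        for (Set<String> sources : referrers.values()) {
            sources.remove(id);
        }
    }

    public static void main(String[] args) {
        HardReferenceSketch repo = new HardReferenceSketch();
        repo.addNode("a");
        repo.addNode("b");
        repo.addReference("a", "b");
        try {
            repo.removeNode("b");
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

In this sketch the referrer set for a popular target grows with every node that
points at it, which is the kind of bookkeeping a path or string property avoids.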
Example
---
Let's assume I allow "references" from a document (a) to another
document (b). If I model this relation using reference properties this
means that the two documents are linked on a repository level. I
cannot export/import document (a) individually, since the reference
property's target may not exist. Other operations like merge, update,
restore or clone are affected as well.
So I would either model those references as "weak-references" (in JCR
v1.0 this essentially boils down to string properties that contain the
UUID of the target node) or simply use a path. Sometimes the path is
more meaningful to begin with.
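
The "weak-reference" idea above can be sketched as follows. Again this is a toy
in-memory model in plain Java rather than the JCR API, and the names
(WeakReferenceSketch, addDocument, resolve) are hypothetical. The link is
stored as an ordinary string holding the target's UUID and is only resolved
when it is read, so a missing target is a normal, empty result that the
application decides how to handle. In a real JCR 1.0 repository the lookup
would presumably go through Session.getNodeByUUID, which throws
ItemNotFoundException when no such node exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Toy in-memory model (hypothetical, not the JCR API) of a weak reference. */
public class WeakReferenceSketch {
    // uuid -> document title; stands in for the repository content
    private final Map<String, String> titlesByUuid = new HashMap<>();

    /** Stores a document and returns the UUID string others can keep as a link. */
    public String addDocument(String title) {
        String uuid = UUID.randomUUID().toString();
        titlesByUuid.put(uuid, title);
        return uuid;
    }

    /** Removal never fails: nothing tracks who points at this document. */
    public void removeDocument(String uuid) {
        titlesByUuid.remove(uuid);
    }

    /** Resolves a stored UUID string lazily; empty means the link is dangling. */
    public Optional<String> resolve(String storedUuid) {
        return Optional.ofNullable(titlesByUuid.get(storedUuid));
    }

    public static void main(String[] args) {
        WeakReferenceSketch repo = new WeakReferenceSketch();
        String linkToB = repo.addDocument("document b"); // kept as a string on document a
        System.out.println(repo.resolve(linkToB).orElse("(dangling)"));
        repo.removeDocument(linkToB);
        System.out.println(repo.resolve(linkToB).orElse("(dangling)"));
    }
}
```

The trade-off in this sketch is that document (a) can be exported, imported or
restored on its own, at the price of every reader having to cope with an empty
result.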
I think there are use cases where a system really can't work if a
reference is dangling, but I just can't come up with a good "real" yet
simple example from my direct experience.
--- End Message ---