I think the main problem is not really this specific case. In general, when people design relational databases, they use references (or, more properly, joins) to define data that logically belongs to many entities but should not be duplicated.

Imagine that you have a company tree, with "positions", "departments", "employees", "health plans" etc. An employee could belong to a department and have a position and a health plan, but typically you would not make all those nodes child nodes of the employee: you would instead define references to the proper nodes in the "position" and "health plan" subtrees. It's easy to see how, in a large company, there could be thousands of employees holding the same position and health plan, and those specific nodes ("Secretary" and "Plan A") would have thousands of references pointing to them. So, given the issue as explained by Marcel that "whenever a reference is added that points to a node N the complete set of references pointing to N is re-written to the persistence manager", it seems that using references to a very "popular" node is going to create problems in the long term.
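
To put a rough number on that concern, here is a minimal sketch of the arithmetic. It is not Jackrabbit code: the class and method names are made up for illustration, and it assumes the simplest reading of Marcel's description, namely that the i-th reference added to a node rewrites a set of i entries.

```java
// Hypothetical model, not Jackrabbit code: counts how many reference entries
// get written if the full set of references to a target node is re-persisted
// each time one more reference is added (an assumption based on the quote above).
final class ReferenceRewriteCost {

    // Total entries written after adding `count` references one at a time.
    // The i-th addition rewrites a set of size i, so the sum is 1 + 2 + ... + count.
    static long totalEntriesWritten(int count) {
        long total = 0;
        for (int size = 1; size <= count; size++) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int count : new int[] {10, 1_000, 10_000}) {
            System.out.println(count + " references -> "
                    + totalEntriesWritten(count) + " entries written");
        }
    }
}
```

Under that assumption the total grows quadratically with the number of references, which is why a single "popular" target node looks worrying.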

What would be the right way to model this? Maybe using a "path" property to point to the node instead? Of course, it would not be as easy to use as a reference, and it would require global updates if the target node ever changes position, but I can't see other options.
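
To make the trade-off concrete, here is a small in-memory sketch of the "path" property idea. It does not use the JCR API; the class, the method names and the "positionPath" property are all hypothetical, chosen only to show where the writes happen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model, not the JCR API: employees store the path of
// their position node as a plain string instead of holding a reference to it.
final class PathPointerModel {

    // path -> display name of the node at that path
    private final Map<String, String> nodesByPath = new HashMap<>();
    // employee name -> value of its (hypothetical) "positionPath" property
    private final Map<String, String> positionPathByEmployee = new HashMap<>();

    void addNode(String path, String name) {
        nodesByPath.put(path, name);
    }

    // Adding a pointer writes only the employee's own property;
    // nothing stored on the target node changes.
    void assignPosition(String employee, String positionPath) {
        positionPathByEmployee.put(employee, positionPath);
    }

    // Resolving the pointer is a lookup by path; returns null if the path is stale.
    String positionOf(String employee) {
        return nodesByPath.get(positionPathByEmployee.get(employee));
    }

    // Moving the target forces a global update of every property that held
    // the old path. Returns the number of properties that had to be rewritten.
    int moveNode(String oldPath, String newPath) {
        String name = nodesByPath.remove(oldPath);
        if (name == null) {
            throw new IllegalArgumentException("no node at " + oldPath);
        }
        nodesByPath.put(newPath, name);
        int updated = 0;
        for (Map.Entry<String, String> entry : positionPathByEmployee.entrySet()) {
            if (entry.getValue().equals(oldPath)) {
                entry.setValue(newPath);
                updated++;
            }
        }
        return updated;
    }
}
```

In this model, adding a pointer costs one write regardless of how popular the target is, while the cost moves to the rare case where the target node is relocated and every stored path has to be found and updated.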

Any suggestions?

Alessandro Bologna


On Apr 26, 2007, at 2:38 PM, Jukka Zitting wrote:

Hi,

On 4/26/07, Stefan Kurla <[EMAIL PROTECTED]> wrote:
I would appreciate your thoughts on references though. The reason is
that one of the biggest strengths of JSR-170 is the ability to store
references. I imagine a situation where I could have a node type called
docType whose value is either the string pdf or word. Say 80% of my
documents are Word documents. Then the docType node will be referenced
by 80% of all documents in my repository. If my repository has 100,000
files, then 80,000 nodes reference docType.

If what you say is correct, that on every new reference the complete
set of references is rewritten, then obviously this is a bottleneck.

Should such a situation be avoided?

Why would you need to use such a reference structure? I would rather
use node types to model such information. A search query like
//element(*,my:wordDocument) will efficiently return all such Word
documents in your workspace.

BR,

Jukka Zitting
