Re: importing jackrabbit into jackrabbit

Alessandro Bologna Thu, 26 Apr 2007 12:19:32 -0700

I think that the main problem is not really about the specific case, but in general that when people design relational databases, they always use references (or more properly, joins) to define data that belongs logically to many entities, but should not be duplicated.

Imagine that you have a company tree, with "positions", "departments", "employees", "health plans" etc. An employee could belong to a department, have a position and a health plan, but typically you would not make all those nodes child nodes of the employee: you would instead define references to the proper node in the "position" and "health plan" subtrees.

It's easy to see how, in a large company, there could be thousands of employees holding the same position and health plan, and those specific nodes ("Secretary" and "Plan A") would have thousands of references pointing to them.

So, given the issue as explained by Marcel that "whenever a reference is added that points to a node N the complete set of references pointing to N is re-written to the persistence manager", it seems that using references to a node that is very "popular" is going to create problems in the long term.
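
To put rough numbers on the concern, here is a toy model in Python, not Jackrabbit code. It assumes, per Marcel's description quoted above, that each added reference causes the complete reference set of the target node to be written again. The `ToyReferenceStore` class, its counter, and the node names are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


class ToyReferenceStore:
    """Hypothetical stand-in for a persistence manager (not a Jackrabbit API)."""

    def __init__(self):
        self.refs = defaultdict(list)   # target node -> list of referring nodes
        self.entries_written = 0        # total reference entries persisted so far

    def add_reference(self, source, target):
        self.refs[target].append(source)
        # Assumed behaviour: the complete reference set of the target is rewritten.
        self.entries_written += len(self.refs[target])


def total_writes(n_employees):
    """Count entries written when n employees all reference one popular node."""
    store = ToyReferenceStore()
    for i in range(n_employees):
        store.add_reference(f"employee-{i}", "Plan A")
    return store.entries_written


if __name__ == "__main__":
    for n in (10, 1_000, 10_000):
        print(n, total_writes(n))
```

Under that assumption the total is n(n+1)/2 entries for n references to the same node, so the cost of each new reference keeps growing with the popularity of the target.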

What could be the right way to model things? Maybe using a "path" property to point to the node instead? Of course, it would not be as easy to use as a reference, and it would require global updates if the target node ever changed position, but I can't see other options.
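
The trade-off with a path property can be sketched in the same spirit. This is a toy Python model with made-up names (`ToyWorkspace`, `move`, the `position` and `healthPlan` properties), not the JCR API: adding a pointer writes only the referring node, but moving the target means scanning for and rewriting every node that stores its old path.

```python
class ToyWorkspace:
    """Hypothetical model: each node is a dict of properties keyed by its path."""

    def __init__(self):
        self.nodes = {}

    def add_node(self, path, **props):
        self.nodes[path] = dict(props)

    def move(self, old_path, new_path, pointer_props=("position", "healthPlan")):
        """Move a node, fix up path-valued properties, return nodes rewritten."""
        self.nodes[new_path] = self.nodes.pop(old_path)
        rewritten = 0
        # Global scan: in this model nothing indexes who points where.
        for props in self.nodes.values():
            changed = False
            for name in pointer_props:
                if props.get(name) == old_path:
                    props[name] = new_path
                    changed = True
            if changed:
                rewritten += 1
        return rewritten


def demo(n_employees=1000):
    ws = ToyWorkspace()
    ws.add_node("/plans/planA")
    for i in range(n_employees):
        # Each addition touches only the new employee node.
        ws.add_node(f"/employees/e{i}", healthPlan="/plans/planA")
    return ws.move("/plans/planA", "/benefits/planA")
```

In this model `demo(n)` rewrites n employee nodes for a single move, which is the global update mentioned above; whether that is acceptable depends on how often the target nodes actually move.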


Any suggestions?

Alessandro Bologna


On Apr 26, 2007, at 2:38 PM, Jukka Zitting wrote:

Hi,

On 4/26/07, Stefan Kurla <[EMAIL PROTECTED]> wrote:

I would appreciate the thoughts on references though. Reason being
that one of the biggest strengths of JSR-170 is the ability to store
references. I imagine a situation where I could have a nodetype called
docType which is either pdf or word strings. Say 80% of my documents
are word documents. Then the docType will have a reference to 80% of
all documents in my repository. If my repository is 100,000 files, then
docType references 80,000 nodes.

If what you say is correct that at every new reference, the complete
set of references are rewritten, then obviously this is a bottleneck.

Should such a situation be avoided?


Why would you need to use such a reference structure? I would rather
use the node types to model such information. A search query like
//element(*,my:wordDocument) will efficiently return all such Word
documents in your workspace.
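
As a rough illustration of why typing the nodes sidesteps the problem, the toy Python model below tags each document with a type instead of pointing it at a shared node. The names (`ToyTypeIndex`, `my:pdfDocument`) are hypothetical, and this is not how Jackrabbit's query index is actually implemented; it only shows that adding a document writes one entry no matter how many documents of that type already exist.

```python
from collections import defaultdict


class ToyTypeIndex:
    """Hypothetical index from node type name to node paths."""

    def __init__(self):
        self.by_type = defaultdict(set)
        self.entries_written = 0

    def add_node(self, path, node_type):
        self.by_type[node_type].add(path)
        self.entries_written += 1       # one entry per node, no shared set rewritten

    def nodes_of_type(self, node_type):
        """Rough analogue of //element(*,my:wordDocument)."""
        return sorted(self.by_type[node_type])


def build(n_docs):
    index = ToyTypeIndex()
    for i in range(n_docs):
        # Assumed 80/20 split, following the example earlier in the thread.
        node_type = "my:wordDocument" if i % 5 else "my:pdfDocument"
        index.add_node(f"/docs/doc{i}", node_type)
    return index
```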

BR,

Jukka Zitting
