hi cris,

> So some form of hashing sounds desirable. If the hashed nodes were
> mapped to a separate node structure / workspace that did not hash to
> create a logical view, would the performance impact still apply?

hashing seems like a good approach to get an even distribution.
if you have a linear id to go by that would work even better.
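
to make the hashing idea concrete, here is a minimal sketch in python (just for illustration, this is not jackrabbit code). the function name, the choice of sha-1 and the two-level, two-hex-character bucket layout are all assumptions of mine, not something from our setup:

```python
import hashlib


def hashed_path(root, name, levels=2, width=2):
    """Return a path that spreads `name` over hash-derived bucket folders.

    Hypothetical layout: the first `levels * width` hex characters of the
    SHA-1 digest of the name are split into `levels` folder names.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Slice the digest into fixed-width bucket names, one per level.
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join([root.rstrip("/")] + buckets + [name])
```

with the defaults this gives 256 buckets per level, and the same name always maps to the same path. with a linear id you can skip the hash and derive the folders from the number directly.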

generally i try to use a "logical" hierarchy that makes sense to the
user and, most importantly, with respect to access control and other
hierarchical operations like locking or versioning.

so grouping by "department" or similar would generally be my recommendation,
so that all the hierarchy operations make the most sense. many db applications
do not use hierarchies at all, so coming up with a useful hierarchy is more
of a thought process that requires knowledge about the specific application.

> You mentioned some testing... it would be great
> for us to do similar testing with mock data reflecting our
> environment. If the tests you performed included any special
> harnesses, configuration, etc, would it be possible to see them?

actually, we use a combination of very application-specific stress testing
and some very simple tests. even the simple tests are already tied into
our framework, though, but i think making them available would definitely
be a very good contribution to jackrabbit.

one of our simplest tests to compare persistence manager configurations
is to just load files (nt:file, nt:resource) into the repository in batches of
1000 files per folder (nt:folder). so our structure looks like this:

/testroot/0-999999/0-999/0.txt
/testroot/0-999999/0-999/1.txt
/testroot/0-999999/0-999/2.txt
...
/testroot/0-999999/1000-1999/1000.txt
/testroot/0-999999/1000-1999/1001.txt
/testroot/0-999999/1000-1999/1002.txt
...
/testroot/1000000-1999999/1000000-1000999/1000000.txt
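
the paths above follow directly from the linear file number. a minimal sketch in python (just for illustration, not our actual harness); the function name is hypothetical, and the 1000 folders per top-level group is simply read off the 0-999999 range in the listing:

```python
def file_path_for(i, root="/testroot", per_folder=1000, folders_per_group=1000):
    """Return the repository path for the i-th test file.

    Files are grouped 1000 per folder; folders are grouped into
    top-level ranges (assumed 1000 folders each, as in the listing).
    """
    group_size = per_folder * folders_per_group
    # Round the index down to the start of its group and of its folder.
    group_start = (i // group_size) * group_size
    folder_start = (i // per_folder) * per_folder
    group = f"{group_start}-{group_start + group_size - 1}"
    folder = f"{folder_start}-{folder_start + per_folder - 1}"
    return f"{root}/{group}/{folder}/{i}.txt"
```

each returned path would then be created as an nt:file under an nt:folder parent by whatever loader you use.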

regards,
david
