hi cris,

I'm expecting that we will have about 10-15 nodes per "document" in most
cases, though some could have 35-50.
sounds good.

When you say adequate hierarchical structure, does this imply that we should
try to keep our tree "bushy"? Really, because we rely on the external search
engine for location, we only query the database directly by sequential ID.
Should a partitioning strategy be used? If so, what sort of depth might we
aim for?
i see... i think it is important to mention that jackrabbit is currently not
optimized for long lists of child nodes, so i would recommend staying away,
if possible, from more than a couple of hundred child nodes per node.
as a guidance for hierarchy i usually use something like:
"if i wouldn't do it in a filesystem, i don't do it in a content repository"
(assuming that i view a node as a file or folder)

so let's assume your sequential hex-id is something like "123abc". i would
then recommend partitioning the node structure as follows:
/12/3a/bc, which leaves you with at most 256 child nodes per node.
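the mapping from a hex-id to such a partitioned path can be sketched as a
small helper (the class and method names here are illustrative, not part of
jackrabbit):

```java
// sketch: split a sequential hex id into two-character path segments,
// so each level has at most 16*16 = 256 possible children
public class PartitionPath {
    static String partition(String hexId) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hexId.length(); i += 2) {
            // take two characters per level (one if the id has odd length)
            sb.append('/').append(hexId, i, Math.min(i + 2, hexId.length()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(partition("123abc")); // prints "/12/3a/bc"
    }
}
```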

Also, what sort of persistence store did you use in these tests? I would
assume, among other things, that XML is a bad choice, for example :)
i would recommend to use a "bundle persistence manager".
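for reference, the persistence manager is configured per workspace in
repository.xml; a fragment along these lines (class and parameter names are
from the jackrabbit 1.x derby bundle persistence manager, so check them
against your version):

```xml
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.DerbyPersistenceManager">
  <!-- embedded derby database stored under the workspace home -->
  <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
  <!-- keeps the tables of different workspaces apart -->
  <param name="schemaObjectPrefix" value="${wsp.name}_"/>
</PersistenceManager>
```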

I have been finding some evidence that people are using jackrabbit in these
situations successfully, but not a lot of information on how they are
handling this, backup, etc.
personally, i like to use the derby persistence manager with external
fs-based blobs (the standard setup). with this setup i do
"hot backups" by simply backing up the full repository folder in the filesystem.
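a minimal sketch of such a filesystem copy in java (the "repository" and
"backup/repository" paths are assumptions, adjust them to your layout; how
safe a copy of a live repository is depends on your setup):

```java
import java.io.IOException;
import java.nio.file.*;

// sketch: back up the repository by recursively copying its folder
public class RepoBackup {
    static void copyTree(Path src, Path dst) throws IOException {
        try (var entries = Files.walk(src)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                Path target = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target); // recreate folder structure
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path repo = Paths.get("repository"); // illustrative path
        if (Files.exists(repo)) {
            copyTree(repo, Paths.get("backup/repository"));
        }
    }
}
```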

some people already have backup & restore facilities in their rdbms, which
means that they can set up the persistence manager to store all the
information (blobs, workspace information, nodetypes, etc.) in
the rdbms and leverage their existing backup/restore infrastructure.

regards,
david
