2009/12/9  <[email protected]>:
>
> I have a requirement to archive several million documents with variable
> metadata (dates, types of document). [...] Documents will be retrieved by
> their position in the tree, plus some filtering on the metadata.

Note that putting thousands of documents as direct children of a single
node is discouraged. You can use a first-letter subtree approach, e.g.
keyed on the document name, with node paths such as:

<prefix>/a/b/abstract.pdf

or simply use partial hashes of the file name:

<prefix>/2b/6f/abstract.pdf
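
As a rough sketch of both schemes, here is some plain Java with no JCR
calls. The class and method names are made up for illustration, and MD5
is only an assumed choice of hash, so it will not necessarily reproduce
the 2b/6f shown above. Any stable hash of the file name would do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of Jackrabbit: builds relative node paths
// that spread documents over intermediate nodes.
public class ShardedPath {

    // First-letter scheme: one level per leading character of the name,
    // e.g. <prefix>/a/b/abstract.pdf (two levels assumed here).
    public static String byLetters(String prefix, String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        StringBuilder path = new StringBuilder(prefix);
        for (int i = 0; i < 2 && i < lower.length(); i++) {
            path.append('/').append(lower.charAt(i));
        }
        return path.append('/').append(name).toString();
    }

    // Partial-hash scheme: the first two bytes of a digest of the name,
    // each written as two hex digits. MD5 is an assumption for this sketch.
    public static String byHash(String prefix, String name) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
            return String.format("%s/%02x/%02x/%s",
                    prefix, digest[0] & 0xff, digest[1] & 0xff, name);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(byLetters("archive", "abstract.pdf"));
        System.out.println(byHash("archive", "abstract.pdf"));
    }
}
```

The hash variant spreads documents more evenly (256 x 256 buckets with
two levels), while the letter variant keeps the tree browsable by name.
Either way, names containing characters that are not valid in JCR node
names would still need escaping before use; the sketch does not do that.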

> Couple million docs, terabyte of data, modest throughput, availability of
> upgrade path, easy backups.  From my reading/lurking, this sounds like a job
> for bundle persistence manager using H2 database.
>
> Have I missed something?  Does this sound like a workable plan?

Sure. But you might want to consider using a DataStore (for the binary
content) alongside the persistence manager. Also, with Jackrabbit 1.6
and up, it's easy (although slow) to move from one repository
configuration to another with the migration tool, should you find you
need to reconfigure.

-- 
-Tor
