2009/12/9 <[email protected]>:
> I have a requirement to archive several million documents with variable
> metadata (dates, types of document). [...] Documents will be retrieved by
> their position in the tree, plus some filtering on the metadata.

Note that putting thousands of documents as children of a single node is
discouraged. You can use a subtree keyed on the first letters of, e.g., the
document name, giving node paths such as

  <prefix>/a/b/abstract.pdf

or simply use partial hashes of the file name:

  <prefix>/2b/6f/abstract.pdf

> Couple million docs, terabyte of data, modest throughput, availability of
> upgrade path, easy backups. From my reading/lurking, this sounds like a job
> for bundle persistence manager using H2 database.
>
> Have I missed something? Does this sound like a workable plan?

Sure. But you might want to consider using a DataStore with the
persistence manager. Also, with Jackrabbit 1.6 and up, it's easy (although
slow) to move from one repository config to another with the migration tool,
if you find you need to reconfigure.

-- 
-Tor
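
P.S. A rough sketch of how the two path layouts could be computed, in case it helps. Everything here is an assumption for illustration: the class and method names (DocumentPaths, firstLetterPath, hashedPath) are made up, MD5 is just one possible hash, and the result is only a path string that you would still have to create through the JCR API. Characters that are not legal in node names are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of Jackrabbit: builds node paths that keep
// the number of children under any single node small.
final class DocumentPaths {
    private DocumentPaths() {}

    // First-letter layout, e.g. <prefix>/a/b/abstract.pdf
    static String firstLetterPath(String prefix, String fileName) {
        if (fileName.length() < 2) {
            throw new IllegalArgumentException("file name needs at least two characters");
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        return prefix + "/" + lower.charAt(0) + "/" + lower.charAt(1) + "/" + fileName;
    }

    // Partial-hash layout, e.g. <prefix>/2b/6f/abstract.pdf
    // Assumption: the first two bytes of the MD5 digest of the file name,
    // written as two-digit hex, form the two intermediate levels.
    static String hashedPath(String prefix, String fileName) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(fileName.getBytes(StandardCharsets.UTF_8));
            String level1 = String.format("%02x", digest[0] & 0xff);
            String level2 = String.format("%02x", digest[1] & 0xff);
            return prefix + "/" + level1 + "/" + level2 + "/" + fileName;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With two hex levels you get 256 * 256 = 65536 buckets, so a couple of million documents would average a few dozen per leaf folder. The first-letter variant tends to be less even, since names cluster on common prefixes.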
