Hi,
I recently extended the DatabasePersistenceManager to create a
PostgresqlPersistenceManager that takes advantage of PostgreSQL's
LargeObject API. I did this mainly by extending DbBLOBStore and
overriding its methods so that BINVAL_DATA is an oid column instead of a
bytea column, and then using the LargeObject API to store the actual
binary data under that oid. The new persistence manager eliminated the
memory problems caused by storing large blobs as bytea (I can now store a
file of any size with the default VM heap settings).

Although this fixed the memory problems I was working on earlier,
performance still wasn't as good as I had hoped. Stepping through the
code, I found that every time Serializer.serialize or
Serializer.deserialize is called for a PropertyState, any binary data
stored in a blob is downloaded to a temporary file while the
InternalValues are being assembled. For a large file, this may take 30
seconds. As a result, even simple operations such as listing child nodes
cause the entire blob to be downloaded from the PostgreSQL database and
written to a temporary file. That seems unnecessary, since in most of
these situations I only care about the names of the nodes, not their
contents.

For example, here is the code I use to completely clean out my
repository, starting at the root. Unfortunately, it has the nasty side
effect of creating temporary files for every blob in the repository
before the nodes are deleted:

    Node root = repoSession.getRootNode();
    Node jackrabbitRoot =
        root.getNode(JackrabbitResourceRepository.REPOSITORY_ROOT_NAME);
    // Remove every child of the repository root node.
    for (NodeIterator childItr = jackrabbitRoot.getNodes(); childItr.hasNext();) {
        Node nodeToDelete = childItr.nextNode();
        nodeToDelete.remove();
    }
    // The removals are transient until the session is saved.
    repoSession.save();

Is there any way to avoid downloading the blob data unnecessarily in
situations like the one above? I really only want to download a blob when
a user asks for it. Instead, it seems the blob is always downloaded so
that Jackrabbit can create a BLOBFileValue for each blob in the DB.
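To make the request concrete, here is a minimal sketch of the behaviour I'm
after. It is not Jackrabbit code: LazyBlobValue and BlobSource are
hypothetical names, and BlobSource stands in for whatever would open a
PostgreSQL large object by its oid. The value only remembers the oid, and
nothing is read until getStream() is called:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical source of blob content, keyed by a large object oid. */
interface BlobSource {
    InputStream open(long oid) throws IOException;
}

/**
 * Hypothetical value that stores only the oid. The blob is fetched from the
 * BlobSource each time getStream() is called, never on construction.
 */
final class LazyBlobValue {
    private final long oid;
    private final BlobSource source;

    LazyBlobValue(long oid, BlobSource source) {
        this.oid = oid;
        this.source = source;
    }

    long getOid() {
        return oid;
    }

    /** Opens the blob only when a caller actually asks for its content. */
    InputStream getStream() throws IOException {
        return source.open(oid);
    }
}

public class LazyBlobDemo {
    public static void main(String[] args) throws IOException {
        AtomicInteger fetches = new AtomicInteger();
        // In-memory stand-in for the database that counts how often it is hit.
        BlobSource source = oid -> {
            fetches.incrementAndGet();
            return new ByteArrayInputStream(("blob " + oid).getBytes(StandardCharsets.UTF_8));
        };

        LazyBlobValue value = new LazyBlobValue(42L, source);
        System.out.println("fetches after construction: " + fetches.get());

        try (InputStream in = value.getStream()) {
            String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(content + ", fetches: " + fetches.get());
        }
    }
}
```

Running it prints "fetches after construction: 0" followed by
"blob 42, fetches: 1", i.e. listing or deleting nodes that hold such values
would never touch the blob itself.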
Thanks,
Joe.