Hi,
I recently extended the DatabasePersistenceManager to create a
PostgresqlPersistenceManager that takes advantage of PostgreSQL's
LargeObject API.  This was done primarily by extending DbBLOBStore and
overriding its methods so that BINVAL_DATA stores an oid instead of a
bytea column, then using the LargeObject API to store the binary data
referenced by that oid.  My new PostgreSQL persistence manager
successfully removed the memory problems associated with using the
bytea data type to store large BLOBs (I can now store a file of any
size with the default VM heap settings).

Although this fixed the memory problems I was working on earlier, I
noticed that performance still wasn't as fast as I would have hoped.
Stepping through the code, I realized that every time
Serializer.serialize or Serializer.deserialize is called for a
PropertyState, any binary data stored in a BLOB gets downloaded to a
temporary file while the InternalValues are being assembled.  For a
large file, this can take 30 seconds.  Thus, a simple operation like
listing child nodes causes the entire BLOB to be downloaded from the
PostgreSQL database and written to a temporary file, which seems
unnecessary since in most of these situations all I really care about
are the names of the nodes, not their contents.  For example, here is
the code I use to completely clean out my repository starting at the
root.  Unfortunately, it has the nasty side effect of creating
temporary files for every BLOB in the repository before the nodes are
deleted:

    Node root = repoSession.getRootNode();
    Node jackrabbitRoot =
            root.getNode(JackrabbitResourceRepository.REPOSITORY_ROOT_NAME);
    for (Iterator childItr = jackrabbitRoot.getNodes(); childItr.hasNext();) {
        Node nodeToDelete = (Node) childItr.next();
        nodeToDelete.remove();
    }
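
For what it's worth, the oid-indirection approach I described can be
sketched roughly like this.  This is a minimal, self-contained
illustration, not my actual code: the class names (SimpleBlobStore,
OidBlobStore) are hypothetical, and an in-memory map stands in for the
PostgreSQL LargeObject read/write calls:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for DbBLOBStore: the bytea approach keeps the
// whole payload in the BINVAL_DATA column itself.
class SimpleBlobStore {
    protected final Map<String, byte[]> binvalData = new HashMap<>();

    public void put(String blobId, byte[] data) {
        binvalData.put(blobId, data);
    }

    public InputStream get(String blobId) {
        return new ByteArrayInputStream(binvalData.get(blobId));
    }
}

// Sketch of the override: BINVAL_DATA now holds only a small oid
// reference, and the payload lives in a separate large-object area
// (a Map here, standing in for the LargeObject API).
class OidBlobStore extends SimpleBlobStore {
    private final Map<Long, byte[]> largeObjects = new HashMap<>();
    private final AtomicLong nextOid = new AtomicLong(1);

    @Override
    public void put(String blobId, byte[] data) {
        long oid = nextOid.getAndIncrement();
        largeObjects.put(oid, data);                            // "lo_write"
        binvalData.put(blobId, Long.toString(oid).getBytes()); // oid only
    }

    @Override
    public InputStream get(String blobId) {
        long oid = Long.parseLong(new String(binvalData.get(blobId)));
        return new ByteArrayInputStream(largeObjects.get(oid)); // "lo_read"
    }
}
```

The point of the indirection is that the row in BINVAL_DATA stays tiny
regardless of the payload size, so reading the row never pulls the
whole BLOB through the driver's memory.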

 Is there any way to avoid downloading the BLOB data unnecessarily in
situations like the one above?  I really only want to download a BLOB
when a user asks for it.  Instead, it seems the BLOB is always
downloaded so that Jackrabbit can create a BLOBFileValue for each BLOB
in the DB.
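
To illustrate the behavior I'm hoping for (just a sketch of the idea,
not Jackrabbit's actual BLOBFileValue API; the LazyBlobValue class and
its loader parameter are made up for this example): the value object
would hold only the oid plus a loader, and hit the database only when
someone actually asks for the stream:

```java
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical lazy value: cheap metadata access, deferred download.
class LazyBlobValue {
    private final long oid;
    private final Supplier<InputStream> loader;

    LazyBlobValue(long oid, Supplier<InputStream> loader) {
        this.oid = oid;
        this.loader = loader;
    }

    // Cheap: answering "what is here?" needs no database round trip.
    long getOid() {
        return oid;
    }

    // Expensive: the BLOB is fetched only when the stream is requested.
    InputStream getStream() {
        return loader.get();
    }
}
```

With something like this, listing or deleting child nodes would only
ever touch the oids, and the temporary-file download would happen on
getStream() alone.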
Thanks,
Joe.
