ZEO has two modes for dealing with client blob data: shared and
non-shared. In shared mode, a distributed file system is used to share a
blob directory with a ZEO server. This requires management of a
distributed file system, in addition to the ZEO protocol. Any caching
is provided by the distributed file system.

In non-shared mode, blob data are downloaded to the ZEO client using
the ZEO protocol. No distributed file system is needed, and blob files
are cached locally. Unfortunately, the current implementation provides
no facilities for managing the client cache. There are no provisions
in the ZEO client software for removing unused blob files, and the blob
implementation makes almost no provision for blob file removal.

I'm working on refactoring ClientStorage's handling of non-shared blob
data. I'm implementing a mechanism for periodically cleaning out
files that haven't been accessed in a while. As part of this, I'm
going to radically change the layout of the ClientStorage's non-shared
blob directory.

Currently, the bushy layout, with deeply nested directories, is used.
While I think this layout makes some sense on the server, I don't
think it makes much sense on the client. Cleaning up unused blob
files is complicated by the need to clean up directories too. I'm
going to go for a fairly flat layout. There will be a small number
(997) of directories and blob files will reside directly in these
directories. (The directory will be chosen by taking the remainder of
dividing an oid by 997.) It appears that modern operating systems can
handle large directories just fine. I've created directories with 1
million files on Linux/Ext, Mac OS X/HFS+, and Windows XP/NTFS and saw
no degradation in performance as the number of files in a directory
increased.
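As a rough illustration of the flat layout and the periodic cleanup
described above, here is a minimal sketch. Only the 997 directories and
the "oid modulo 997" rule come from this post. The function names, the
file naming scheme (hex oid and tid with a ".blob" suffix), and the use
of st_atime as the "last accessed" signal are assumptions made for the
example, not the actual ClientStorage implementation.

```python
import os
import time
from typing import Optional

NUM_DIRS = 997  # number of directories given in the post


def blob_subdir(oid: bytes) -> str:
    """Return the subdirectory name for an oid: the oid modulo 997."""
    return str(int.from_bytes(oid, "big") % NUM_DIRS)


def blob_path(cache_dir: str, oid: bytes, tid: bytes) -> str:
    # Hypothetical file name: hex oid and hex tid joined with a dot.
    name = "%s.%s.blob" % (oid.hex(), tid.hex())
    return os.path.join(cache_dir, blob_subdir(oid), name)


def remove_stale_blobs(cache_dir: str, max_age_seconds: float,
                       now: Optional[float] = None) -> int:
    """Delete files not accessed within max_age_seconds; return the count.

    Because files sit directly in the top-level subdirectories, no
    directory ever has to be removed.
    """
    now = time.time() if now is None else now
    removed = 0
    for subdir in os.listdir(cache_dir):
        dirpath = os.path.join(cache_dir, subdir)
        if not os.path.isdir(dirpath):
            continue
        # Take a snapshot of the entries before deleting any of them.
        for entry in list(os.scandir(dirpath)):
            if entry.is_file() and now - entry.stat().st_atime > max_age_seconds:
                os.remove(entry.path)
                removed += 1
    return removed
```

Note that access times are not reliably updated on file systems mounted
with options such as noatime, so a real implementation might need to
track access some other way.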
I plan to have ClientStorage use the file layout mentioned above. The
ClientStorage constructor will fail if an older layout is found. An
alternative is to just log a warning and ignore the existing
directories, as the new directories will have non-overlapping names.
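The two options above (fail, or warn and ignore) could be sketched
roughly as follows. The detection rule used here is an assumption made
for the example: any subdirectory whose name is not a decimal number
from 0 to 996 is treated as belonging to an older layout. The function
names and the strict flag are likewise hypothetical and are not part of
ClientStorage.

```python
import logging
import os

NUM_DIRS = 997  # number of directories given in the post
logger = logging.getLogger(__name__)


def foreign_dirs(cache_dir):
    """List subdirectories not named 0..996 (the assumed new-layout names)."""
    expected = {str(i) for i in range(NUM_DIRS)}
    return sorted(
        name for name in os.listdir(cache_dir)
        if os.path.isdir(os.path.join(cache_dir, name)) and name not in expected
    )


def check_layout(cache_dir, strict=True):
    """Fail (strict) or warn (non-strict) when older-layout directories exist."""
    found = foreign_dirs(cache_dir)
    if found and strict:
        raise ValueError(
            "older blob layout found in %r: %s" % (cache_dir, found[:3]))
    if found:
        logger.warning(
            "ignoring older blob directories in %r: %s", cache_dir, found[:3])
    return found
```

With strict=True this mirrors the "constructor fails" behaviour; with
strict=False it mirrors the "log a warning and ignore" alternative.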
I mention this both as a heads up and to see if anyone can point out a
problem with my approach. I have a feeling that no one is using non-
shared client blob directories for anything important yet, so I assume
the change won't have much effect.

Comments are welcome, but not necessary. :)
ZODB-Dev mailing list - ZODB-Dev@zope.org