On Dec 3, 2008, at 1:50 AM, Christian Theune wrote:

> Hi,
>
> On Tue, 2008-12-02 at 12:03 -0500, Jim Fulton wrote:
>> ZEO has two modes for dealing with client blob data, shared and
>> non-shared. In shared mode, a distributed file system is used to
>> share a blob directory with a ZEO server. This requires management
>> of a distributed file system, in addition to the ZEO protocol. Any
>> caching is provided by the distributed file system.
>>
>> In non-shared mode, blob data are downloaded to the ZEO client using
>> the ZEO protocol. No distributed file-system is needed and blob
>> files are cached locally. Unfortunately, the current implementation
>> provides no facilities for managing the client cache. There are no
>> provisions in the ZEO client software for removing unused blob files
>> and the blob implementation makes almost no provision for blob file
>> removal.
>>
>> I'm working on refactoring ClientStorage's handling of non-shared
>> blob data. I'm implementing a mechanism for periodically cleaning
>> out files that haven't been accessed in a while. As part of this,
>> I'm going to radically change the layout of the ClientStorage's
>> non-shared blob directory.
>>
>> Currently, the bushy layout, with deeply nested directories is used.
>> While I think this layout makes some sense on the server, I don't
>> think it makes much sense on the client. Cleaning up unused blob
>> files is complicated by the need to clean up directories too. I'm
>> going to go for a fairly flat layout. There will be a small number
>> (997) of directories and blob files will reside directly in these
>> directories. (The directory will be chosen by taking the remainder
>> of dividing an oid by 997.)
>
> Any specific reason for this specific number?

It is prime, and ~1000 directories seems pretty manageable. More
importantly, I'm using a file lock per directory and I only allow one
process/thread at a time to operate on a file in the directory. I want
a somewhat large number to try to avoid contention.

>> It appears that modern operating systems can handle large
>> directories just fine. I've created directories with 1 million
>> files on Linux/Ext, Mac OS X/HFS+, and Windows XP/NTFS and saw no
>> degradation in performance as the number of files in a directory
>> increased.
>
> FTR: The reason for introducing the bushy layout is due to
> restrictions on the number of directory entries a directory can
> contain which seem to be a different restriction than the number of
> file entries a directory can contain. At least on ext3 I can't create
> more than 65k directories in a directory while I still can create a
> lot more files in the same directory. Wikipedia has a generally good
> overview and comparison between file systems but doesn't cover the
> maximum number of directory entries per directory.

The ext limitation on the number of subdirectories arises from a limit
on the number of links to an inode. Each subdirectory has a ".." entry
which adds a link to the containing directory. I don't know if there is
a limit on the number of directory entries. If there is, it is quite
large. (I did a test of adding 10 million files to an ext directory,
although I got tired of waiting for it after it had gotten up to a bit
over 6 million.)

>> I plan to have ClientStorage use the file layout mentioned above.
>> The ClientStorage constructor will fail if an older layout is found.
>> An alternative is to just log a warning and ignore the existing
>> directories, as the new directories will have non-overlapping names.
>>
>> I mention this both as a heads up and to see if anyone can point out
>> a problem with my approach.
>> I have a feeling that no one is using non-shared client blob
>> directories for anything important yet, so I assume the change
>> won't have much effect.
>
> I am. I'd prefer if you'd fail on the directory structure instead of
> mixing it with the new approach.

OK.

Jim

--
Jim Fulton
Zope Corporation
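
Note for readers of the archive: the flat layout, per-directory lock,
and access-time cleanup discussed in this thread can be sketched
roughly as follows. This is only an illustration under assumptions,
not the ClientStorage code. The names blob_dir_name, dir_lock, and
remove_stale, the ".lock" file name, and the max_age parameter are all
hypothetical; the thread only specifies the 997 directories, the
oid-modulo rule, one lock per directory, and removal of files that
haven't been accessed in a while. The lock here is a simple
exclusive-create lock file, which may differ from whatever locking the
real implementation uses.

```python
import os
import time
from contextlib import contextmanager

NUM_DIRS = 997  # prime directory count mentioned in the thread


def blob_dir_name(oid):
    """Pick a directory by taking the remainder of oid / NUM_DIRS."""
    # Assumption: the oid is an 8-byte big-endian string, as in ZODB.
    return str(int.from_bytes(oid, "big") % NUM_DIRS)


@contextmanager
def dir_lock(directory, poll=0.01):
    """Hypothetical per-directory lock using exclusive file creation."""
    path = os.path.join(directory, ".lock")
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll)  # another process/thread holds the lock
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)


def remove_stale(base, max_age, now=None):
    """Remove files whose last access is older than max_age seconds."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(base)):
        directory = os.path.join(base, name)
        if not (name.isdigit() and os.path.isdir(directory)):
            continue  # only look at the numbered blob directories
        with dir_lock(directory):
            for fname in sorted(os.listdir(directory)):
                if fname == ".lock":
                    continue
                path = os.path.join(directory, fname)
                if now - os.stat(path).st_atime > max_age:
                    os.remove(path)
                    removed.append(path)
    return removed
```

Because files sit directly in the numbered directories, cleanup never
has to prune empty nested directories. Two caveats with this sketch: a
lock file left behind by a crashed process would block later callers,
and access times are not reliably updated on file systems mounted with
options such as noatime.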