On Dec 3, 2008, at 1:50 AM, Christian Theune wrote:
Hi,
On Tue, 2008-12-02 at 12:03 -0500, Jim Fulton wrote:
ZEO has two modes for dealing with client blob data, shared, and
non-shared. In shared mode, a distributed file system is used to share a
blob directory with a ZEO server. This requires management of a
distributed file system, in addition to the ZEO protocol. Any caching
is provided by the distributed file system.
In non-shared mode, blob data are downloaded to the ZEO client using
the ZEO protocol. No distributed file system is needed and blob files
are cached locally. Unfortunately, the current implementation provides
no facilities for managing the client cache. There are no provisions
in the ZEO client software for removing unused blob files and the blob
implementation makes almost no provision for blob file removal.
I'm working on refactoring ClientStorage's handling of non-shared blob
data. I'm implementing a mechanism for periodically cleaning out
files that haven't been accessed in a while. As part of this, I'm
going to radically change the layout of the ClientStorage's
non-shared blob directory.
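A minimal sketch of what "cleaning out files that haven't been accessed in a while" could look like, assuming the access time reported by the file system is used as the criterion. The function name and parameters are hypothetical and are not taken from ClientStorage; the message above does not say how the real mechanism decides what to remove.

```python
import os
import time


def remove_stale_files(blob_dir, max_age_seconds, now=None):
    """Remove files under blob_dir not accessed within max_age_seconds.

    Hypothetical helper for illustration only. It relies on st_atime,
    which some mounts (e.g. noatime) do not keep up to date.
    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(blob_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if now - os.stat(path).st_atime > max_age_seconds:
                    os.remove(path)
                    removed.append(path)
            except OSError:
                # The file vanished or could not be removed; skip it.
                continue
    return removed
```

Note that this only removes files. With a deeply nested layout, emptied directories would need a separate pass.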
Currently, the bushy layout, with deeply nested directories, is used.
While I think this layout makes some sense on the server, I don't
think it makes much sense on the client. Cleaning up unused blob
files is complicated by the need to clean up directories too. I'm
going to go for a fairly flat layout. There will be a small number
(997) of directories and blob files will reside directly in these
directories. (The directory will be chosen by taking the remainder of
dividing an oid by 997.)
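The directory choice described above can be sketched as follows. This assumes an oid is the usual 8-byte big-endian string and that each directory is simply named after the decimal remainder; the naming scheme and the function name are assumptions for illustration, not something stated in this message.

```python
import os

N_DIRS = 997  # number of directories, from the message above


def blob_dir_for_oid(base_dir, oid):
    """Return the directory under base_dir that would hold blobs for oid.

    oid is assumed to be an 8-byte big-endian value. Naming the
    directory after the decimal remainder is a hypothetical choice.
    """
    oid_int = int.from_bytes(oid, "big")
    return os.path.join(base_dir, str(oid_int % N_DIRS))
```

For example, an oid with integer value 1000 would land in directory "3", since 1000 % 997 == 3.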
Any specific reason for this specific number?
It is prime, and ~1000 directories seems pretty manageable. More
importantly, I'm using a file lock per directory and I only allow one
process/thread at a time to operate on a file in the directory. I
want a somewhat large number to try to avoid contention.
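As a rough illustration of one lock per directory, here is a sketch using an advisory lock on a file inside the directory. It assumes a POSIX system (fcntl is not available on Windows), and the ".lock" file name is hypothetical. The message does not say which locking mechanism is actually used, so this is not the ClientStorage implementation.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def directory_lock(dir_path, lock_name=".lock"):
    """Hold an exclusive advisory lock for dir_path while in the block.

    Hypothetical helper: a lock file inside the directory is locked
    with flock(), so callers contend only with others working in the
    same directory.
    """
    os.makedirs(dir_path, exist_ok=True)
    lock_path = os.path.join(dir_path, lock_name)
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until available
        try:
            yield lock_path
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

With 997 directories, two operations block each other only when their oids leave the same remainder, which is the contention argument made above.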
It appears that modern operating systems can
handle large directories just fine. I've created directories with 1
million files on Linux/Ext, Mac OS X/HFS+, and Windows XP/NTFS and saw
no degradation in performance as the number of files in a directory
increased.
FTR: The reason for introducing the bushy layout is due to restrictions
on the number of directory entries a directory can contain which seem to
be a different restriction than the number of file entries a directory
can contain. At least on ext3 I can't create more than 65k directories
in a directory while I still can create a lot more files in the same
directory. Wikipedia has a generally good overview and comparison
between file systems but doesn't cover the maximum number of directory
entries per directory.
The ext limitation on the number of subdirectories arises from a limit
on the number of links to an inode. Each subdirectory has a ".."
entry, which adds a link to the containing directory. I don't know if
there is a limit on the number of directory entries. If there is, it
is quite large. (I did a test of adding 10 million files to an ext
directory, although I got tired of waiting for it after it had gotten
up to a bit over 6 million.)
I plan to have ClientStorage use the file layout mentioned above. The
ClientStorage constructor will fail if an older layout is found. An
alternative is to just log a warning and ignore the existing
directories, as the new directories will have non-overlapping names.
I mention this both as a heads up and to see if anyone can point out a
problem with my approach. I have a feeling that no one is using
non-shared client blob directories for anything important yet, so I
assume the change won't have much effect.
I am. I'd prefer if you'd fail on the directory structure instead of
mixing it with the new approach.
OK.
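Failing on an older layout might look roughly like the sketch below. It assumes the flat layout's directories are named "0" through "996" and that dot-files (lock or marker files) are allowed. Both are assumptions for illustration, and the function name is hypothetical.

```python
import os

N_DIRS = 997  # number of directories in the flat layout


def check_blob_layout(blob_dir):
    """Raise ValueError if blob_dir holds entries from another layout.

    Hypothetical check: anything that is neither a dot-file nor one of
    the assumed directory names "0".."996" is treated as a leftover
    from an older layout.
    """
    expected = {str(i) for i in range(N_DIRS)}
    unexpected = sorted(
        name
        for name in os.listdir(blob_dir)
        if name not in expected and not name.startswith(".")
    )
    if unexpected:
        raise ValueError(
            "%s appears to use an older blob layout (found %r)"
            % (blob_dir, unexpected[:5])
        )
```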
Jim
--
Jim Fulton
Zope Corporation
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/
ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev