Re: [ZODB-Dev] Major refactoring of the ZEO ClientStorage Blob Cache

2008-12-03 Thread Jim Fulton

On Dec 3, 2008, at 1:50 AM, Christian Theune wrote:

 Hi,

 On Tue, 2008-12-02 at 12:03 -0500, Jim Fulton wrote:
 ZEO has two modes for dealing with client blob data, shared, and
 non-shared.  In shared mode, a distributed file system is used to share a
 blob directory with a ZEO server.  This requires management of a
 distributed file system, in addition to the ZEO protocol.  Any caching
 is provided by the distributed file system.

 In non-shared mode, blob data are downloaded to the ZEO client using
 the ZEO protocol.  No distributed file-system is needed and blob files
 are cached locally. Unfortunately, the current implementation provides
 no facilities for managing the client cache. There are no provisions
 in the ZEO client software for removing unused blob files and the blob
 implementation makes almost no provision for blob file removal.

 I'm working on refactoring ClientStorage's handling of non-shared blob
 data.  I'm implementing a mechanism for periodically cleaning out
 files that haven't been accessed in a while. As part of this, I'm
 going to radically change the layout of the ClientStorage's non-shared
 blob directory.

 Currently, the bushy layout, with deeply nested directories, is used.
 While I think this layout makes some sense on the server, I don't
 think it makes much sense on the client.  Cleaning up unused blob
 files is complicated by the need to clean up directories too.  I'm
 going to go for a fairly flat layout.  There will be a small number
 (997) of directories and blob files will reside directly in these
 directories.  (The directory will be chosen by taking the remainder of
 dividing an oid by 997.)

 Any specific reason for this specific number?

It is prime, and ~1000 directories seems pretty manageable.  More  
importantly, I'm using a file lock per directory and I only allow one  
process/thread at a time to operate on a file in the directory.  I  
want a somewhat large number to try to avoid contention.
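
A minimal sketch of the one-lock-per-directory idea described above. It assumes a POSIX system (it uses fcntl.flock), an integer oid, directories named by the decimal remainder, and a lock file called ".lock"; the lock-file name and the helper locked_dir are hypothetical and are not taken from the ZEO code.

```python
import contextlib
import fcntl
import os

NUM_DIRS = 997  # directory count from the discussion above


@contextlib.contextmanager
def locked_dir(base_dir, oid_int):
    """Hold an exclusive lock on the directory chosen for oid_int."""
    dir_path = os.path.join(base_dir, str(oid_int % NUM_DIRS))
    os.makedirs(dir_path, exist_ok=True)
    # ".lock" is a hypothetical name chosen for this sketch.
    lock_path = os.path.join(dir_path, ".lock")
    with open(lock_path, "w") as lock_file:
        # Blocks until no other holder has the lock on this directory.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield dir_path
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

With 997 directories, two operations contend only when their oids leave the same remainder, which is the point of picking a somewhat large number.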


 It appears that modern operating systems can
 handle large directories just fine.  I've created directories with 1
 million files on Linux/Ext, Mac OS X/HFS+, and Windows XP/NTFS and saw
 no degradation in performance as the number of files in a directory
 increased.

 FTR: The reason for introducing the bushy layout is a restriction on
 the number of subdirectories a directory can contain, which seems to
 be separate from the limit on the number of files a directory can
 contain. At least on ext3 I can't create more than 65k directories
 in a directory while I still can create a lot more files in the same
 directory. Wikipedia has a generally good overview and comparison
 between file systems but doesn't cover the maximum number of directory
 entries per directory.

The ext limitation on the number of subdirectories arises from a limit
on the number of links to an inode.  Each subdirectory has a ..
entry which adds a link to the containing directory.   I don't know if
there is a limit on the number of directory entries. If there is, it
is quite large. (I did a test of adding 10 million files to an ext
directory, although I got tired of waiting for it after it had gotten
up to a bit over 6 million.)
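
The link-count effect can be observed with os.stat. This is only a sketch: the helper link_counts is hypothetical, and how st_nlink is reported for directories depends on the file system, so the numbers will not look the same everywhere. On an ext file system one would expect the count to grow by one per subdirectory and to stay put when plain files are added.

```python
import os
import tempfile


def link_counts(n_subdirs, n_files):
    """Return the parent's link count at start, after mkdirs, after files."""
    with tempfile.TemporaryDirectory() as parent:
        start = os.stat(parent).st_nlink
        for i in range(n_subdirs):
            # Each new subdirectory carries a ".." entry naming the parent.
            os.mkdir(os.path.join(parent, f"dir{i}"))
        after_dirs = os.stat(parent).st_nlink
        for i in range(n_files):
            # Regular files get a name in the parent but no ".." entry.
            open(os.path.join(parent, f"file{i}"), "w").close()
        after_files = os.stat(parent).st_nlink
    return start, after_dirs, after_files


if __name__ == "__main__":
    print(link_counts(3, 3))
```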

 I plan to have ClientStorage use the file layout mentioned above.  The
 ClientStorage constructor will fail if an older layout is found. An
 alternative is to just log a warning and ignore the existing
 directories, as the new directories will have non-overlapping names.

 I mention this both as a heads up and to see if anyone can point out a
 problem with my approach.  I have a feeling that no one is using
 non-shared client blob directories for anything important yet, so I assume
 the change won't have much effect.

 I am. I'd prefer if you'd fail on the directory structure instead of
 mixing it with the new approach.

OK.

Jim

--
Jim Fulton
Zope Corporation


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] Major refactoring of the ZEO ClientStorage Blob Cache

2008-12-02 Thread Jim Fulton
ZEO has two modes for dealing with client blob data, shared, and
non-shared.  In shared mode, a distributed file system is used to share a
blob directory with a ZEO server.  This requires management of a
distributed file system, in addition to the ZEO protocol.  Any caching
is provided by the distributed file system.

In non-shared mode, blob data are downloaded to the ZEO client using  
the ZEO protocol.  No distributed file-system is needed and blob files  
are cached locally. Unfortunately, the current implementation provides  
no facilities for managing the client cache. There are no provisions  
in the ZEO client software for removing unused blob files and the blob  
implementation makes almost no provision for blob file removal.

I'm working on refactoring ClientStorage's handling of non-shared blob  
data.  I'm implementing a mechanism for periodically cleaning out  
files that haven't been accessed in a while. As part of this, I'm  
going to radically change the layout of the ClientStorage's non-shared  
blob directory.
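
A rough sketch of what such a periodic cleanup pass could look like. It assumes "accessed" is judged by the file's last-access time (st_atime), which some mounts update lazily or not at all; the function name remove_stale_blobs and the max_age_seconds parameter are hypothetical, not the mechanism ClientStorage uses.

```python
import os
import time


def remove_stale_blobs(cache_dir, max_age_seconds, now=None):
    """Delete files under cache_dir not accessed within max_age_seconds."""
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if now - os.stat(path).st_atime > max_age_seconds:
                    os.remove(path)
                    removed.append(path)
            except FileNotFoundError:
                # Another process removed the file first; nothing to do.
                pass
    return removed
```

Only files are removed here; the directories themselves are left alone.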

Currently, the bushy layout, with deeply nested directories, is used.
While I think this layout makes some sense on the server, I don't
think it makes much sense on the client.  Cleaning up unused blob
files is complicated by the need to clean up directories too.  I'm
going to go for a fairly flat layout.  There will be a small number
(997) of directories and blob files will reside directly in these
directories.  (The directory will be chosen by taking the remainder of
dividing an oid by 997.)  It appears that modern operating systems can
handle large directories just fine.  I've created directories with 1
million files on Linux/Ext, Mac OS X/HFS+, and Windows XP/NTFS and saw
no degradation in performance as the number of files in a directory
increased.
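
The directory choice described above can be sketched as follows. This assumes the oid is the usual 8-byte string, read as a big-endian unsigned integer, and that directories are named by the decimal remainder; the naming and the helper blob_dir_for_oid are assumptions made for illustration.

```python
import os

NUM_DIRS = 997  # prime, and close to 1000


def blob_dir_for_oid(base_dir, oid):
    """Return the cache subdirectory for an 8-byte oid."""
    # Read the oid bytes as a big-endian unsigned integer.
    oid_int = int.from_bytes(oid, "big")
    # Decimal directory names are an assumption of this sketch.
    return os.path.join(base_dir, str(oid_int % NUM_DIRS))
```

For example, an oid with integer value 1000 lands in directory "3", since 1000 = 997 + 3.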

I plan to have ClientStorage use the file layout mentioned above.  The  
ClientStorage constructor will fail if an older layout is found. An  
alternative is to just log a warning and ignore the existing  
directories, as the new directories will have non-overlapping names.
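
A sketch of the fail-fast option. It assumes that new-layout directories are named by decimal numbers below 997 and treats any other subdirectory as a sign of a different layout. Both the naming and the helper check_blob_cache_layout are hypothetical; how ClientStorage actually detects an older layout is not stated here.

```python
import os

NUM_DIRS = 997


def check_blob_cache_layout(blob_dir):
    """Raise ValueError if blob_dir holds directories from another layout."""
    expected = {str(n) for n in range(NUM_DIRS)}
    for entry in os.scandir(blob_dir):
        if entry.is_dir() and entry.name not in expected:
            raise ValueError(
                f"unexpected directory {entry.name!r} in {blob_dir!r}: "
                "possibly an older blob cache layout"
            )
```

A constructor would call this before touching the cache and let the exception propagate, rather than logging a warning and carrying on.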

I mention this both as a heads up and to see if anyone can point out a
problem with my approach.  I have a feeling that no one is using
non-shared client blob directories for anything important yet, so I assume
the change won't have much effect.

Comments are welcome, but not necessary. :)

Jim

--
Jim Fulton
Zope Corporation

