On Thursday 14 April 2005 20:23, Tim Peters wrote:

> The size of the objects in the database has little to do with memory
> consumed by a FileStorage pack; it's more the number of distinct object
> revisions at work, since an in-memory object reachability graph is
> constructed.  I'm not sure how DirectoryStorage could perform packing
> without constructing a similar reachability graph (Toby?).

Both storages *traverse* the object reachability graph, keeping a record of 
which oids are reachable. They both keep a traversal to-do list in memory, 
which is sized proportional to the height of the reachability graph.
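
As a rough illustration of the traversal described above (not the actual code of either storage), here is a minimal Python sketch. The function name and the `references` callable are hypothetical; a real storage would extract the references from each object's stored record.

```python
def reachable_oids(root_oid, references):
    """Return the set of oids reachable from root_oid.

    `references` is a hypothetical callable that maps an oid to the
    oids referenced by that object's record.
    """
    reached = {root_oid}   # the record of which oids are reachable
    todo = [root_oid]      # the in-memory traversal to-do list
    while todo:
        oid = todo.pop()
        for ref in references(oid):
            if ref not in reached:
                reached.add(ref)
                todo.append(ref)
    return reached
```

In this sketch the `reached` set is the part the two storages implement differently; the `todo` list is the part they have in common.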

They differ in how they record which oids are reachable. FileStorage uses an 
fsIndex instance, which stores everything in memory (in a memory-efficient 
manner). The default implementation in DirectoryStorage uses a bit in the 
file permissions to mark reached objects. The I/O cost of this is the main 
reason for DirectoryStorage's relative slowness in packing.
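
To show where the I/O cost comes from, here is a hedged sketch of marking a file as "reached" through its permission bits. Which bit DirectoryStorage actually uses is not stated here, so `MARK_BIT` is an assumption chosen purely for illustration, as are the function names. The example assumes a POSIX filesystem.

```python
import os
import stat

# Assumption: an otherwise unused permission bit stands in for the
# "reached" flag. The real bit used by DirectoryStorage may differ.
MARK_BIT = stat.S_IXOTH

def mark_reached(path):
    """Set the mark bit on the file holding an object."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | MARK_BIT)

def is_reached(path):
    """Report whether the mark bit is set on the file."""
    return bool(os.stat(path).st_mode & MARK_BIT)
```

Every mark or check in this sketch is at least one system call against the filesystem, whereas an in-memory index only touches RAM.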

There is an alternative implementation in DirectoryStorage which creates a 
second temporary ZODB to hold an OIBTree to store the list of reachable 
objects. This also has a fixed memory cost and performs better than the 
standard permissions bit implementation. One big disadvantage last time I 
looked was memory leaks when creating and destroying ZODB.DB objects - but I 
think Tim and Jeremy have since addressed that.

> The last time Jeremy and I watched a pack work on a 20GB Data.fs, on a very
> slow Solaris box, we noticed that it was only taking 10-20% of the RAM, and
> regretted the then-last round of packing changes, which favored reducing RAM
> usage at the cost of increasing runtime.  That appears to have been a wrong
> tradeoff for most modern boxes.

Interesting. DirectoryStorage can use an all-in-memory implementation too. 
Anyone with a big storage fancy trying it?

> Toby, I know (or think I know <wink>) that DirectoryStorage won't commit a
> transaction containing dangling references.  I think that's great, and I'd
> like (if possible) to introduce such a check at a higher level, so that all
> storages would benefit. 

There are races in this dangling reference detection. I guess that's OK since 
it is only there to warn about a bug in a higher layer.

> Does DirectoryStorage do something beyond that 
> check specifically aimed at preventing POSKeyErrors?  

There are numerous corner cases that can lead to objects incorrectly appearing 
to be unreachable during packing. I describe one here:
DirectoryStorage takes two precautions to reduce the chances of being bitten 
by this class of problem:

a. Ensuring that the pack threshold time leaves sufficient margin of safety. 
storage.pack(one day ago) is fine.
storage.pack(zero days ago) is silently converted to
storage.pack(10 minutes ago)

b. Both storages keep all objects that are reachable from a sufficiently 
recent version of the root object. DirectoryStorage will also keep objects 
that have been modified in any sufficiently recent transaction even if they 
do not appear to be reachable. (This set is almost always empty, unless we 
have hit a corner case. Objects almost always have to be reachable in order 
to get modified.)
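
The two precautions can be sketched roughly as follows. This is an illustration under stated assumptions, not DirectoryStorage's code: the names are hypothetical, times are seconds since the epoch, and "sufficiently recent" is assumed to mean "at or after the effective pack time".

```python
import time

MIN_PACK_MARGIN = 10 * 60  # seconds; the 10 minute floor from (a)

def effective_pack_time(requested, now=None):
    """Clamp a requested pack time so it is at least 10 minutes old."""
    now = time.time() if now is None else now
    return min(requested, now - MIN_PACK_MARGIN)

def oids_to_keep(reachable, last_modified, pack_time):
    """Combine reachable oids with recently modified ones, as in (b).

    `reachable` is the set of oids found by the traversal.
    `last_modified` is a hypothetical mapping from oid to the time of
    the most recent transaction that modified it.
    """
    recently_modified = {
        oid for oid, t in last_modified.items() if t >= pack_time
    }
    return set(reachable) | recently_modified
```

For example, a request to pack at the current time is treated as a request to pack at ten minutes ago, and an object modified five minutes ago survives the pack even if the traversal did not reach it.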

Toby Dickenson
ZODB-Dev mailing list  -  ZODB-Dev@zope.org