On Nov 6, 2008, at 7:24 PM, Shane Hathaway wrote:
> Jim Fulton wrote:
>> I've posted a new proposal:
>> That addresses multi-database garbage collection and can also be
>> useful in other situations.
>> Comments are welcome. Absent objections, I may start working on this
>> fairly soon.
> I see where you're going with this. The "Sample (naive)"
> would be very expensive with large databases; do you have ideas on how
> it might be done more efficiently?
Sure. First, you don't need a good set. You can start with the set of
all oids and remove the good ones as you find them; whatever is left
at the end is the bad set. I'd store the oids on disk as an oid->flag
mapping, or maybe even as a set. An advantage of making the gc
external is that we can innovate on it independently of the ZODB
release, although, eventually, we'd include a built-in gc tool.
Another bonus is that, in the presence of replication, the analysis
phase can be performed against a secondary storage, keeping load off
the primary until the final deletion step.
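
To illustrate the "no good set" idea, here is a minimal sketch in
Python. It is not ZODB code: find_garbage, all_oids, roots and
references are hypothetical names, and an in-memory set stands in for
the on-disk oid mapping. Membership in the bad set doubles as the
"not yet visited" marker, so no separate good set is needed.

```python
from collections import deque


def find_garbage(all_oids, roots, references):
    """Return the oids that are not reachable from the roots.

    all_oids   -- iterable of every oid in the storage (the starting set)
    roots      -- iterable of root oids
    references -- callable returning the oids that a given oid refers to
    All three are assumptions made for this sketch, not a ZODB API.
    """
    bad = set(all_oids)      # everything is presumed garbage at first
    queue = deque()

    for oid in roots:
        if oid in bad:
            bad.remove(oid)  # roots are good by definition
            queue.append(oid)

    while queue:
        oid = queue.popleft()
        for ref in references(oid):
            if ref in bad:   # still in the bad set means not yet visited
                bad.remove(ref)
                queue.append(ref)

    return bad               # whatever is left was never reached


# Toy object graph: 0 is the root; 3 and 4 only refer to each other.
graph = {0: [1, 2], 1: [2], 2: [], 3: [4], 4: [3]}
garbage = find_garbage(graph, [0], graph.__getitem__)
```

With a large database the set would be replaced by the on-disk
structure described above; the traversal itself stays the same.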