Sorry about the digest subject in the last email...
> Message: 4
> Date: Thu, 2 Jun 2005 19:28:34 +0200
> From: Dieter Maurer <[EMAIL PROTECTED]>
> Subject: [ZODB-Dev] [Proposal] Size controlled ZODB cache
> To: email@example.com
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii
> I therefore propose the implementation of cache replacement policies
> based on the estimated size of its objects.
I'm all for it. rdflib (with the ZODB backend) stores RDF statements
(primarily) in a four-dimensional IOBTree (the final dimension is an
IISet). Each dimension is keyed by an integer that references a
separately indexed RDF term; the terms themselves are unicode strings.
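For illustration, that layout can be sketched with plain dicts and sets standing in for IOBTree and IISet (a hedged sketch, not rdflib's actual code: the class and method names below are mine, and the real backend's term index is more involved):

```python
# Sketch of a four-level statement index: subject -> predicate -> object
# -> set of context ids, every level keyed by an integer term id.
# Plain dict/set stand in for BTrees.IOBTree.IOBTree / BTrees.IIBTree.IISet.

class QuadIndex:
    def __init__(self):
        self._terms = {}   # unicode term -> integer id (the separate term index)
        self._ids = {}     # integer id -> unicode term (reverse lookup)
        self._spoc = {}    # nested int-keyed mapping; final level is a set

    def _intern(self, term):
        """Return the integer id for a term, assigning one on first use."""
        if term not in self._terms:
            new_id = len(self._terms)
            self._terms[term] = new_id
            self._ids[new_id] = term
        return self._terms[term]

    def add(self, s, p, o, c):
        si, pi, oi, ci = (self._intern(t) for t in (s, p, o, c))
        self._spoc.setdefault(si, {}).setdefault(pi, {}).setdefault(oi, set()).add(ci)

    def contains(self, s, p, o, c):
        try:
            return self._terms[c] in self._spoc[self._terms[s]][self._terms[p]][self._terms[o]]
        except KeyError:
            return False
```

Note how restating an already-seen term only adds integer references, never a new unicode string, which is the growth pattern described next.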
As an RDF store grows, it tends to add far more references to existing
terms than new terms, so the sparse graph of integer references in the
btree grows faster, and more linearly, than the variably sized (though
typically "human sized") terms do.
In other words, I got a lotta integers. I've stuffed on the order of a
million RDF statements into some of my test stores. Depending on the
RDF each statement could insert from 1-4 integers and from 1-4 unicode
objects into the database. For "narrow" queries that don't activate a
lot of statements, it's really damn fast: hundreds of milliseconds. The
"broader" a query gets, though, the worse performance drops off, to the
point that activating the whole database, the pathological worst case,
took several seconds.
I have some performance observations from this that I'd love to share
if I ever find the time, which I might soon. For now I have no hard
numbers, so I won't claim the cache is definitely the cause, but
increasing the cache size gives a definite performance improvement for
broader queries. I presume your proposal would help out a lot in this
regard by making the system smarter about how to handle lots of little
objects.
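For what it's worth, the kind of policy being proposed, evicting by estimated object size rather than by object count, can be sketched in a few lines (purely illustrative; `SizeBoundedCache` and its size accounting are my invention, not ZODB's pickle cache, which is implemented in C):

```python
from collections import OrderedDict

class SizeBoundedCache:
    """LRU cache bounded by total estimated byte size, not entry count."""

    def __init__(self, max_bytes):
        self._max_bytes = max_bytes
        self._data = OrderedDict()   # oid -> (obj, size), kept in LRU order
        self._total = 0

    def get(self, oid):
        obj, _size = self._data[oid]
        self._data.move_to_end(oid)  # mark as most recently used
        return obj

    def put(self, oid, obj, estimated_size):
        if oid in self._data:
            self._total -= self._data.pop(oid)[1]
        self._data[oid] = (obj, estimated_size)
        self._total += estimated_size
        # Evict least-recently-used entries until back under the size budget.
        while self._total > self._max_bytes and len(self._data) > 1:
            _oid, (_obj, evicted_size) = self._data.popitem(last=False)
            self._total -= evicted_size
```

Under a policy like this, a million small integers cost far less cache budget than a million large unicode terms, which is exactly the asymmetry in my store.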
For more information about ZODB, see the ZODB Wiki:
ZODB-Dev mailing list - ZODB-Dev@zope.org