Jim Fulton <jim <at> zope.com> writes:
> On Wed, Jan 26, 2011 at 3:15 PM, Matt Hamilton <matth <at>
> netsight.co.uk> wrote:
> > So, with up to 300,000 items in some of these IISets, it means to
> > iterate over the entire set (during a Catalog query) means loading
> > 5,000 objects over ZEO from the ZODB, which adds up to quite a bit
> > of
> > latency. With quite a number of these data structures about, means
> > we
> > can end up with in the order of 50,000 object in the ZODB cache
> > *just*
> > for these IISets!
> Hopefully, you're not iterating over the entire tree, but still. :)
Alas we are. Or rather, alas, ZCatalog does ;) It would be great if it
didn't, but that's just the way it is. If I have 300,000 items in my
site, and every one of them is visible to someone with the 'Reader'
role, then the allowedRolesAndUsers index will have an IITreeSet
with 300,000 elements in it. Yes, we could try and optimise out that
specific case, but there are others like it too. If none of my
items have an effective or expires date, then the same happens with
the effective range index (DateRangeIndex 'always' set).
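To put rough numbers on the iteration cost: with MAX_BUCKET_SIZE at 120 for II sets, buckets tend to sit around half full after splits, so a 300,000-element IITreeSet works out at roughly 5,000 persistent bucket objects to load over ZEO. A back-of-the-envelope sketch in plain Python (the half-full figure is an assumption, not the real BTrees internals):

```python
# Rough estimate of how many persistent bucket objects a full
# iteration of an IITreeSet loads over ZEO.  Assumption: buckets
# split when full, so on average they sit about half full.

MAX_BUCKET_SIZE = 120          # current default for II sets/buckets
AVERAGE_FILL = 0.5             # assumed typical fill after splits

def buckets_loaded(n_items, max_size=MAX_BUCKET_SIZE, fill=AVERAGE_FILL):
    """Approximate number of bucket objects holding n_items."""
    per_bucket = max_size * fill
    return int(n_items / per_bucket)

print(buckets_loaded(300_000))   # -> 5000
```

So one fully-populated index on a 300,000-item site accounts for thousands of cache entries on its own, which is where the 50,000-object figure above comes from once you have a handful of such indexes.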
> > So... has anyone tried increasing the size of MAX_BUCKET_SIZE in
> > real life?
> We have, mainly to reduce the number of conflicts.
> > I understand that this will increase the potential for conflicts
> > if the bucket/set size is larger (however, in reality this probably
> > can't get worse than it is now, as currently the value inserted is
> > most of the time greater than the current max value stored -- it is a
> > timestamp -- so you always hit the last bucket/set in the tree).
> Actually, it reduces the number of unresolvable conflicts.
> Most conflicting bucket changes can be resolved, but bucket
> splits can't be and bigger buckets means fewer splits.
> The main tradeoff is record size.
Ahh interesting, that is good to know. I've not actually checked the
conflict resolution code, but do bucket change conflicts actually get
resolved in some sane way, or does the transaction have to be
retried? Actually... that is a good point, and something I never thought
of... when you get a ConflictError in the logs (that was
resolved), does that mean that _p_resolveConflict was called and
was successful, or does it mean that the transaction was retried
and that resolved the conflict?
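For anyone following along: _p_resolveConflict is a three-way merge. It receives the old state, the committed state, and the state the failing transaction tried to write, and it can succeed when the two transactions changed disjoint parts of the bucket. A much-simplified plain-Python sketch of the idea on dict-like states (the real BTrees code is C and operates on pickled bucket states, so treat this purely as illustration):

```python
class ConflictError(Exception):
    pass

def resolve(old, committed, new):
    """Three-way merge of dict-like states (illustration only).

    Succeeds only if the committed and new transactions changed
    disjoint keys; otherwise the conflict is unresolvable.
    """
    keys = set(old) | set(committed) | set(new)
    committed_changes = {k for k in keys
                         if committed.get(k) != old.get(k)}
    new_changes = {k for k in keys if new.get(k) != old.get(k)}
    if committed_changes & new_changes:
        raise ConflictError("both transactions touched the same key")
    merged = dict(committed)
    for k in new_changes:
        if k in new:
            merged[k] = new[k]
        else:
            merged.pop(k, None)          # key deleted by 'new'
    return merged

# Disjoint changes resolve fine:
print(resolve({'a': 1}, {'a': 1, 'b': 2}, {'a': 1, 'c': 3}))
# -> {'a': 1, 'b': 2, 'c': 3}
```

A bucket split changes the tree structure itself rather than just one bucket's contents, which is why splits fall outside what this kind of merge can resolve.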
> > I was going to experiment with increasing the MAX_BUCKET_SIZE
> > from 120 to 1200. Doing a quick test, a pickle of an IISet of 60
> > items is around 336 bytes, and one of 600 items is 1580 bytes... so
> > still well within the realms of a single disk read / network packet.
> And imagine if you use zc.zlibstorage to compress records! :)
This is Plone 3, which is Zope 2.10.11; does zc.zlibstorage work on
that, or does it need a newer ZODB? Also, unless I can sort out that
large number of small pickles being loaded, I'd imagine this would
actually slow things down.
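On the small-pickles point, you can get a feel for the trade-off with stdlib pickle and zlib. These are plain Python lists, not the denser C-encoded IISet pickles, so the absolute sizes differ from the 336/1580-byte figures above, but the shape is the same: tiny records have proportionally more fixed overhead, and bigger buckets give the compressor more to work with.

```python
import pickle
import zlib

# Python-level stand-ins for small vs large integer-set buckets.
# Real IISet pickles pack C ints, so absolute sizes here are only
# illustrative.
small = pickle.dumps(list(range(60)))     # ~60-item bucket
large = pickle.dumps(list(range(600)))    # ~600-item bucket

for name, raw in (("60 items", small), ("600 items", large)):
    packed = zlib.compress(raw)
    ratio = 100 * len(packed) / len(raw)
    print("%-9s raw=%4d compressed=%4d (%.0f%%)"
          % (name, len(raw), len(packed), ratio))
```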
> > I'm not sure how the current MAX_BUCKET_SIZE values were chosen,
> > but it looks like they have been the same since the dawn of time, and
> > I'm guessing they might be due a tune?
> > It looks like I can change that constant and recompile the BTrees
> > package, and it will work fine with existing IISets and just take
> > effect on new sets created (ie clear and rebuild the catalog).
> > Anyone played with this before, or see any major flaws in my plan?
> We have. My long term goal is to arrange things so that you can
> specify/change limits by sub-classing the BTree classes.
> Unfortunately, that's been a long-term priority for too long.
> This could be a great narrow project for someone who's willing
> to grok the Python C APIs.
I remember you introduced me to the C API for things like this waaaay
back in Reading, at the first non-US Zope 3 sprint... I was trying to
create compressed list data structures for catalogs... I never could
quite get rid of the memory leaks I was getting! ;) Maybe I'll be
brave and take another look.
> Changing the default sizes for the II and LL BTrees is pretty
> easy.
> We were more interested in LO (and similar) BTrees. For those,
> it's much harder to guess sizes, because you don't know generally
> how big the objects will be, which is why I'd like to make it
> tunable at the application level.
Yeah, I guess that is the issue. I wonder if it would be easy for the
code to work out the total size of the bucket in bytes and then
split based upon that. Or something like 120 items, or 500kB,
whichever comes first.
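That "items or bytes, whichever comes first" policy could be sketched like this, using pickle to measure the would-be record size. The class name and structure here are entirely made up for illustration (real BTrees buckets are C structs and don't work this way):

```python
import pickle

class SizeCappedBucket:
    """Hypothetical bucket that splits on item count OR serialized
    size, whichever limit is hit first (illustration only)."""

    MAX_ITEMS = 120
    MAX_BYTES = 500 * 1024        # 500kB

    def __init__(self, items=()):
        self.items = sorted(items)

    def needs_split(self):
        too_many = len(self.items) > self.MAX_ITEMS
        too_big = len(pickle.dumps(self.items)) > self.MAX_BYTES
        return too_many or too_big

    def split(self):
        # split in the middle, preserving sort order
        half = len(self.items) // 2
        return (SizeCappedBucket(self.items[:half]),
                SizeCappedBucket(self.items[half:]))

b = SizeCappedBucket(range(200))          # over the item cap
if b.needs_split():
    left, right = b.split()
    print(len(left.items), len(right.items))   # -> 100 100
```

The catch, of course, is that measuring serialized size on every insert isn't free, so a real implementation would presumably want to estimate it rather than pickle on each check.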
Just looking at the cache on the site at the moment: we have
978,355 objects in cache, and 83% of the cache is just four
object types.
For more information about ZODB, see the ZODB Wiki:
ZODB-Dev mailing list - ZODB-Dev@zope.org