I have been doing some performance investigation into a large Plone
site we have running. The site in question has approx 300,000 items of
content. Each piece of content is indexed by ZCatalog.

The main thing I was tracking down was the very large number of
objects being loaded by the ZODB, mostly IISet instances.

The large number of instances seems to be caused by a particular usage
pattern. In various indexes in the Catalog there are a number of
IITreeSet instances that are used to map, for instance, time ->
UID. As content items are added, you end up adding monotonically
increasing values to a set. The result is that you end up
'leaving behind' loads of buckets (or IISets in the case of an
IITreeSet) that are only half full.

Looking at the BTrees code, I see there is a MAX_BUCKET_SIZE constant
that is set for the various BTree/Set types, and in the case of an
IISet it is set to 120. This means that, when inserting into an
IITreeSet, once an IISet grows beyond 120 items it is split and a new
IISet is created. Hence, as above, I see a large number of 60-item
IISets due to the pattern in which these data structures are filled.
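
To illustrate the fill pattern, here is a minimal pure-Python sketch.
It is a toy model, not the BTrees C code: the function name `fill`, the
flat list of buckets and the split-in-half rule are my assumptions
about the behaviour described above. It inserts keys in increasing
order and splits any bucket that grows past the maximum size.

```python
from bisect import bisect_left, bisect_right


def fill(keys, max_bucket_size=120):
    """Insert keys into a toy chain of sorted buckets; return the buckets.

    Toy model only: a bucket that grows past max_bucket_size is split in
    half, which is an assumption about how the real tree behaves.
    """
    buckets = [[]]   # each bucket is a sorted list of ints
    lows = [None]    # lows[i] = first key of bucket i at creation (i >= 1)
    for key in keys:
        # Find the bucket whose key range covers `key` (skip the placeholder).
        i = bisect_right(lows, key, 1) - 1
        bucket = buckets[i]
        j = bisect_left(bucket, key)
        if j < len(bucket) and bucket[j] == key:
            continue  # sets ignore duplicates
        bucket.insert(j, key)
        if len(bucket) > max_bucket_size:
            # Split: the left half stays put, the right half becomes a new bucket.
            half = len(bucket) // 2
            right = bucket[half:]
            del bucket[half:]
            buckets.insert(i + 1, right)
            lows.insert(i + 1, right[0])
    return buckets


# Monotonically increasing keys, as with timestamps in a catalog index.
sequential = fill(range(300_000))
sizes = sorted({len(b) for b in sequential[:-1]})
print(len(sequential), "buckets; sizes of all but the last:", sizes)
```

In this model every bucket except the last one stops growing as soon as
it has been split, because all later keys land to its right, which is
why the half-full buckets pile up.
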

So, with up to 300,000 items in some of these IITreeSets, iterating
over the entire set (during a Catalog query) means loading around
5,000 objects over ZEO from the ZODB, which adds up to quite a bit of
latency. With quite a number of these data structures about, we can
end up with in the order of 50,000 objects in the ZODB cache *just*
for these IISets!
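
The arithmetic behind those figures can be sketched as below. The
number of tree sets (10) and the per-load round-trip time (0.5 ms) are
purely hypothetical values picked for illustration; I have not
measured them.

```python
ITEMS = 300_000          # items in one large IITreeSet (from the site above)
TREE_SETS = 10           # hypothetical: how many such sets the catalog holds
ROUND_TRIP_MS = 0.5      # hypothetical: cost of one uncached ZEO load


def loads_per_scan(items, items_per_bucket):
    """Number of bucket objects touched when iterating a whole set."""
    return -(-items // items_per_bucket)  # ceiling division


for per_bucket in (60, 600):
    loads = loads_per_scan(ITEMS, per_bucket)
    print(
        f"{per_bucket} items/bucket: {loads} loads per scan, "
        f"{loads * TREE_SETS} objects across {TREE_SETS} sets, "
        f"~{loads * ROUND_TRIP_MS:.0f} ms if nothing is cached"
    )
```
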

So... has anyone tried increasing MAX_BUCKET_SIZE in real life? I
understand that a larger bucket/set size increases the potential for
conflicts. In reality, however, this probably can't get worse than it
is now: the value inserted is a timestamp, so 99% of the time it is
greater than the current max value stored, and you always hit the
last bucket/set in the tree anyway.

I was going to experiment with increasing the MAX_BUCKET_SIZE on an IISet
from 120 to 1200. Doing a quick test, a pickle of an IISet of 60 items
is around 336 bytes, and one of 600 items is 1580 bytes... so still very
much in the realms of a single disk read / network packet.
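
A quick size test along these lines can be sketched with the standard
pickle module. The helper name `pickled_size` is made up for this
example, and the snippet uses a plain tuple as a stand-in so that it
runs without the BTrees package. Passing `IISet` (from
`BTrees.IIBTree`) as the factory should measure the real thing, though
the byte counts depend on the key values and pickle protocol, so they
need not match the figures quoted above.

```python
import pickle


def pickled_size(factory, n, protocol=pickle.HIGHEST_PROTOCOL):
    """Size in bytes of a pickled container holding the ints 0..n-1."""
    return len(pickle.dumps(factory(range(n)), protocol))


# Stand-in container; swap in IISet to measure an actual bucket.
for n in (60, 600):
    print(n, "items:", pickled_size(tuple, n), "bytes")
```
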

I'm not sure how the current MAX_BUCKET_SIZE values were determined,
but it looks like they have been the same since the dawn of time, and
I'm guessing they might be due a tune-up?

It looks like I can change that constant and recompile the BTrees
package, and it will work fine with existing IISets and only take
effect on newly created sets (i.e. after clearing and rebuilding the
catalog index).

Has anyone played with this before, or does anyone see any major flaws
in my cunning plan?