Hanno Schlichting <hanno <at> hannosch.eu> writes:
> You are using queryplan in the site, right? The most typical catalog
> query for Plone consists of something like ('allowedRolesAndUsers',
> 'effectiveRange', 'path', 'sort_on'). Without queryplan you indeed
> load the entire tree (or trees inside allowedRolesAndUsers) for each
> of these indexes.
Yes we are using queryplan. Without it the site becomes pretty much
> With queryplan it knows from prior execution, that the set returned by
> the path index is the smallest. So it first calculates this. Then it
> uses this small set (usually 10-100 items per folder) to look inside
> the other indexes. It then only needs to do an intersection of the
> small path set with each of the trees. If the path set has fewer than
> 1000 items, it won't even use the normal intersection function from
> the BTrees module, but use the optimized Cython based version from
> queryplan, which essentially does a for-in loop over the path set.
> Depending on the size ratio between the sets this is up to 20 times
> faster with in-memory data, and even more so if it avoids database
> loads. In the worst case you would load buckets equal to length of the
> path set, usually you should load a lot less.
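
To check my understanding of the above, here is a rough sketch in plain Python of the "start from the smallest set and loop over it" idea. It is not the queryplan code: the function name and the sample data are made up for illustration, and ordinary Python sets stand in for the index trees.

```python
def intersect_small_first(result_sets):
    """Intersect several sets by looping over the smallest one.

    Hypothetical helper for illustration only; queryplan's real
    implementation is Cython and works on BTrees structures.
    """
    ordered = sorted(result_sets, key=len)
    if not ordered:
        return set()
    small, rest = ordered[0], ordered[1:]
    # One membership test per item of the small set against each larger
    # set, instead of walking the larger sets in full.
    return {item for item in small if all(item in other for other in rest)}


# Made-up data: a small "path" result and two much larger index results.
path_hits = {3, 7, 42}
allowed = set(range(100000))
effective = set(range(0, 100000, 7))

result = intersect_small_first([allowed, effective, path_hits])
print(sorted(result))
```

With a persistent tree in place of the large sets, each membership test would only need the bucket holding that key, which is where I assume the saving in database loads comes from.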
There still seem to be cases in which the entire set is loaded. This
could be an artifact of my clearing the ZODB cache before each test,
which I think also clears the query plan. Speaking of which, I saw a
hook in the queryplan code for loading a pre-defined query plan, but I
can't see how you supply this plan or what format it takes. Do you use
this feature?
> We have large Plone sites in the same range of multiple 100.000 items
> and with queryplan and blobs we can run them with ZODB cache sizes of
> less than 100.000 items and memory usage of 500mb per single-threaded
> Of course it would still be really good to optimize the underlying
> data structures, but queryplan should help make this less urgent.
Well, I think we are already at that point ;) I think there are also
other cases in which the full set is loaded.
> > Ahh interesting, that is good to know. I've not actually checked the
> > conflict resolution code, but do bucket change conflicts actually get
> > resolved in some sane way, or does the transaction have to be
> > retried?
> Conflicts inside the same bucket can be resolved and you won't get to
> see any log message for them. If you get a ConflictError in the logs,
> it's one where the request is being retried.
Great. That was what I always thought, but I just wanted to check. So in
that case, what does it mean if I see a conflict error for an IISet? Can
they not resolve conflicts internally?
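
For reference, this is roughly how I picture a three-way merge on a set working. It is only a sketch on plain Python sets. The function name, the exception class and the rule for when a merge is refused are my own assumptions for illustration; the real BTrees code may apply different or additional rules.

```python
class MergeConflict(Exception):
    """Raised when the two sets of changes cannot be merged (hypothetical)."""


def resolve_set_conflict(old, committed, new):
    """Three-way merge of two concurrent changes to the same set.

    old       -- state both transactions started from
    committed -- state written by the transaction that committed first
    new       -- state the current transaction wants to write
    """
    added_c, removed_c = committed - old, old - committed
    added_n, removed_n = new - old, old - new
    # Assumption for this sketch: if both sides touched the same key,
    # refuse to merge, so the transaction would have to be retried.
    if (added_c & added_n) or (removed_c & removed_n):
        raise MergeConflict("both transactions changed the same key")
    return (old | added_c | added_n) - removed_c - removed_n


merged = resolve_set_conflict({1, 2, 3}, {1, 2, 3, 4}, {1, 3, 5})
print(sorted(merged))
```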
> >> And imagine if you use zc.zlibstorage to compress records! :)
> > This is Plone 3, which is Zope 2.10.11, does zc.zlibstorage work on
> > that, or does it need newer ZODB?
> zc.zlibstorage needs a newer ZODB version. 3.10 and up to be exact.
> > Also, unless I can sort out that
> > large number of small pickles being loaded, I'd imagine this would
> > actually slow things down.
> The Data.fs would be smaller, making it more likely to fit into the OS
> disk cache. The overhead of uncompressing the data is small compared
> to the cost of a disk read instead of a memory read. But it's hard to
> say what exactly happens with the cache ratio in practice.
Yeah, if we could use it I certainly would :) I guess what I mean above is
that larger pickles would compress better, so with lots of small pickles
the compression would be less effective.
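
A quick way to see the effect I mean, using only the standard library. The records are invented sample data, and this calls plain zlib directly rather than going through zc.zlibstorage, so treat the numbers as indicative only.

```python
import pickle
import zlib

# Invented sample data: 1000 small, similar records.
records = [{"id": i, "title": "Document"} for i in range(1000)]

# Everything pickled and compressed as one large record.
one_large = len(zlib.compress(pickle.dumps(records)))

# Each record pickled and compressed on its own.
many_small = sum(len(zlib.compress(pickle.dumps(r))) for r in records)

print("one large pickle:  ", one_large, "bytes compressed")
print("many small pickles:", many_small, "bytes compressed in total")
```

Each small pickle pays the zlib framing overhead on its own and gives the compressor very little repetition to work with, which is why I'd expect the total for the small pickles to come out noticeably larger.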
ZODB-Dev mailing list - ZODB-Dev@zope.org