[Dylan Jay]
> ...
> Like generational garbage collection, which I guess is one way to
> take frequency of use into account. When an object is used more than
> X times it jumps to the next generation. In the next generation it
> is checked less often, making it more efficient than just checking
> the frequency + LRU of all objects to see who should go. But you're
> right that my suggestion is to use frequency.
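For concreteness, that promotion scheme amounts to something like the
toy sketch below -- all names and limits are invented, and none of it
is ZODB code:

    # Toy two-generation cache: objects promoted on frequency, each
    # generation evicted in LRU order.  Invented names, not ZODB code.
    from collections import OrderedDict

    PROMOTE_AFTER = 10   # hits before an object "jumps" a generation

    class TwoGenerationCache:
        def __init__(self, young_size, old_size):
            self.young = OrderedDict()   # oid -> (obj, hits), LRU order
            self.old = OrderedDict()     # oid -> obj, LRU order
            self.young_size = young_size
            self.old_size = old_size

        def get(self, oid):
            if oid in self.old:
                self.old.move_to_end(oid)          # cheap LRU touch
                return self.old[oid]
            if oid in self.young:
                obj, hits = self.young.pop(oid)
                if hits + 1 >= PROMOTE_AFTER:      # frequent: promote
                    self.old[oid] = obj
                    self._shrink(self.old, self.old_size)
                else:
                    self.young[oid] = (obj, hits + 1)  # reinsert at MRU
                return obj
            return None                            # cache miss

        def add(self, oid, obj):
            self.young[oid] = (obj, 1)
            self._shrink(self.young, self.young_size)

        @staticmethod
        def _shrink(gen, limit):
            while len(gen) > limit:
                gen.popitem(last=False)            # evict the LRU entry

Note the old generation is only examined when a promotion overflows
it, which is the "checked less often" property described above.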
Then I'd rather discuss the latter than a specific implementation.
There are other things that "should" go into cache decisions too, like
the number of bytes occupied by an object. A realistic cost function
is complicated; for example, booting a large object may be attractive
because getting rid of it can make room for many smaller objects, but
booting a large object may also be unattractive because it costs more
to (re)fetch a large number of bytes on a subsequent cache miss. And
regardless of size, it's potentially much more expensive to boot an
object from the ZODB cache if it's not in the ZEO cache than if it is.
Stuff like that.

Alas, optimal cache replacement in the face of most real-life
complications is known to be NP-hard even with perfect foreknowledge
of future accesses. It's still necessary to define what the goal (cost
function) is, though. For example, you mentioned spiders originally,
and Chris suggested a practical way to address that (albeit on the
Zope side).

>> ... But that's a more complicated cache design than pure LRU, so
>> who knows if it would be a _net_ win.

> It may not, but it seems worth thinking about. It certainly seems
> that there could be lots of 'temporary' loads vs frequently used
> objects. I guess this also overlaps with another recent email
> talking about releasing objects before the end of the transaction
> due to loads which are 'temporary'.

Yup, it overlaps with a lot of things. In the end it needs code and
comparative measurement against "typical" configurations and
workloads. That's hard.

>> Buying more RAM is an idea that just never gets old <wink>.

> Hearing that suggestion does get very old.

"<wink>" meant I was joking.

> When running your own machine, buying RAM is very cheap. When using
> hosted Zope instances it is not so cheap. If Zope uses memory
> ineffectively and it is possible to make it more effective within a
> reasonable effort, then I don't see that as something that should be
> dismissed.

Neither do I, but we're lacking a demonstration that such an
opportunity is being missed here. Cache behavior on real workloads is
really (really) complicated, and I don't trust any "head arguments" as
a result. It's especially complicated for ZODB because serious Zope
installations run ZEO too, and a ZODB cache miss can go on to become a
ZEO cache hit or a ZEO cache miss, with vastly different costs on the
ZEO end.

Jeremy Hylton summarized what we learned from trying a large number of
alternatives for the ZEO cache (based on as many "real life" ZEO cache
traces as we were able to get at the time) here:

    http://www.python.org/~jeremy/weblog/031209b.html

There's a list of references here:

    http://www.python.org/~jeremy/weblog/031126a.html

One striking thing is that what authors said about their algorithms'
behavior, based on their workloads, rarely matched what we observed on
our workloads. That's common when using inexpensive heuristics aimed
at NP-hard problems, and it's worse here because ZEO's cache is really
a second-level cache. I don't expect it to become vastly easier when
modeling ZODB's first-level cache, though.

The current ZODB LRU algorithm is a vast improvement over what came
before it, and does appear to hit a sweet spot. Dieter seems convinced
that we're missing an efficiently addressable win in booting "very
large" objects out of ZODB's cache first. If that gets somewhere, it
will be because he backs it with code and real-life measurement.
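To make "cost function" concrete, the flavor of thing I have in mind
looks like the sketch below. Every weight and threshold in it is
invented, and the in_zeo_cache flag pretends we could cheaply know
whether an evicted object would still be a ZEO cache hit (we can't):

    # Toy eviction scoring rule -- all numbers invented, nothing here
    # is ZODB's actual policy.
    import time
    from dataclasses import dataclass

    @dataclass
    class Entry:
        oid: bytes
        size: int           # pickled size in bytes
        last_access: float  # time.time() of the last hit
        in_zeo_cache: bool  # hypothetical oracle; real code lacks this

    def eviction_score(entry, now):
        """Higher score == more attractive to boot from the cache."""
        score = now - entry.last_access      # LRU-ish: idle time
        score += entry.size / 1024.0         # big objects free more room ...
        if entry.size > 64 * 1024:
            score -= entry.size / 4096.0     # ... but cost more to refetch
        if not entry.in_zeo_cache:
            score -= 100.0                   # a miss would hit the server
        return score

    def choose_victim(entries):
        now = time.time()
        return max(entries, key=lambda e: eviction_score(e, now))

The particular numbers don't matter; the point is that the terms pull
in opposite directions, which is why this needs measurement instead of
argument.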
If I had time (I don't ...), I'd like to move toward a more general
cost-function based approach, informed by ZODB cache traces taken from
real applications. We don't have any ZODB cache traces (nor code to
create them) now. And I'd rather measure than count votes <0.9 wink>.
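The recording end needn't be elaborate, though. A sketch of the kind
of thing I mean follows; the record_event() hook is hypothetical,
since ZODB has no such instrumentation today:

    # Fixed-size binary trace records: timestamp, event code, object
    # size, 8-byte oid.  record_event() is a hypothetical hook.
    import struct
    import time

    _RECORD = struct.Struct("<dBxxxI8s")     # 24 bytes per event

    EVENT_HIT, EVENT_MISS, EVENT_EVICT = 0, 1, 2

    def record_event(tracefile, event, oid, size):
        tracefile.write(_RECORD.pack(time.time(), event, size, oid))

    def read_trace(tracefile):
        while True:
            raw = tracefile.read(_RECORD.size)
            if len(raw) < _RECORD.size:
                return
            yield _RECORD.unpack(raw)  # (timestamp, event, size, oid)

Replaying records like those against candidate replacement policies is
what could turn threads like this one into measurement.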