On Fri, Aug 12, 2011 at 4:36 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Fri, Aug 12, 2011 at 5:05 AM, Robert Haas <robertmh...@gmail.com> wrote:
>
>> On the other hand, the buffer manager has *no problem at all* trashing
>> the buffer arena if we're faulting in pages for an index scan rather
>> than a sequential scan. If you manage to get all of sample_data into
>> memory (by running many copies of the above query in parallel, you can
>> get each one to allocate its own ring buffer, and eventually pull in
>> all the pages), and then run some query that probes an index which is
>> too large to fit in shared_buffers, it cheerfully blows the whole
>> sample_data table out without a second thought. Had you sequentially
>> scanned a big table, of course, there would be some protection, but an
>> index scan can stomp all over everything with complete impunity.
>
> That's a good observation and I think we should do this
>
> * Make an IndexScan use a ring buffer once it has used 32 blocks. The
> vast majority won't do that, so we avoid overhead on the common path.
>
> * Make a BitmapIndexScan use a ring buffer when we know that the
> index is larger than 32 blocks. (Ignore upper parts of tree for that
> calc).
We'd need to think about what happens to the internal pages of the btree, the leaf pages, and then the heap pages from the underlying relation; those probably shouldn't all be treated the same.

Also, I think the tricky part is figuring out when to apply the optimization in the first place. Once we decide we need a ring buffer, a very small one (like 32 blocks) is probably the way to go. But it will be a loser to apply the optimization to data sets that would otherwise have fit in shared_buffers.

This is a classic case of the LRU/MRU problem. You want to evict buffers in LRU fashion until the working set gets larger than you can cache; and then you want to switch to MRU to avoid uselessly caching pages that you'll never manage to revisit before they're evicted. The point of algorithms like Clock-Pro is to try to have the system work that out on the fly, based on the actual workload, rather than using heuristics.

I agree with you there's no convincing evidence that Clock-Pro would be better for us; I mostly thought it was interesting because it seems that the NetBSD and Linux guys find it interesting, and they're managing much larger caches than the ones we're dealing with.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers