On 6 May 2014 17:55, Andres Freund <and...@2ndquadrant.com> wrote:
>> All this changes is the cost of IndexScans that would use more than
>> 25% of shared_buffers worth of data. Hopefully not many of those in
>> your workload. Changing the cost doesn't necessarily prevent index
>> scans either. And if there are many of those in your workload AND you
>> run more than one at the same time, then the larger setting will work
>> against you. So the benefit window for such a high setting is slim,
>> at best.
>
> Why? There are many workloads where indexes are larger than shared
> buffers but fit into the operating system's cache. And that's
> precisely what effective_cache_size is about.
>
> Especially on bigger machines, shared_buffers can't be set big enough
> to actually use all the machine's memory. It's not uncommon to have
> 4GB shared buffers on a machine with 512GB RAM... It'd be absolutely
> disastrous to set effective_cache_size to 1GB for an analytics
> workload.
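For concreteness, a postgresql.conf fragment matching the kind of box Andres describes might look like this (the values are illustrative, not recommendations):

```
# 512GB machine; shared_buffers deliberately kept small
shared_buffers = 4GB
# tell the planner how much data the OS cache can realistically hold;
# setting this to 1GB on such a box would cripple index-scan plans
effective_cache_size = 400GB
```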
In this case, a setting of effective_cache_size > (4 * shared_buffers)
could be appropriate, as long as we are certain we have the memory. We
don't have any stats on peak memory usage to be certain - although in
that case it's pretty clear. If we had stats on how effectively an
indexscan re-hits blocks it read earlier, we'd be able to autotune, but
I accept that without those we do still need the parameter.

>> I specifically picked 25% of shared_buffers because that is the point
>> at which sequential scans become more efficient and use the cache
>> more efficiently. If our cost models are correct, then switching away
>> from index scans shouldn't hurt at all.
>
> More often than not indexes are smaller than the table size, so this
> argument doesn't seem to make much sense.

If we believe that 25% of shared_buffers' worth of heap blocks would
flush the cache during a SeqScan, why should we allow 400% of
shared_buffers' worth of index blocks? In your example, that would be
1GB of heap blocks, or 16GB of index blocks.

If our table is 100GB with a 32GB index, then yes, that is 1% of the
heap and 50% of the index. But that doesn't matter, since I am
discussing the point at which we prevent the cache being churned.

Given your example, we do not allow a SeqScan of a table larger than
1GB to flush the cache, since we use BAS_BULKREAD. Yet if we allow an
indexscan plan that touches 16GB of an index, it will very clearly
flush out our 4GB of shared_buffers, increasing time for later queries
even if they only have to read from OS buffers back into
shared_buffers. That will still show itself as a CPU spike, which is
what people say they are seeing.

I think I'm arguing myself towards using a BufferAccessStrategy of
BAS_BULKREAD for large IndexScans, BitmapIndexScans and
BitmapHeapScans. Yes, we can make plans assuming we can use the OS
cache, but we shouldn't be churning shared_buffers when we execute
those plans.
"large" here meaning the same thing as it does for SeqScans, which is a scan that seems likely to touch more than 25% of shared buffers. I'll work up a patch. Perhaps it would also be useful to consider using a sequential scan of the index relation for less selective BitmapIndexScans, just as we do very effectively during VACUUM. Maybe that is a better idea than bitmap indexes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers