On 6 May 2014 17:55, Andres Freund <and...@2ndquadrant.com> wrote:

>> All this changes is the cost of
>> IndexScans that would use more than 25% of shared_buffers worth of
>> data. Hopefully not many of those in your workload. Changing the cost
>> doesn't necessarily prevent index scans either. And if there are many
>> of those in your workload AND you run more than one at same time, then
>> the larger setting will work against you. So the benefit window for
>> such a high setting is slim, at best.
> Why? There's many workloads where indexes are larger than shared buffers
> but fit into the operating system's cache. And that's precisely what
> effective_cache_size is about.
> Especially on bigger machines shared_buffers can't be set big enough to
> actually use all the machine's memory. It's not uncommon to have 4GB
> shared buffers on a machine with 512GB RAM... It'd be absolutely
> disastrous to set effective_cache_size to 1GB for an analytics workload.

In this case, a setting of effective_cache_size > (4 * shared_buffers)
could be appropriate, as long as we are certain we have the memory.

We don't have any stats on peak memory usage to be certain of that,
although in the case you describe it's pretty clear.

If we had stats on how often an indexscan re-hits blocks it read
earlier, we'd be able to autotune; without those, I accept that we do
still need the parameter.

>> I specifically picked 25% of shared_buffers because that is the point
>> at which sequential scans become more efficient and use the cache more
>> efficiently. If our cost models are correct, then switching away from
>> index scans shouldn't hurt at all.
> More often than not indexes are smaller than the table size, so this
> argument doesn't seem to make much sense.

If we believe that 25% of shared_buffers worth of heap blocks would
flush the cache doing a SeqScan, why should we allow 400% of
shared_buffers worth of index blocks? In your example, that would be
1GB of heap blocks, or 16GB of index blocks.  If our table is 100GB
with a 32GB index, then yes, that is 1% of the heap and 50% of the
index. But that doesn't matter, since I am discussing the point at
which we prevent the cache being churned. Given your example we do not
allow a SeqScan of a table larger than 1GB to flush cache, since we
use BAS_BULKREAD. If we allow an indexscan plan that touches 16GB of
an index, it will very clearly flush out our 4GB of shared_buffers,
increasing the time for later queries even if they only have to read
from OS buffers back into shared_buffers. That still shows itself as a
CPU spike, which is what people say they are seeing.

I think I'm arguing myself towards using a BufferAccessStrategy of
BAS_BULKREAD for large IndexScans, BitMapIndexScans and
BitMapHeapScans. Yes, we can make plans assuming we can use OS cache,
but we shouldn't be churning shared_buffers when we execute those
plans. "large" here meaning the same thing as it does for SeqScans,
which is a scan that seems likely to touch more than 25% of shared
buffers. I'll work up a patch.

Perhaps it would also be useful to consider using a sequential scan of
the index relation for less selective BitmapIndexScans, just as we do
very effectively during VACUUM. Maybe that is a better idea than
bitmap indexes.

 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
