Simon Riggs wrote:
We need to think of the interaction with partitioning here. People will
ask whether we would recommend that individual partitions of a large
table should be larger/smaller than a particular size, to allow these
optimizations to kick in.
My thinking is that database designers would attempt to set partition
size larger than the sync scan limit, whatever it is. That means:
- they wouldn't want the limit to vary when cache increases, so we *do*
need a GUC to control the limit. My suggestion now would be
large_scan_threshold, since it effects both caching and synch scans.
They wouldn't? If you add more memory to your box, so that a table that
didn't fit in memory before does now fit, surely you want to switch your
strategy from "don't pollute the cache because it won't fit anyway" to
"let's keep it in cache, so the next scan won't do I/O".
The basic problem with partitions is that if you have a query like
"SELECT * FROM partitioned_table", so that you seq scan multiple
partitions, the size of each partition alone could be below the
threshold, whatever that is, but since you're scanning them all the net
result is the same as scanning one large table above the threshold. The
same could happen in any plan that does multiple seq scans. It could
even happen at the application level if the application submits multiple
statements like "SELECT * FROM table1", "SELECT * FROM table2" one after
One way to address that would be to manage the recycle buffer ring size
dynamically. When a backend gets a cache miss, the ring would shrink,
and when you get cache hits, it would grow. That way, if you do a seq
scan on a table that fits in cache repeatedly, that table would get more
buffers from the cache on each iteration until it's completely in cache.
But if the table or tables being scanned are too big to fit in cache,
the ring would stay small and not pollute the cache much.
I'm not going to implement that for now, I'm seeing some scary negative
feedback behavior with that, and it'd need a lot of testing anyway. I'm
thinking of just using shared_buffers as the limit. One could argue for
effective_cache_size as well.
- so there will be lots of partitions, so a hardcoded limit of 1000
would not be sufficient. A new GUC, or a link to an existing one, is
No matter how many partitions you have, each backend could still be
scanning only one of them at a time.
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at