Re: [HACKERS] Sequential scans

Heikki Linnakangas Thu, 03 May 2007 01:57:46 -0700

Simon Riggs wrote:

We need to think of the interaction with partitioning here. People will
ask whether we would recommend that individual partitions of a large
table should be larger/smaller than a particular size, to allow these
optimizations to kick in.


My thinking is that database designers would attempt to set partition
size larger than the sync scan limit, whatever it is. That means:
- they wouldn't want the limit to vary when cache increases, so we *do*
need a GUC to control the limit. My suggestion now would be
large_scan_threshold, since it effects both caching and synch scans.

They wouldn't? If you add more memory to your box, so that a table thatdidn't fit in memory before does now fit, surely you want to switch yourstrategy from "don't pollute the cache because it won't fit anyway" to"let's keep it in cache, so the next scan won't do I/O".

The basic problem with partitions is that if you have a query like"SELECT * FROM partitioned_table", so that you seq scan multiplepartitions, the size of each partition alone could be below thethreshold, whatever that is, but since you're scanning them all the netresult is the same as scanning one large table above the threshold. Thesame could happen in any plan that does multiple seq scans. It couldeven happen at the application level if the application submits multiplestatements like "SELECT * FROM table1", "SELECT * FROM table2" one aftereach other.

One way to address that would be to manage the recycle buffer ring sizedynamically. When a backend gets a cache miss, the ring would shrink,and when you get cache hits, it would grow. That way, if you do a seqscan on a table that fits in cache repeatedly, that table would get morebuffers from the cache on each iteration until it's completely in cache.But if the table or tables being scanned are too big to fit in cache,the ring would stay small and not pollute the cache much.

I'm not going to implement that for now, I'm seeing some scary negativefeedback behavior with that, and it'd need a lot of testing anyway. I'mthinking of just using shared_buffers as the limit. One could argue foreffective_cache_size as well.

- so there will be lots of partitions, so a hardcoded limit of 1000
would not be sufficient. A new GUC, or a link to an existing one, is
probably required.

No matter how many partitions you have, each backend could still bescanning only one of them at a time.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

               http://www.postgresql.org/about/donate

Re: [HACKERS] Sequential scans

Reply via email to