Re: table/index options | was: [HACKERS] COUNT(*) and index-only scans

2011-10-11 Thread Robert Haas
On Mon, Oct 10, 2011 at 3:16 PM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2011/10/10 Robert Haas robertmh...@gmail.com:
 On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:

 That gives you an index-only scan; but without the WHERE clause it
 uses a seq scan.  I think it's mainly a matter of doing enough
 benchmarks to figure out how best to model the costs of the index
 scan so that it can be picked for that case.

 Right now, our costing model for index-only scans is pretty dumb.  It
 assumes that using an index-only scan will avoid 10% of the heap
 fetches.  That could easily be low, and on an insert-only table or one
 where only the recently-updated rows are routinely accessed, it could
 also be high.  To use an index-only scan for a full-table COUNT(*),
 we're going to have to be significantly smarter, because odds are good
 that skipping 10% of the heap fetches won't be sufficient inducement
 to the planner to go that route; we are going to need a real number.

 I have to raise that I think we are going to face the exact same issue
 with the visibility_fraction that we face with the hack to set
 random_page_cost very low to help optimizer (when index/table is
 mostly in cache).

 4 options have been viewed so far:
 1. pg_class (initial proposal to store the cache estimates)
 2. pg_class_nt (revived by Alvaro IIRC)
 3. reloption
 4. GUC (by Tom for visibility_fraction)

 I am in favor of 1 or 2, 4 is a backup option, and 3 an open door to
 planner hint (others also let DBA use its knowledge if he wants, but 3
 make it mandatory for the DBA to decide, and no automatic way can be
 used to update it, except if someone make ALTER TABLE lock free)

 (It does not prevent a cost_indexonly() to be written meawhile...)

 What do you think/prefer/suggest ?

Well, I think a GUC is kind of useless, because you're going to want
to make this per-table.

As to the rest, I think they're all going to have the same problems -
or non-problems - with ALTER TABLE locking the full table.  If that's
a show-stopper, we should try to fix it.  But how to do that is a
topic for another thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


table/index options | was: [HACKERS] COUNT(*) and index-only scans

2011-10-10 Thread Cédric Villemain
2011/10/10 Robert Haas robertmh...@gmail.com:
 On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:

 That gives you an index-only scan; but without the WHERE clause it
 uses a seq scan.  I think it's mainly a matter of doing enough
 benchmarks to figure out how best to model the costs of the index
 scan so that it can be picked for that case.

 Right now, our costing model for index-only scans is pretty dumb.  It
 assumes that using an index-only scan will avoid 10% of the heap
 fetches.  That could easily be low, and on an insert-only table or one
 where only the recently-updated rows are routinely accessed, it could
 also be high.  To use an index-only scan for a full-table COUNT(*),
 we're going to have to be significantly smarter, because odds are good
 that skipping 10% of the heap fetches won't be sufficient inducement
 to the planner to go that route; we are going to need a real number.

I have to raise that I think we are going to face the exact same issue
with the visibility_fraction that we face with the hack to set
random_page_cost very low to help optimizer (when index/table is
mostly in cache).

4 options have been viewed so far:
1. pg_class (initial proposal to store the cache estimates)
2. pg_class_nt (revived by Alvaro IIRC)
3. reloption
4. GUC (by Tom for visibility_fraction)

I am in favor of 1 or 2, 4 is a backup option, and 3 an open door to
planner hint (others also let DBA use its knowledge if he wants, but 3
make it mandatory for the DBA to decide, and no automatic way can be
used to update it, except if someone make ALTER TABLE lock free)

(It does not prevent a cost_indexonly() to be written meawhile...)

What do you think/prefer/suggest ?


-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers