Robert:

> I think the first thing to do would be to try to come up with a
> reproducible test case where clustering the tables improves performance.
On that note, is there any standard way you guys do benchmarks?

Bruce:

> I think CLUSTER is a win when you are looking up multiple rows in the
> same table, either using a non-unique index or a range search.
> Having them all in adjacent pages would be a win --- single-row
> lookups are usually not. What places do such lookups?

Mostly the tables that track column-level data. Typically you will want
to grab the rows for several columns of a given table at once, so it
would be helpful to have them contiguous on disk.

I could design a benchmark to demonstrate this by building a thousand
tables one column at a time, using ALTER TABLE ... ADD COLUMN to scatter
each table's catalog rows across many blocks. That way there will be one
range of blocks holding column 1 for every table, another holding column
2 for every table, another holding column 3, and so on. Then fill a
couple of data tables with a lot of rows and set some noisemakers
looping over them with full table scans, filling the cache with
unrelated data and hopefully aging out the cached pg_catalog pages.
Finally, run some benchmark index-lookup queries and compare retrieval
times before and after clustering the pg_catalog tables. (There is a
rough sketch of this setup at the end of this mail.)

If the criterion is "doesn't hurt anything and helps a little," I think
this passes. Especially since cluster orderings aren't maintained
automatically, so adding them has no negative impact on inserts or
updates. It would just be a nice thing for people who know it can be
done, without harming anyone who doesn't.

-Simone Aiken
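
P.S. Here is a rough sketch of the setup I have in mind, in case it
helps. The table names, counts, and noise-table size are all arbitrary
placeholders, and clustering a system catalog will probably require
allow_system_table_mods to be turned on:

    -- Scatter each table's pg_attribute rows across many blocks by
    -- adding the columns in separate passes.
    DO $$
    BEGIN
        FOR i IN 1..1000 LOOP
            EXECUTE 'CREATE TABLE scatter_' || i || ' (c1 int)';
        END LOOP;
        FOR i IN 1..1000 LOOP
            EXECUTE 'ALTER TABLE scatter_' || i || ' ADD COLUMN c2 int';
        END LOOP;
        FOR i IN 1..1000 LOOP
            EXECUTE 'ALTER TABLE scatter_' || i || ' ADD COLUMN c3 int';
        END LOOP;
    END $$;

    -- Noisemaker: a big table whose repeated seq scans push the
    -- catalog pages out of cache.  Loop the SELECT from another
    -- session or two.
    CREATE TABLE noise AS
        SELECT g AS id, repeat('x', 100) AS pad
        FROM generate_series(1, 1000000) g;
    SELECT count(*) FROM noise;

    -- Timed probe (use psql's \timing): a multi-row catalog lookup,
    -- all the attributes of one table.  Run it before and after
    -- clustering.
    SELECT attname, atttypid
      FROM pg_attribute
     WHERE attrelid = 'scatter_500'::regclass;

    -- Cluster on the (attrelid, attnum) index, then re-run the probe.
    CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;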