Re: [PERFORM] How to determine whether to VACUUM or CLUSTER

2005-06-20 Thread Jim C. Nasby
On Wed, Jun 15, 2005 at 11:34:18AM -0400, Ken Shaw wrote:
 Hi All,
 
 I have an app that updates a PostgreSQL db in a batch fashion.  After
 each batch (or several batches), it issues VACUUM and ANALYZE calls on
 the updated tables.  Now I want to cluster some tables for better
 performance.  I understand that doing a VACUUM and a CLUSTER on a table
 is wasteful as the CLUSTER makes the VACUUM superfluous.  The app does
 not have a built-in list of the tables and whether each is clustered or
 not.  It looks to me as if the only way to determine whether to issue a
 VACUUM (on a non-clustered table) or a CLUSTER (on a clustered table) is
 to query the table pg_index, much like view pg_indexes does, for the
 column indisclustered.  Is this right?

I don't think that's what you want. 'indisclustered' only indicates if
the last time the table was clustered was on that index. The best thing
that comes to mind is looking at the correlation of the first field in
the index for the table. You'll find this info in pg_stats.

 Also, how expensive is CLUSTER compared to VACUUM?  Does CLUSTER read in
 the whole table, sort it, and write it back out?  Or write out a
 completely new file?  Is the time for a CLUSTER the same whether one row
 is out of place or the table is completely disordered?

AFAIK, cluster completely re-creates the table from scratch, then
rebuilds all the indexes. It's basically the most expensive operation
you can perform on a table. There probably will be some increased
performance from the sort if the table is already mostly in the right
order though.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] How to determine whether to VACUUM or CLUSTER

2005-06-16 Thread Tom Lane
ken shaw [EMAIL PROTECTED] writes:
 It looks to me as if the only way to determine whether to issue a
 VACUUM (on a non-clustered table) or a CLUSTER (on a clustered table)
 is to query the table pg_index, much like view pg_indexes does,
 for the column indisclustered. Is this right?

indisclustered is certainly the ground truth here, and [ ... digs around
in the source code ... ] it doesn't look like there are any views that
present the information in a different fashion.  So yup, that's what
you gotta do.

 Also, how expensive is CLUSTER compared to VACUUM?

Well, it's definitely expensive compared to plain VACUUM, but compared
to VACUUM FULL the case is not clear-cut.  I would say that if you had
a seriously bloated table (where VACUUM FULL would have to move all or
most of the live tuples in order to compact the table completely) then
CLUSTER will be faster --- not to mention any possible future benefits
from having the table more or less in order with respect to the index.

As near as I can tell, VACUUM FULL was designed to work nicely when you
had maybe 10%-25% free space in the table and you want it all compacted
out.  In a scenario where it has to move all the tuples it is certainly
not faster than CLUSTER; plus the end result is much worse as far as the
state of the indexes goes, because VACUUM FULL does *nothing* for
compacting indexes.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend