Re: [HACKERS] Our CLUSTER implementation is pessimal

Heikki Linnakangas Mon, 01 Sep 2008 00:42:43 -0700

Martijn van Oosterhout wrote:

On Mon, Sep 01, 2008 at 12:25:26AM +0100, Gregory Stark wrote:

The problem is that it does a full index scan and looks up each tuple in the
order of the index. That means it a) is doing a lot of random i/o and b) has
to access the same pages over and over again.


<snip>

a) We need some way to decide *when* to do a sort and when to do an index
scan. The planner has all this machinery but we don't really have all the
pieces handy to use it in a utility statement. This is especially important
for the case where we're doing a cluster operation on a table that's already
clustered. In that case an index scan could conceivably actually win (though I
kind of doubt it). I don't really have a solution for this.


The case I had recently was a table that was hugely bloated. 300MB data
and only 110 live rows. A cluster was instant, a seqscan/sort would
probably be much slower. A VACUUM FULL probably worse :)

Isn't there some compromise. Like say scanning the index to collect a
few thousand records and then sort them the way a bitmap index scan
does. Should be much more efficient that what we have now.

Ideally we would use the planner, and the planner would choose the bestplan for a bloated table like that (it probably does, I'm not sure) as well.

However, I'm not sure how much we can trust the statistics for a tablewe're about to CLUSTER. Often you run CLUSTER on a table you've justloaded or mass-updated.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Our CLUSTER implementation is pessimal

Reply via email to