Simon Riggs wrote: > On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote: >> That's what I want to do: >> 1. Replace not very useful indexCorrelation with indexClustering. > > An opinion such as "not very useful" isn't considered sufficient > explanation or justification for a change around here.
"Not sufficient for some types of data" would have been more fair. I speculate that an new additional stat of "average # of unique values for a column within a block" would go a long way to helping my worst queries. It's common here for queries to vastly overestimate the number of pages that would need to be read because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages. Our biggest tables (180G or so) are mostly spatial data with columns like "City" "State" "Zip" "County" "Street" "School District", "Police Beat", "lat/long" etc; and we cluster the table on zip,street. Note that practically all the rows for any single value of any of the columns will lay in the same few blocks. However the calculated "correlation" being low because the total ordering of the other values doesn't match that of zip codes. This makes the optimizer vastly overestimate the cost of index scans because it guesses that most of the table will need to be read, even though in reality just a few pages are needed. If someone does look at the correlation calculations, I hope this type of data gets considered as well. I speculate that a new stat of "average # of unique values for a column within a block" could be useful here in addition to correlation. For most all my columns in my big table, this stat would be 1 or 2; which I think would be a useful hint that despite a low "correlation", the distinct values are indeed packed together in blocks. That way the optimizer can see that a smaller number of pages would need to be accessed than correlation alone would suggest. Does this make sense, or am I missing something. ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org