This is yet another issue entirely. This is about estimating how much
of the I/O will be random I/O if we do an index-order scan. Correlation
is a passable tool for this, but we might be able to do better.
But it has nothing to do with the cross-column stats problem.
greg
On 17 Oct 2008, at 01:29 AM, Ron Mayer <[EMAIL PROTECTED]> wrote:
Josh Berkus wrote:
Yes, or to phrase that another way: What kinds of queries are being
poorly optimized now and why?
Well, we have two different correlation problems. One is the
problem of dependent correlation, such as the 1.0 correlation of
ZIP and CITY fields as a common problem. This could in fact be
fixed, I believe, via a linear math calculation based on the
sampled level of correlation, assuming we have enough samples. And
it's really only an issue if the correlation is > 0.5.
I'd note that this can be an issue even without 2 columns involved.
I've seen a number of tables where the data is loaded in batches
so similar values from a batch tend to be packed into relatively few
pages.
Think of a database for a retailer that nightly aggregates data from
each of many stores. Each incoming batch inserts the store's data
into tightly packed disk pages where almost all rows on the page are for
that store. But those pages are interspersed with pages from other
stores.
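
To make the scenario concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: the store count, batch
size and page size are made up, every batch is assumed to fill exactly
one page, and a plain Pearson coefficient between store id and physical
row position stands in for the planner's correlation statistic.

```python
# Hypothetical layout: each night, every store appends one batch of rows.
ROWS_PER_PAGE = 100   # assumed page capacity
N_STORES = 50         # assumed number of stores
N_NIGHTS = 30         # assumed number of nightly loads


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def build_table(n_stores, n_nights, rows_per_batch):
    """Return the store id of each row, listed in physical (heap) order."""
    rows = []
    for _night in range(n_nights):
        for store in range(n_stores):
            rows.extend([store] * rows_per_batch)
    return rows


table = build_table(N_STORES, N_NIGHTS, ROWS_PER_PAGE)
positions = list(range(len(table)))

# Correlation between store id and physical position over the whole table.
corr = pearson(table, positions)

# Pages that actually hold rows for one particular store.
pages_for_store_7 = {pos // ROWS_PER_PAGE
                     for pos, store in enumerate(table) if store == 7}
total_pages = len(table) // ROWS_PER_PAGE

print(f"correlation: {corr:.3f}")
print(f"store 7 occupies {len(pages_for_store_7)} of {total_pages} pages")
```

The correlation comes out close to zero, yet all rows for store 7 sit on
30 of the 1500 pages, so an index scan on one store touches far fewer
pages than the correlation figure alone would suggest.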
I think I like the ideas Greg Stark had a couple years ago:
http://archives.postgresql.org/pgsql-hackers/2006-09/msg01040.php
"...sort the sampled values by value
and count up the average number of distinct blocks per value.... Or
perhaps we need a second histogram where the quantities are of
distinct pages rather than total records.... We might also need a
separate 'average number of n-block spans per value'
since those seem to me to lead more directly to values like 'blocks
that need to be read'."
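
A rough sketch of what the first and last of those statistics could look
like, in Python. This is not code from that proposal or from ANALYZE. The
sample is assumed to be a list of (value, block number) pairs, the
function names are hypothetical, and "n-block span" is interpreted here
as a run of consecutive block numbers holding the same value.

```python
from itertools import groupby
from operator import itemgetter


def blocks_by_value(sample):
    """Sort (value, block) pairs by value; map each value to its sorted distinct blocks."""
    ordered = sorted(sample)
    return {value: sorted({block for _, block in group})
            for value, group in groupby(ordered, key=itemgetter(0))}


def avg_distinct_blocks(sample):
    """Average number of distinct blocks per sampled value."""
    per_value = blocks_by_value(sample)
    return sum(len(blocks) for blocks in per_value.values()) / len(per_value)


def count_spans(blocks):
    """Count runs of consecutive block numbers in a sorted list of blocks."""
    return sum(1 for i, block in enumerate(blocks)
               if i == 0 or block != blocks[i - 1] + 1)


def avg_spans(sample):
    """Average number of consecutive-block spans per sampled value."""
    per_value = blocks_by_value(sample)
    return sum(count_spans(blocks) for blocks in per_value.values()) / len(per_value)


# Made-up sample: value "A" is on blocks 1, 2 and 7; value "B" on blocks 3 and 4.
sample = [("A", 1), ("A", 1), ("A", 2), ("A", 7),
          ("B", 3), ("B", 3), ("B", 4)]

print(avg_distinct_blocks(sample))  # (3 + 2) / 2 = 2.5
print(avg_spans(sample))            # (2 + 1) / 2 = 1.5
```

The distinct-block count approximates how many pages a lookup of one
value must read, while the span count hints at how many of those reads
would need a seek rather than continuing sequentially.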
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers