Martijn van Oosterhout <[EMAIL PROTECTED]> writes:
> Just a note: using multidimensional histograms will work well for the
> cases like (startdate,enddate) where the histogram will show a
> clustering of values along the diagonal. But it will fail for the case
> (zipcode,state) where one implies the other. Histogram-wise you're not
> going to see any correlation at all.

Huh?  Sure you are.  What the histogram will show is that there is only
one state value per zipcode, and only a limited subset of zipcodes per
state.  The nonempty cells won't cluster along the "diagonal" but we
don't particularly care about that.  What we really want from this
is to not think that
	WHERE zip = '80210' AND state = 'CA'
is significantly more selective than just
	WHERE zip = '80210'
A histogram is certainly capable of telling us that.  Whether it's the
most compact representation is another question of course --- in an
example like this, only about 1/50th of the cells would contain nonzero
counts ...

			regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
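
The point about the (zipcode,state) histogram can be illustrated with a small
sketch. This is not planner code, only a toy model under stated assumptions:
the data is synthetic (1000 made-up zipcodes spread evenly over 50 made-up
states, 10 rows per zipcode), and the names build_histogram and selectivity
are hypothetical helpers invented for this example. The "histogram" is simply
a count per (zipcode, state) cell, storing nonempty cells only.

```python
from collections import Counter

def build_histogram(rows):
    """Count rows per (zipcode, state) cell; empty cells are not stored."""
    return Counter(rows)

def selectivity(hist, total, zipcode=None, state=None):
    """Fraction of rows matching the given equality conditions."""
    matching = sum(
        n for (z, s), n in hist.items()
        if (zipcode is None or z == zipcode)
        and (state is None or s == state)
    )
    return matching / total

# Synthetic data (an assumption of this sketch): every zipcode
# belongs to exactly one state, so zipcode implies state.
states = [f"S{i:02d}" for i in range(50)]
zip_to_state = {f"{z:05d}": states[z % 50] for z in range(1000)}
rows = [(z, s) for z, s in zip_to_state.items() for _ in range(10)]

hist = build_histogram(rows)
total = len(rows)

target_zip = "00042"
target_state = zip_to_state[target_zip]

sel_zip = selectivity(hist, total, zipcode=target_zip)
sel_both = selectivity(hist, total, zipcode=target_zip, state=target_state)
sel_state = selectivity(hist, total, state=target_state)

# What an estimator assuming independent columns would compute.
independent_estimate = sel_zip * sel_state

# Fraction of the full zipcode x state grid that holds any rows.
nonempty_fraction = len(hist) / (len(zip_to_state) * len(states))

print(f"zip only:           {sel_zip}")
print(f"zip AND state:      {sel_both}")
print(f"independence guess: {independent_estimate}")
print(f"nonempty cells:     {nonempty_fraction}")
```

In this toy data the two-column count gives the same selectivity for the
zipcode alone as for the zipcode plus its matching state, whereas multiplying
the per-column selectivities underestimates it by a factor of 50. The nonempty
cells make up 1/50th of the grid, which is the compactness concern raised
above.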