Re: [HACKERS] Correlation in cost_index()

Zeugswetter Andreas SB SD Thu, 14 Aug 2003 18:14:00 -0700

> In both cases ANALYZE will calculate correlation 1.0 for column X,
> and something near zero for column Y.  We would like to come out with
> index correlation 1.0 for the left-hand case and something much less
> (but, perhaps, not zero) for the right-hand case.  I don't really see
> a way to do this without actually examining the multi-column ordering
> relationship during ANALYZE.


The only way the second column correlation will be irrelevant is if
the first column is already (nearly) unique (enough so, that the second
column wont scatter fetches enough to fill the buffer before seeing cache hits).
Thus I think when merging correlations you could take nunique into account.

corr = corr_1 * (corr_2 * ( 1 - nunique_1 / nrows))
 
But, I think one (new) correlation metric for the whole index (whole key) and the 
data pages would actually be sufficient. This metric could imho always be used 
instead of the per column correlations to calculate index cost. This holds true 
as long as you walk an index range, and that is what it is all about, no ?

???
Andreas

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] Correlation in cost_index()

Reply via email to