>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:

 Tom> What I did in the patch is to scale the formerly fixed "-1.0"
 Tom> stadistinct estimate to discount the fraction of nulls we found.

 Andrew> This seems quite dubious to me. stadistinct representing only
 Andrew> the non-null values seems to me to be substantially more useful
 Andrew> and less confusing; it should be up to consumers to take
 Andrew> stanullfrac into account (in general they already do) since in
 Andrew> many cases we explicitly do _not_ want to count nulls.

Hm. I am wrong about this, since it's the fact that consumers are taking
stanullfrac into account that makes the value wrong in the first place.

For example, if a million-row table has stanullfrac=0.9 and
stadistinct=-1, then get_variable_numdistinct is returning 1 million,
and (for example) var_eq_non_const divides 0.1 by that to give a
selectivity of 1 in 10 million, which is obviously wrong.

But I think the fix is still wrong, because it changes the meaning of
ALTER TABLE ... ALTER col SET (n_distinct=...) in a non-useful way; it
is no longer possible to nail down a useful negative n_distinct value if
the null fraction of the column is variable.

Would it not make more sense to do the adjustment in
get_variable_numdistinct, instead?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
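
To make the arithmetic in the million-row example concrete, here is a
minimal Python sketch. It is a simplified model, not the PostgreSQL
source: the function names numdistinct_current, numdistinct_adjusted and
eq_selectivity are hypothetical stand-ins, and the MCV list, row-count
clamping and the stadistinct=0 (unknown) case are all ignored. The
"adjusted" variant is only one possible reading of doing the adjustment
in get_variable_numdistinct.

```python
def numdistinct_current(stadistinct, ntuples):
    """Simplified model: a negative stadistinct is a multiplier on the
    total row count, nulls included."""
    if stadistinct > 0:
        return stadistinct
    return -stadistinct * ntuples


def numdistinct_adjusted(stadistinct, ntuples, stanullfrac):
    """Hypothetical alternative: a negative stadistinct is a multiplier
    on the non-null rows only, so the stored value keeps its meaning
    even when the null fraction of the column changes."""
    if stadistinct > 0:
        return stadistinct
    return -stadistinct * ntuples * (1.0 - stanullfrac)


def eq_selectivity(ndistinct, stanullfrac):
    """Simplified stand-in for var_eq_non_const with no MCV list: the
    non-null fraction is spread evenly over the distinct values."""
    return (1.0 - stanullfrac) / ndistinct


if __name__ == "__main__":
    ntuples, stanullfrac, stadistinct = 1_000_000, 0.9, -1.0

    nd = numdistinct_current(stadistinct, ntuples)
    # Roughly 1 in 10 million: far too low for 100,000 unique non-null rows.
    print(nd, eq_selectivity(nd, stanullfrac))

    nd = numdistinct_adjusted(stadistinct, ntuples, stanullfrac)
    # Roughly 1 in 1 million, i.e. one row out of the whole table.
    print(nd, eq_selectivity(nd, stanullfrac))
```

With the adjustment made at lookup time, the stored -1 still means
"every non-null value is distinct" regardless of the null fraction,
which is the property the n_distinct override depends on.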