Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls

Andrew Gierth Fri, 05 Aug 2016 03:30:16 -0700

>>>>> "Andrew" == Andrew Gierth <and...@tao11.riddles.org.uk> writes:
>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:


 Tom> What I did in the patch is to scale the formerly fixed "-1.0"
 Tom> stadistinct estimate to discount the fraction of nulls we found.

 Andrew> This seems quite dubious to me. stadistinct representing only
 Andrew> the non-null values seems to me to be substantially more useful
 Andrew> and less confusing; it should be up to consumers to take
 Andrew> stanullfrac into account (in general they already do) since in
 Andrew> many cases we explicitly do _not_ want to count nulls.

Hm. I am wrong about this, since it is precisely the fact that consumers
take stanullfrac into account that makes the value wrong in the first
place. For example, if a million-row table has stanullfrac=0.9 and
stadistinct=-1, then get_variable_numdistinct returns 1 million, and
(for example) var_eq_non_const divides 0.1 by that to give a
selectivity of 1 in 10 million. That is obviously wrong: the 100,000
non-null values are all distinct, so it should be 1 in 1 million.
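
To make the arithmetic concrete, here is a minimal sketch in Python. It is
a simplified model of the calculation described above, not the actual
PostgreSQL code: the function names numdistinct and eq_selectivity are
made up for illustration, and it assumes the column's non-null values
are all distinct.

```python
def numdistinct(stadistinct, ntuples):
    """Simplified model: a negative stadistinct is a multiplier of the
    total row count; a positive one is an absolute number of values."""
    if stadistinct < 0:
        return -stadistinct * ntuples
    return stadistinct


def eq_selectivity(stanullfrac, ndistinct):
    """Simplified model of the "col = <non-constant>" estimate:
    the non-null fraction spread evenly over the distinct values."""
    return (1.0 - stanullfrac) / ndistinct


ntuples = 1_000_000
stanullfrac = 0.9

# stadistinct = -1 is expanded against ALL rows, nulls included.
estimated = eq_selectivity(stanullfrac, numdistinct(-1.0, ntuples))

# Assumption: each of the 100,000 non-null values occurs exactly once,
# so a matching value selects one row out of the whole table.
actual = 1.0 / ntuples

print(f"estimated selectivity: {estimated:.3g}")
print(f"actual selectivity:    {actual:.3g}")
print(f"underestimate factor:  {actual / estimated:.3g}")
```

The null fraction ends up being discounted twice: once in the numerator,
and once implicitly, because the denominator counts rows that cannot
match at all.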

But I think the fix is still wrong, because it changes the meaning of
ALTER TABLE ... ALTER col SET (n_distinct=...) in a non-useful way: it
is no longer possible to nail down a useful negative n_distinct value if
the null fraction of the column is variable. Would it not make more
sense to do the adjustment in get_variable_numdistinct, instead?
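
A rough sketch of what that alternative could look like, again as a
simplified Python model rather than real PostgreSQL code. The function
numdistinct_nonnull is hypothetical; it only illustrates the idea of
leaving the stored value alone and applying the null-fraction discount
at the point where the negative value is expanded.

```python
def numdistinct_nonnull(stadistinct, ntuples, stanullfrac):
    """Hypothetical variant: a negative stadistinct is treated as a
    multiplier of the NON-NULL row count, so the stored value does not
    need to change when the null fraction does."""
    if stadistinct < 0:
        return -stadistinct * ntuples * (1.0 - stanullfrac)
    return stadistinct


def eq_selectivity(stanullfrac, ndistinct):
    """Simplified model: non-null fraction spread over distinct values."""
    return (1.0 - stanullfrac) / ndistinct


ntuples = 1_000_000
n_distinct_setting = -1.0  # fixed once, e.g. via a per-column override

results = {}
for stanullfrac in (0.0, 0.5, 0.9):
    nd = numdistinct_nonnull(n_distinct_setting, ntuples, stanullfrac)
    results[stanullfrac] = eq_selectivity(stanullfrac, nd)
    print(f"nullfrac={stanullfrac}: ndistinct={nd:.0f}, "
          f"selectivity={results[stanullfrac]:.3g}")
```

Under these assumptions the same setting of -1 gives a selectivity of
about one row in a million whatever the null fraction turns out to be,
which is the property a fixed n_distinct override would want.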

-- 
Andrew (irc:RhodiumToad)


