> Instead of simply multiplying the ndistinct estimate with selectivity,
> we instead use the formula for the expected number of distinct values
> observed in 'k' rows when there are 'd' distinct values in the bin
>
>     d * (1 - ((d - 1) / d)^k)
>
> This is 'with replacements' which seems appropriate for the use, and it
> mostly assumes uniform distribution of the distinct values. So if the
> distribution is not uniform (e.g. there are very frequent groups) this
> may be less accurate than the current algorithm in some cases, giving
> over-estimates. But that's probably better than OOM.
> ---
>  src/backend/utils/adt/selfuncs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
> index f8d39aa..6eceedf 100644
> --- a/src/backend/utils/adt/selfuncs.c
> +++ b/src/backend/utils/adt/selfuncs.c
> @@ -3466,7 +3466,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
>  	/*
>  	 * Multiply by restriction selectivity.
>  	 */
> -	reldistinct *= rel->rows / rel->tuples;
> +	reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct,rel->rows));

Why did you change the "*=" style? I see no reason to change this.

	reldistinct *= 1 - powl((reldistinct - 1) / reldistinct, rel->rows);

looks better to me because it's shorter and cleaner.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp