> Instead of simply multiplying the ndistinct estimate by the selectivity,
> we use the formula for the expected number of distinct values
> observed in 'k' rows when there are 'd' distinct values in the bin:
> 
>     d * (1 - ((d - 1) / d)^k)
> 
> This is 'with replacements' which seems appropriate for the use, and it
> mostly assumes uniform distribution of the distinct values. So if the
> distribution is not uniform (e.g. there are very frequent groups) this
> may be less accurate than the current algorithm in some cases, giving
> over-estimates. But that's probably better than OOM.
> ---
>  src/backend/utils/adt/selfuncs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
> index f8d39aa..6eceedf 100644
> --- a/src/backend/utils/adt/selfuncs.c
> +++ b/src/backend/utils/adt/selfuncs.c
> @@ -3466,7 +3466,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
>                       /*
>                        * Multiply by restriction selectivity.
>                        */
> -                     reldistinct *= rel->rows / rel->tuples;
> +                     reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct,rel->rows));

Why did you change away from the "*=" style? I see no reason to change it.

                        reldistinct *= 1 - powl((reldistinct - 1) / reldistinct, rel->rows);

Looks better to me because it's shorter and cleaner.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
