Re: [HACKERS] multivariate statistics (v25)

Tomas Vondra Wed, 05 Apr 2017 02:42:15 -0700


On 04/05/2017 08:41 AM, Sven R. Kunze wrote:

Thanks Tomas and David for hacking on this patch.

On 04.04.2017 20:19, Tomas Vondra wrote:
I'm not sure we still need the min_group_size when evaluating dependencies. It was meant to deal with 'noisy' data, but I think that after switching to the 'degree' it might actually be a bad idea.
Consider this:

    create table t (a int, b int);
    insert into t select 1, 1 from generate_series(1, 10000) s(i);
    insert into t select i, i from generate_series(2, 20000) s(i);
    create statistics s with (dependencies) on (a,b) from t;
    analyze t;

    select stadependencies from pg_statistic_ext ;
                  stadependencies
    --------------------------------------------
     [{1 => 2 : 0.333344}, {2 => 1 : 0.333344}]
    (1 row)

So the degree of the dependency is just ~0.333 although it's obviously a perfect dependency, i.e. knowledge of 'a' determines 'b'. The reason is that we discard 2/3 of the rows, because those groups are only a single row each, except for the one large group (1/3 of rows).

Just for me to follow the comments better: is "dependency" roughly the same as what statisticians mean by "conditional probability"?


No, it's closer to a 'functional dependency' as used in relational normal forms.
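
To make that concrete, here is a rough sketch in Python of how the degree in the example above could come out as ~0.333. It is not the patch's code: the function name dependency_degree and the threshold value 3 are assumptions for illustration, and it works on the full table rather than on an ANALYZE sample. The idea is to group rows by 'a', count the rows in groups where 'a' maps to exactly one 'b', and skip groups smaller than min_group_size.

```python
from collections import defaultdict


def dependency_degree(rows, min_group_size=0):
    """Fraction of rows supporting the functional dependency a => b.

    Hypothetical helper, not taken from the patch. A group (all rows
    sharing one value of a) supports the dependency if it contains a
    single distinct b. Groups smaller than min_group_size are skipped,
    so their rows never count as supporting.
    """
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)

    supporting = 0
    for bs in groups.values():
        if len(bs) < min_group_size:
            continue  # small group discarded, as with min_group_size
        if len(set(bs)) == 1:
            supporting += len(bs)

    return supporting / len(rows)


# Same data as the SQL example: 10000 rows of (1, 1),
# then one row (i, i) for each i in 2..20000.
rows = [(1, 1)] * 10000 + [(i, i) for i in range(2, 20001)]

# Assumed threshold of 3: only the one large group is counted.
print(round(dependency_degree(rows, min_group_size=3), 6))

# No threshold: every group has a single b, so the dependency is perfect.
print(dependency_degree(rows))
```

With the threshold, only the 10000 rows of the large group count, out of 29999 rows in total, which gives the 0.333344 reported in stadependencies. Without it, the single-row groups count too and the degree is 1.0. Unlike a conditional probability, this measures how much of the data is consistent with 'a' determining 'b', not how likely a particular 'b' is given 'a'.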


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


