On Tue, Aug 13, 2019 at 8:46 AM Thomas Kellerer <spam_ea...@gmx.net> wrote:

> I stumbled across this question on SO:
> https://stackoverflow.com/questions/56517852
>
> Disregarding the part about Postgres 9.3, the example for Postgres 11
> looks a bit confusing.
>
> There is a script to setup test data in that question:
>
> ==== start of script ====
>
>     create table foo (
>         foo_id integer not null,
>         foo_name varchar(10),
>         constraint foo_pkey primary key (foo_id)
>     );
>
>     insert into foo
>       (foo_id, foo_name)
>     values
>       (1, 'eeny'),
>       (2, 'meeny'),
>       (3, 'miny'),
>       (4, 'moe'),
>       (5, 'tiger'),
>       (6, 'toe');
>
>     create table foo_bar_baz (
>         foo_id integer not null,
>         bar_id integer not null,
>         baz    integer not null,
>         constraint foo_bar_baz_pkey primary key (foo_id, bar_id, baz),
>         constraint foo_bar_baz_fkey1 foreign key (foo_id)
>             references foo (foo_id)
>     ) partition by range (foo_id)
>     ;
>
>     create table if not exists foo_bar_baz_0 partition of foo_bar_baz for
> values from (0) to (1);
>     create table if not exists foo_bar_baz_1 partition of foo_bar_baz for
> values from (1) to (2);
>     create table if not exists foo_bar_baz_2 partition of foo_bar_baz for
> values from (2) to (3);
>     create table if not exists foo_bar_baz_3 partition of foo_bar_baz for
> values from (3) to (4);
>     create table if not exists foo_bar_baz_4 partition of foo_bar_baz for
> values from (4) to (5);
>     create table if not exists foo_bar_baz_5 partition of foo_bar_baz for
> values from (5) to (6);
>
>     with foos_and_bars as (
>         select ((random() * 4) + 1)::int as foo_id, bar_id::int
>         from generate_series(0, 1499) as t(bar_id)
>     ), bazzes as (
>         select baz::int
>         from generate_series(1, 1500) as t(baz)
>     )
>     insert into foo_bar_baz (foo_id, bar_id, baz)
>     select foo_id, bar_id, baz
>     from bazzes as bz
>       join foos_and_bars as fab on mod(bz.baz, fab.foo_id) = 0;
>
> ==== end of script ====
>
> I see some strange behaviour similar to what is reported in the
> comments to that question:
>
> When I run the test query immediately after populating the tables with the
> sample data:
>
>     explain analyze
>     select count(*)
>     from foo_bar_baz as fbb
>       join foo on fbb.foo_id = foo.foo_id
>     where foo.foo_name = 'eeny'
>
> I do see an "Index Only Scan .... (never executed)" in the plan for the
> irrelevant partitions:
>
>   https://explain.depesz.com/s/AqlE
>
> However once I run "analyze foo_bar_baz" (or "vacuum analyze"), Postgres
> chooses to do a "Parallel Seq Scan" for each partition:
>
>   https://explain.depesz.com/s/WwxE
>
> Why does updating the statistics mess up (runtime) partition pruning?
>
>
> I played around with random_page_cost and that didn't change anything.
> I tried to create extended statistics on "foo(foo_id, foo_name)" so that
> the planner would know that there is only one name per id. No change.
>
> I saw the above behaviour when running this on Windows 10 (my Laptop) or
> CentOS 7 (a test environment on a VM)
>
> On the CentOS server, default_statistics_target is set to 100; on my
> laptop it is set to 1000.
>
> In both cases the Postgres version was 11.4
>
> Any ideas?
>
> Thomas
>
>
I ran into the same planner behaviour. The number of rows in the
partitions influences the statistics being generated, and the statistics
in turn influence the plan chosen.

I managed to force the "correct" plan by manually setting the n_distinct
statistics for the partitioned table, e.g.:

    alter table foo_bar_baz alter column foo_id
        set (n_distinct = -1, n_distinct_inherited = -1);
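
For reference, a negative n_distinct is interpreted as a fraction of the
row count, so -1 tells the planner that every row has a distinct value.
The stored estimates can be inspected in pg_stats, and the override only
takes effect after a fresh ANALYZE. A minimal sketch, using the table and
column names from the script above:

    -- inspect the n_distinct estimates the planner currently uses
    select tablename, attname, n_distinct
    from pg_stats
    where attname = 'foo_id'
      and tablename like 'foo_bar_baz%';

    -- apply the override, then refresh the statistics so it is picked up
    alter table foo_bar_baz
        alter column foo_id set (n_distinct = -1, n_distinct_inherited = -1);
    analyze foo_bar_baz;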

With a certain number of rows in the partitions, the analyzer sets the
n_distinct value for the partitioned table to the number of unique
partition keys, and the n_distinct value for each individual partition
to the number of unique partition keys in that partition. Unfortunately
this causes the planner to pick a plan that does not allow for runtime
partition pruning, resulting in very slow execution times.
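
To confirm the effect, re-running the test query from the original post
after the override and a fresh ANALYZE should again show the irrelevant
partitions as "(never executed)" in the plan:

    explain analyze
    select count(*)
    from foo_bar_baz as fbb
      join foo on fbb.foo_id = foo.foo_id
    where foo.foo_name = 'eeny';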

Regards,
Sverre
