Re: Strange runtime partition pruning behaviour with 11.4

2019-08-13 Thread Sverre Boschman
On Tue, Aug 13, 2019 at 8:46 AM Thomas Kellerer  wrote:

> I stumbled across this question on SO:
> https://stackoverflow.com/questions/56517852
>
> Disregarding the part about Postgres 9.3, the example for Postgres 11
> looks a bit confusing.
>
> There is a script to setup test data in that question:
>
>  start of script 
>
> create table foo (
> foo_id integer not null,
> foo_name varchar(10),
> constraint foo_pkey primary key (foo_id)
> );
>
> insert into foo
>   (foo_id, foo_name)
> values
>   (1, 'eeny'),
>   (2, 'meeny'),
>   (3, 'miny'),
>   (4, 'moe'),
>   (5, 'tiger'),
>   (6, 'toe');
>
> create table foo_bar_baz (
> foo_id integer not null,
> bar_id integer not null,
> bazinteger not null,
> constraint foo_bar_baz_pkey primary key (foo_id, bar_id, baz),
> constraint foo_bar_baz_fkey1 foreign key (foo_id)
> references foo (foo_id)
> ) partition by range (foo_id)
> ;
>
> create table if not exists foo_bar_baz_0 partition of foo_bar_baz for
> values from (0) to (1);
> create table if not exists foo_bar_baz_1 partition of foo_bar_baz for
> values from (1) to (2);
> create table if not exists foo_bar_baz_2 partition of foo_bar_baz for
> values from (2) to (3);
> create table if not exists foo_bar_baz_3 partition of foo_bar_baz for
> values from (3) to (4);
> create table if not exists foo_bar_baz_4 partition of foo_bar_baz for
> values from (4) to (5);
> create table if not exists foo_bar_baz_5 partition of foo_bar_baz for
> values from (5) to (6);
>
> with foos_and_bars as (
> select ((random() * 4) + 1)::int as foo_id, bar_id::int
> from generate_series(0, 1499) as t(bar_id)
> ), bazzes as (
> select baz::int
> from generate_series(1, 1500) as t(baz)
> )
> insert into foo_bar_baz (foo_id, bar_id, baz)
> select foo_id, bar_id, baz
> from bazzes as bz
>   join foos_and_bars as fab on mod(bz.baz, fab.foo_id) = 0;
>
>  end of script 
>
> I see the some strange behaviour similar to to what is reported in the
> comments to that question:
>
> When I run the test query immediately after populating the tables with the
> sample data:
>
> explain analyze
> select count(*)
> from foo_bar_baz as fbb
>   join foo on fbb.foo_id = foo.foo_id
> where foo.foo_name = 'eeny'
>
> I do see an "Index Only Scan  (never executed)" in the plan for the
> irrelevant partitions:
>
>   https://explain.depesz.com/s/AqlE
>
> However once I run "analyze foo_bar_baz" (or "vacuum analyze"), Postgres
> chooses to do a "Parallel Seq Scan" for each partition:
>
>   https://explain.depesz.com/s/WwxE
>
> Why does updating the statistics mess up (runtime) partition pruning?
>
>
> I played around with random_page_cost and that didn't change anything.
> I tried to create extended statistics on "foo(id, name)" so that the
> planner would no, that there is only one name per id. No change.
>
> I saw the above behaviour when running this on Windows 10 (my Laptop) or
> CentOS 7 (a test environment on a VM)
>
> On the CentOS server default_statistics_target is set to 100, on my laptop
> it is set to 1000
>
> In both cases the Postgres version was 11.4
>
> Any ideas?
>
> Thomas
>
>
Ran into the same behaviour of the planner. The amount of rows in the
partitions influence the statistics being generated and the statistics in
turn influence the plan chosen.

I managed to force the "correct" plan by manually setting the n_distinct
statistics for the partitioned table.
E.g.: alter table foo_bar_baz alter column foo_id set ( n_distinct=-1,
n_distinct_inherited=-1);

With a certain number of rows in the partitions the analyser sets the
n_distinct value for the partitioned table to the number of unique
partition keys and the n_distinct value
for the individual partitions to number of unique partition keys in that
partition. Unfortunately this causes the planner to pick a plan that
doesn't allow for execution pruning,
resulting in very slow execution times.

Regards,
Sverre


Re: Strange runtime partition pruning behaviour with 11.4

2019-08-05 Thread Thomas Kellerer
Tom Lane schrieb am 03.08.2019 um 18:05:
> Yeah, I get the same plan with or without ANALYZE, too.  In this example,
> having the ANALYZE stats barely moves the rowcount estimates for
> foo_bar_baz at all, so it's not surprising that the plan doesn't change.
> (I do wonder how Thomas got a different outcome...)

I don't know why either ;) 

I am using a JDBC based SQL tool to run that - I don't know if that matters.

I just tried this script with Postgres 12 beta2 and there I do not get 
the initial plan with "never executed" (so the same behaviour as everybody
else seems to have).

If the reason why my initial plan is different than the "analyzed" plan 
lies in the configuration, I am happy to share my postgresql.conf if 
that is of any interest.

Thomas






Re: Strange runtime partition pruning behaviour with 11.4

2019-08-03 Thread MichaelDBA
I too got the same plan (non runtime partition pruning plan) with or 
without the statistics.  So it looks like the workaround until this is 
fixed is to re-arrange the query to do a subselect to force the runtime 
partition pruning as Andreas suggested, which I tested and indeed does 
work for me too!


Regards,
Michael Vitale

Thomas Kellerer wrote on 8/3/2019 10:06 AM:

it's posible to rewrite the query to:


test=# explain analyse select count(*) from foo_bar_baz as fbb where 
foo_id = (select foo_id from foo where foo_name = 'eeny');


I know, that's not a solution, but a workaround. :-(


Yes, I discovered that as well.

But I'm more confused (or concerned) by the fact that the (original) 
query works correctly *without* statistics.


Thomas











Re: Strange runtime partition pruning behaviour with 11.4

2019-08-03 Thread Tom Lane
Andreas Kretschmer  writes:
> Am 03.08.19 um 16:06 schrieb Thomas Kellerer:
>> But I'm more confused (or concerned) by the fact that the (original) 
>> query works correctly *without* statistics.

> can't reproduce that :-(  (PG 11.4 Community)

Yeah, I get the same plan with or without ANALYZE, too.  In this example,
having the ANALYZE stats barely moves the rowcount estimates for
foo_bar_baz at all, so it's not surprising that the plan doesn't change.
(I do wonder how Thomas got a different outcome...)

Given the shape of the preferred plan:

 Finalize Aggregate  (cost=15779.59..15779.60 rows=1 width=8) (actual 
time=160.329..160.330 rows=1 loops=1)
   ->  Gather  (cost=15779.38..15779.59 rows=2 width=8) (actual 
time=160.011..161.712 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Partial Aggregate  (cost=14779.38..14779.39 rows=1 width=8) 
(actual time=154.675..154.675 rows=1 loops=3)
   ->  Hash Join  (cost=1.09..14612.90 rows=66590 width=0) (actual 
time=86.814..144.793 rows=100500 loops=3)
 Hash Cond: (fbb_1.foo_id = foo.foo_id)
 ->  Parallel Append  (cost=0.00..12822.21 rows=399537 
width=4) (actual time=0.019..95.644 rows=318950 loops=3)
   ->  Parallel Seq Scan on foo_bar_baz_1 fbb_1  
(cost=0.00..3403.53 rows=177353 width=4) (actual time=0.012..18.881 rows=100500 
loops=3)
   ->  Parallel Seq Scan on foo_bar_baz_2 fbb_2  
(cost=0.00..3115.53 rows=162353 width=4) (actual time=0.018..51.716 rows=276000 
loops=1)
   ->  Parallel Seq Scan on foo_bar_baz_3 fbb_3  
(cost=0.00..2031.82 rows=105882 width=4) (actual time=0.011..16.854 rows=9 
loops=2)
   ->  Parallel Seq Scan on foo_bar_baz_4 fbb_4  
(cost=0.00..1584.00 rows=82500 width=4) (actual time=0.011..26.950 rows=140250 
loops=1)
   ->  Parallel Seq Scan on foo_bar_baz_5 fbb_5  
(cost=0.00..667.65 rows=34765 width=4) (actual time=0.014..11.896 rows=59100 
loops=1)
   ->  Parallel Seq Scan on foo_bar_baz_0 fbb  
(cost=0.00..22.00 rows=1200 width=4) (actual time=0.001..0.001 rows=0 loops=1)
 ->  Hash  (cost=1.07..1.07 rows=1 width=4) (actual 
time=0.038..0.038 rows=1 loops=3)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  Seq Scan on foo  (cost=0.00..1.07 rows=1 
width=4) (actual time=0.021..0.023 rows=1 loops=3)
 Filter: ((foo_name)::text = 'eeny'::text)
 Rows Removed by Filter: 5

it's obvious that no pruning can happen, run-time or otherwise,
because the partitioned table is being scanned on the outside
of the join --- so the target value of foo_id isn't available.

We can force the planner to its second best choice with
set enable_hashjoin to 0;

and then we get

 Aggregate  (cost=31954.09..31954.10 rows=1 width=8) (actual 
time=420.158..420.158 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..31554.55 rows=159815 width=0) (actual 
time=0.058..389.974 rows=301500 loops=1)
 Join Filter: (fbb.foo_id = foo.foo_id)
 Rows Removed by Join Filter: 655350
 ->  Seq Scan on foo  (cost=0.00..1.07 rows=1 width=4) (actual 
time=0.025..0.028 rows=1 loops=1)
   Filter: ((foo_name)::text = 'eeny'::text)
   Rows Removed by Filter: 5
 ->  Append  (cost=0.00..19567.35 rows=958890 width=4) (actual 
time=0.026..280.510 rows=956850 loops=1)
   ->  Seq Scan on foo_bar_baz_0 fbb  (cost=0.00..30.40 rows=2040 
width=4) (actual time=0.003..0.003 rows=0 loops=1)
   ->  Seq Scan on foo_bar_baz_1 fbb_1  (cost=0.00..4645.00 
rows=301500 width=4) (actual time=0.022..57.836 rows=301500 loops=1)
   ->  Seq Scan on foo_bar_baz_2 fbb_2  (cost=0.00..4252.00 
rows=276000 width=4) (actual time=0.019..51.834 rows=276000 loops=1)
   ->  Seq Scan on foo_bar_baz_3 fbb_3  (cost=0.00..2773.00 
rows=18 width=4) (actual time=0.016..31.951 rows=18 loops=1)
   ->  Seq Scan on foo_bar_baz_4 fbb_4  (cost=0.00..2161.50 
rows=140250 width=4) (actual time=0.015..24.392 rows=140250 loops=1)
   ->  Seq Scan on foo_bar_baz_5 fbb_5  (cost=0.00..911.00 
rows=59100 width=4) (actual time=0.012..10.252 rows=59100 loops=1)

This is a good deal slower, and the planner correctly estimates that it's
a good deal slower, so that's why it didn't get picked.

But ... why didn't any run-time pruning happen?  Because the shape of the
plan is still wrong: the join condition is being applied at the nestloop
node.  If we'd pushed down the foo_id condition to the foo_bar_baz scan
then there'd be hope of pruning.

I think the reason that that isn't happening is that the planner has
not been taught that run-time pruning is a thing, so it's not giving
any cost preference to doing things in a way that would enable 

Re: Strange runtime partition pruning behaviour with 11.4

2019-08-03 Thread Andreas Kretschmer




Am 03.08.19 um 16:06 schrieb Thomas Kellerer:

it's posible to rewrite the query to:


test=# explain analyse select count(*) from foo_bar_baz as fbb where 
foo_id = (select foo_id from foo where foo_name = 'eeny');


I know, that's not a solution, but a workaround. :-(


Yes, I discovered that as well.

But I'm more confused (or concerned) by the fact that the (original) 
query works correctly *without* statistics.


Thomas




can't reproduce that :-(  (PG 11.4 Community)

(all in a file and executed the explain immediately)


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Re: Strange runtime partition pruning behaviour with 11.4

2019-08-03 Thread Thomas Kellerer

it's posible to rewrite the query to:


test=# explain analyse select count(*) from foo_bar_baz as fbb where foo_id = 
(select foo_id from foo where foo_name = 'eeny');

I know, that's not a solution, but a workaround. :-(


Yes, I discovered that as well.

But I'm more confused (or concerned) by the fact that the (original) query 
works correctly *without* statistics.

Thomas







Re: Strange runtime partition pruning behaviour with 11.4

2019-08-03 Thread Andreas Kretschmer

Hi,


Am 03.08.19 um 15:16 schrieb MichaelDBA:
I too am a bit perplexed by why runtime partition pruning does not 
seem to work with this example.  Anybody got any ideas of this? 



please don't top-posting.

it's posible to rewrite the query to:


test=# explain analyse select count(*) from foo_bar_baz as fbb where 
foo_id = (select foo_id from foo where foo_name = 'eeny');

QUERY PLAN
---
 Finalize Aggregate  (cost=15880.63..15880.64 rows=1 width=8) (actual 
time=48.447..48.448 rows=1 loops=1)

   InitPlan 1 (returns $0)
 ->  Seq Scan on foo  (cost=0.00..24.50 rows=6 width=4) (actual 
time=0.243..0.246 rows=1 loops=1)

   Filter: ((foo_name)::text = 'eeny'::text)
   Rows Removed by Filter: 5
   ->  Gather  (cost=15855.92..15856.13 rows=2 width=8) (actual 
time=48.376..51.468 rows=3 loops=1)

 Workers Planned: 2
 Params Evaluated: $0
 Workers Launched: 2
 ->  Partial Aggregate  (cost=14855.92..14855.93 rows=1 
width=8) (actual time=42.600..42.600 rows=1 loops=3)
   ->  Parallel Append  (cost=0.00..13883.01 rows=389162 
width=0) (actual time=0.139..34.914 rows=83500 loops=3)
 ->  Parallel Bitmap Heap Scan on foo_bar_baz_0 
fbb  (cost=4.23..14.73 rows=6 width=0) (never executed)

   Recheck Cond: (foo_id = $0)
   ->  Bitmap Index Scan on foo_bar_baz_0_pkey  
(cost=0.00..4.23 rows=10 width=0) (never executed)

 Index Cond: (foo_id = $0)
 ->  Parallel Seq Scan on foo_bar_baz_2 fbb_2  
(cost=0.00..3865.72 rows=178218 width=0) (never executed)

   Filter: (foo_id = $0)
 ->  Parallel Seq Scan on foo_bar_baz_1 fbb_1  
(cost=0.00..3195.62 rows=147250 width=0) (actual time=0.129..24.735 
rows=83500 loops=3)

   Filter: (foo_id = $0)
 ->  Parallel Seq Scan on foo_bar_baz_3 fbb_3  
(cost=0.00..2334.49 rows=107559 width=0) (never executed)

   Filter: (foo_id = $0)
 ->  Parallel Seq Scan on foo_bar_baz_4 fbb_4  
(cost=0.00..1860.95 rows=85756 width=0) (never executed)

   Filter: (foo_id = $0)
 ->  Parallel Seq Scan on foo_bar_baz_5 fbb_5  
(cost=0.00..665.69 rows=30615 width=0) (never executed)

   Filter: (foo_id = $0)
 Planning Time: 12.648 ms
 Execution Time: 52.621 ms
(27 rows)

test=*#


I know, that's not a solution, but a workaround. :-(

(pg 12beta2 and also with PostgreSQL 11.4 (2ndQPG 11.4r1.6.7))



Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Re: Strange runtime partition pruning behaviour with 11.4

2019-08-03 Thread MichaelDBA
I too am a bit perplexed by why runtime partition pruning does not seem 
to work with this example.  Anybody got any ideas of this?


Regards,
Michael Vitale

Thomas Kellerer wrote on 8/2/2019 9:58 AM:

I stumbled across this question on SO: 
https://stackoverflow.com/questions/56517852

Disregarding the part about Postgres 9.3, the example for Postgres 11 looks a 
bit confusing.

There is a script to setup test data in that question:

 start of script 

 create table foo (
 foo_id integer not null,
 foo_name varchar(10),
 constraint foo_pkey primary key (foo_id)
 );

 insert into foo
   (foo_id, foo_name)
 values
   (1, 'eeny'),
   (2, 'meeny'),
   (3, 'miny'),
   (4, 'moe'),
   (5, 'tiger'),
   (6, 'toe');

 create table foo_bar_baz (
 foo_id integer not null,
 bar_id integer not null,
 bazinteger not null,
 constraint foo_bar_baz_pkey primary key (foo_id, bar_id, baz),
 constraint foo_bar_baz_fkey1 foreign key (foo_id)
 references foo (foo_id)
 ) partition by range (foo_id)
 ;

 create table if not exists foo_bar_baz_0 partition of foo_bar_baz for 
values from (0) to (1);
 create table if not exists foo_bar_baz_1 partition of foo_bar_baz for 
values from (1) to (2);
 create table if not exists foo_bar_baz_2 partition of foo_bar_baz for 
values from (2) to (3);
 create table if not exists foo_bar_baz_3 partition of foo_bar_baz for 
values from (3) to (4);
 create table if not exists foo_bar_baz_4 partition of foo_bar_baz for 
values from (4) to (5);
 create table if not exists foo_bar_baz_5 partition of foo_bar_baz for 
values from (5) to (6);

 with foos_and_bars as (
 select ((random() * 4) + 1)::int as foo_id, bar_id::int
 from generate_series(0, 1499) as t(bar_id)
 ), bazzes as (
 select baz::int
 from generate_series(1, 1500) as t(baz)
 )
 insert into foo_bar_baz (foo_id, bar_id, baz)
 select foo_id, bar_id, baz
 from bazzes as bz
   join foos_and_bars as fab on mod(bz.baz, fab.foo_id) = 0;

 end of script 

I see the some strange behaviour similar to to what is reported in the comments 
to that question:

When I run the test query immediately after populating the tables with the 
sample data:

 explain analyze
 select count(*)
 from foo_bar_baz as fbb
   join foo on fbb.foo_id = foo.foo_id
 where foo.foo_name = 'eeny'

I do see an "Index Only Scan  (never executed)" in the plan for the 
irrelevant partitions:

   https://explain.depesz.com/s/AqlE

However once I run "analyze foo_bar_baz" (or "vacuum analyze"), Postgres chooses to do a 
"Parallel Seq Scan" for each partition:

   https://explain.depesz.com/s/WwxE

Why does updating the statistics mess up (runtime) partition pruning?


I played around with random_page_cost and that didn't change anything.
I tried to create extended statistics on "foo(id, name)" so that the planner 
would no, that there is only one name per id. No change.

I saw the above behaviour when running this on Windows 10 (my Laptop) or CentOS 
7 (a test environment on a VM)

On the CentOS server default_statistics_target is set to 100, on my laptop it 
is set to 1000

In both cases the Postgres version was 11.4

Any ideas?

Thomas