On Mon, May 6, 2024 at 4:26 PM Jakub Wartak <jakub.war...@enterprisedb.com> wrote:
> Hi Ashutosh & hackers, > > On Mon, Apr 15, 2024 at 9:00 AM Ashutosh Bapat > <ashutosh.bapat....@gmail.com> wrote: > > > > Here's patch with > > > [..] > > Adding to the next commitfest but better to consider this for the next > set of minor releases. > > 1. The patch does not pass cfbot - > https://cirrus-ci.com/task/5486258451906560 on master due to test > failure "not ok 206 + partition_join" > So I need to create a patch for master first. I thought CFBot somehow knew that the patch was created for PG 15. :) > > 2. Without the patch applied, the result of the meson test on master > was clean (no failures , so master is fine). After applying patch > there were expected some hunk failures (as the patch was created for > 15_STABLE): > > patching file src/backend/optimizer/plan/planner.c > Hunk #1 succeeded at 7567 (offset 468 lines). > Hunk #2 succeeded at 7593 (offset 468 lines). > patching file src/test/regress/expected/partition_join.out > Hunk #1 succeeded at 4777 (offset 56 lines). > Hunk #2 succeeded at 4867 (offset 56 lines). > patching file src/test/regress/sql/partition_join.sql > Hunk #1 succeeded at 1136 (offset 1 line). > > 3. Without patch there is performance regression/bug on master (cost > is higher with enable_partitionwise_join=on that without it): > > data preparation: > -- Test the process_outer_partition() code path > CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c); > CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0000', > '0001', '0002'); > CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0003', > '0004'); > INSERT INTO plt1_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM > generate_series(0, 24) i; > ANALYZE plt1_adv; > > CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c); > CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002'); > CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0003', > '0004'); > INSERT INTO plt2_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM > generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4); > ANALYZE plt2_adv; > > CREATE TABLE plt3_adv (a int, b int, c text) PARTITION BY LIST (c); > CREATE TABLE plt3_adv_p1 PARTITION OF plt3_adv FOR VALUES IN ('0001'); > CREATE TABLE plt3_adv_p2 PARTITION OF plt3_adv FOR VALUES IN ('0003', > '0004'); > INSERT INTO plt3_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM > generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4); > ANALYZE plt3_adv; > > off: > EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 > LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c > = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != > 4 ORDER BY t1.c, t1.a, t2.a, t3.a; > QUERY PLAN > > ----------------------------------------------------------------------------------------------- > Sort (cost=22.02..22.58 rows=223 width=27) > Sort Key: t1.c, t1.a, t2.a, t3.a > -> Hash Full Join (cost=4.83..13.33 rows=223 width=27) > [..] > > > with enable_partitionwise_join=ON (see the jump from cost 22.02 -> 27.65): > EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 > LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c > = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != > 4 ORDER BY t1.c, t1.a, t2.a, t3.a; > QUERY PLAN > > ----------------------------------------------------------------------------------------------- > Sort (cost=27.65..28.37 rows=289 width=27) > Sort Key: t1.c, t1.a, t2.a, t3.a > -> Append (cost=2.23..15.83 rows=289 width=27) > -> Hash Full Join (cost=2.23..4.81 rows=41 width=27) > [..] > -> Hash Full Join (cost=2.45..9.57 rows=248 width=27) > [..] > > > However with the patch applied the plan with minimal cost is always > chosen ("22"): > > explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT > JOIN > plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE > coalesce(t1.a, 0 ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY > t1.c, t1.a, t2.a, t3.a; > QUERY PLAN > > ----------------------------------------------------------------------------------------------- > Sort (cost=22.02..22.58 rows=223 width=27) > Sort Key: t1.c, t1.a, t2.a, t3.a > -> Hash Full Join (cost=4.83..13.33 rows=223 width=27) > [..] > > > set enable_partitionwise_join to on; > explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT > JOIN > plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE > coalesce(t1.a, 0 ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY > t1.c, t1.a, t2.a, t3.a; > QUERY PLAN > > ----------------------------------------------------------------------------------------------- > Sort (cost=22.02..22.58 rows=223 width=27) > Sort Key: t1.c, t1.a, t2.a, t3.a > -> Hash Full Join (cost=4.83..13.33 rows=223 width=27) > [..] > > with the patch applied, the minimal cost (with toggle on or off) the > cost always stays the minimal from the available ones. We cannot > provide a reproducer for real performance regression, but for the > affected customer it took 530+s (with enable_partitionwise_join=on) and without that GUC it it was ~23s. > Thanks for providing actual timing. That's a huge difference. > 4. meson test ends up with failures like below: > > 4/290 postgresql:regress / regress/regress > ERROR 32.67s > 6/290 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade > ERROR 56.96s > 35/290 postgresql:recovery / recovery/027_stream_regress > ERROR 40.20s > > (all due to "regression tests pass" failures) > > the partition_join.sql is failing for test 206, so for this: > > -- partitionwise join with fractional paths > CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id); > CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO > ('1000'); > CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO > ('2000'); > > -- insert data > INSERT INTO fract_t (id) (SELECT generate_series(0, 1999)); > ANALYZE fract_t; > > -- verify plan; nested index only scans > SET max_parallel_workers_per_gather = 0; > SET enable_partitionwise_join = on; > > the testsuite was expecting the below with enable_partitionwise_join = on; > > EXPLAIN (COSTS OFF) > SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER > BY x.id ASC LIMIT 10; > QUERY PLAN > ----------------------------------------------------------------------- > Limit > -> Merge Append > Sort Key: x.id > -> Merge Left Join > Merge Cond: (x_1.id = y_1.id) > -> Index Only Scan using fract_t0_pkey on fract_t0 x_1 > -> Index Only Scan using fract_t0_pkey on fract_t0 y_1 > -> Merge Left Join > Merge Cond: (x_2.id = y_2.id) > -> Index Only Scan using fract_t1_pkey on fract_t1 x_2 > -> Index Only Scan using fract_t1_pkey on fract_t1 y_2 > > but actually with patch it gets this (here with costs): > > EXPLAIN (COSTS) SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y > USING (id) ORDER BY x.id ASC LIMIT 10; > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------- > Limit (cost=1.10..2.21 rows=10 width=16) > -> Merge Left Join (cost=1.10..223.10 rows=2000 width=16) > Merge Cond: (x.id = y.id) > -> Append (cost=0.55..96.55 rows=2000 width=8) > [..] > -> Append (cost=0.55..96.55 rows=2000 width=8) > [..] > > > if you run it without patch and again with enable_partitionwise_join=on: > > EXPLAIN SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING > (id) ORDER BY x.id ASC LIMIT 10; > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------- > Limit (cost=1.11..2.22 rows=10 width=16) > -> Merge Append (cost=1.11..223.11 rows=2000 width=16) > Sort Key: x.id > -> Merge Left Join (cost=0.55..101.55 rows=1000 width=16) > [..] > -> Merge Left Join (cost=0.55..101.55 rows=1000 width=16) > [..] > > So with the patch that SQL does not use partitionwise join as it finds > it more optimal to stick to a plan with cost of "1.10..2.21" instead > of "1.11..2.22" (w/ partition_join), nitpicking but still a failure > technically. Perhaps it could be even removed? (it's pretty close to > noise?). > I think we need to replace the failing query with something which uses partitionwise join even with the patch. I will take a look at this after returning from a two week long vacation, unless someone else is interested in fixing this before that. -- Best Wishes, Ashutosh Bapat