Hi all, I'm reporting a planner regression that started in PostgreSQL 16 and persists through 18. The same query on identical data runs in ~70ms on PG 15 but ~1440ms on PG 16/17/18, due to a different join order chosen by the planner.
I've attached a fully self-contained reproducer script (DDL + data
generation + query) with anonymised schema names.
## Environment
- PG 15.17: ~70ms execution time (good plan)
- PG 18.3: ~1440ms execution time (bad plan)
- OS: RHEL 9.7
- Architecture: x86_64
- Both servers: identical hardware, identical postgresql.conf,
identical data, freshly ANALYZEd
## Summary of the problem
The query joins 5 tables (with 2 self-joins) using composite keys.
All joins use Nested Loop with Index Scan. The planner estimates
rows=1 at every join level, but actuals are 105-345 rows.
In PG 15, the planner happens to choose a join order where the
self-join to `product_options` (aliased as opt2) is at the outermost
level with 216 loops.
In PG 16+, the planner nests this same self-join *inside* the
`option_rules` loop, causing 17,325 loops instead of 216 — an 80x
increase in index scans (~2.8M buffer hits vs trivial).
The estimated costs are nearly identical in both versions (~14),
so a small change in cost model tips the planner to a different
(and much worse) join order.
## Key observations from EXPLAIN ANALYZE
### PG 15 (good plan)
product_options opt2 Index Scan:
loops=216, rows_per_loop=279
Total index searches: 216
Execution Time: 70.053 ms
### PG 18 (bad plan)
product_options opt2 Index Scan:
loops=17,325, rows_per_loop=209
Total index searches: 17,325
Buffers: shared hit=2,825,460
Execution Time: 1440.238 ms
The only difference is join order. All other nodes use the same
indexes and produce the same row counts.
## Workaround
SET LOCAL join_collapse_limit = 1;
This forces the written join order and restores PG 15 performance.
I also tried:
- Increased statistics_target to 1000 on all filter columns -> no effect
- Extended statistics (ndistinct, mcv) on the 5-column composite
key -> no effect
- ANALYZE after both changes -> no effect
None of these improved the row estimates enough to change the
planner's join order decision.
## Reproducer
The attached SQL script (reproducer_pg16_join_regression.sql) is
fully self-contained:
1. Creates 4 tables with anonymised names
2. Generates ~14M rows with realistic distributions (WARNING 3+ GB of
data):
- product_options: ~1.39M rows
- pricelist_options: ~11.9M rows
- option_rules: ~628K rows
- tax_codes: 88 rows
3. Creates all indexes
4. Runs ANALYZE
5. Executes the query with EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
Expected results:
- PG 15: ~70ms, product_options opt2 scanned 216 times
- PG 16+: ~1400ms, product_options opt2 scanned ~17K times
The script also includes the workaround query (with
join_collapse_limit = 1) for comparison.
## EXPLAIN ANALYZE output — PG 15
Nested Loop Left Join (cost=4.53..14.03 rows=1 width=618)
(actual time=0.904..69.853 rows=216 loops=1)
-> Nested Loop Left Join (cost=4.10..11.35 rows=1 width=506)
(actual time=0.523..40.812 rows=216 loops=1)
-> Nested Loop Left Join (cost=3.67..8.66 rows=1 width=416)
(actual time=0.462..31.734 rows=216 loops=1)
-> Nested Loop (cost=3.25..5.98 rows=1 width=332)
(actual time=0.204..21.633 rows=165 loops=1)
-> Index Scan using pk_product_options
on product_options
(actual time=0.063..0.190 rows=345 loops=1)
Index Cond: (brand_id = 10 AND line_code = 'ABC'
AND model_year = 'YR' AND model_code = 'ABC'
AND version_code = 'VER-001')
-> Merge Right Join (cost=2.82..3.31 rows=1 width=220)
(actual time=0.003..0.051 rows=165 loops=345)
-> Index Scan using pk_tax_codes on tax_codes
(actual time=0.001..0.002 rows=9 loops=345)
-> Sort (cost=2.67..2.68 rows=1 width=90)
(actual time=0.000..0.006 rows=165 loops=345)
Sort Key: pricelist_options.tax_id
Sort Method: quicksort Memory: 48kB
-> Index Scan using ix_pricelist_options_1
on pricelist_options
(actual time=0.030..0.081 rows=165 loops=1)
Index Cond: (brand_id = 10
AND line_code = 'ABC'
AND model_year = 'YR'
AND model_code = 'ABC'
AND version_code = 'VER-001'
AND pricelist_id = 100)
Join Filter: (pricelist_options.brand_id =
option_rules.brand_id AND ...)
Rows Removed by Join Filter: 17220
-> Index Scan using pk_option_rules on option_rules
(actual time=0.014..0.040 rows=105 loops=165)
Index Cond: (brand_id = 10 AND line_code = 'ABC'
AND model_year = 'YR' AND model_code = 'ABC'
AND version_code = 'VER-001')
Join Filter: (pl_check.pricelist_id =
pricelist_options.pricelist_id AND ...)
Rows Removed by Join Filter: 28925
-> Index Scan using ix_pricelist_options_1
on pricelist_options pl_check
(actual time=0.009..0.022 rows=134 loops=216)
Index Cond: (brand_id = 10 AND line_code = 'ABC'
AND model_year = 'YR' AND model_code = 'ABC'
AND version_code = 'VER-001' AND pricelist_id = 100)
-- *** opt2 joined LAST, at outermost level ***
-> Index Scan using pk_product_options
on product_options opt2
(actual time=0.013..0.101 rows=279 loops=216)
Index Cond: (brand_id = 10 AND line_code = 'ABC'
AND model_year = 'YR' AND model_code = 'ABC'
AND version_code = 'VER-001')
Planning Time: 4.259 ms
Execution Time: 70.053 ms
## EXPLAIN ANALYZE output — PG 18
Nested Loop Left Join (cost=4.80..13.99 rows=1 width=618)
(actual time=8.639..1439.956 rows=216 loops=1)
Buffers: shared hit=2863489
-> Nested Loop (cost=4.37..11.31 rows=1 width=528)
(actual time=8.630..1436.943 rows=216 loops=1)
Buffers: shared hit=2862852
Join Filter: (pricelist_options.option_type =
product_options.option_type AND ...)
Rows Removed by Join Filter: 34033
-> Nested Loop Left Join (cost=3.94..8.64 rows=1 width=416)
(actual time=8.608..1422.736 rows=216 loops=1)
Buffers: shared hit=2836032
Join Filter: (pricelist_options.option_code =
option_rules.option_code AND ...)
Rows Removed by Join Filter: 17220
-> Merge Right Join (cost=3.09..3.31 rows=1 width=220)
(actual time=0.145..0.305 rows=165 loops=1)
Buffers: shared hit=12
-> Index Scan using pk_tax_codes on tax_codes
(actual time=0.017..0.020 rows=9 loops=1)
Buffers: shared hit=3
-> Sort (cost=2.67..2.68 rows=1 width=90)
(actual time=0.121..0.158 rows=165 loops=1)
Sort Key: pricelist_options.tax_id
Sort Method: quicksort Memory: 45kB
Buffers: shared hit=9
-> Index Scan using ix_pricelist_options_1
on pricelist_options
(actual time=0.023..0.073 rows=165 loops=1)
Index Cond: (brand_id = 10
AND line_code = 'ABC'
AND model_year = 'YR'
AND model_code = 'ABC'
AND version_code = 'VER-001'
AND pricelist_id = 100)
Buffers: shared hit=9
-- *** opt2 joined INSIDE the option_rules loop ***
-> Nested Loop Left Join (cost=0.85..5.32 rows=1 width=196)
(actual time=0.060..8.612 rows=105 loops=165)
Buffers: shared hit=2836020
-> Index Scan using pk_option_rules on option_rules
(actual time=0.015..0.043 rows=105 loops=165)
Index Cond: (brand_id = 10 AND line_code = 'ABC'
AND model_year = 'YR' AND model_code = 'ABC'
AND version_code = 'VER-001')
Index Searches: 165
Buffers: shared hit=10560
-- *** THIS IS THE PROBLEM NODE ***
-> Index Scan using pk_product_options
on product_options opt2
(actual time=0.013..0.061 rows=209 loops=17325)
Index Cond: (brand_id = 10 AND line_code = 'ABC'
AND model_year = 'YR' AND model_code = 'ABC'
AND version_code = 'VER-001')
Index Searches: 17325
Buffers: shared hit=2825460
-> Index Scan using pk_product_options on product_options
(actual time=0.013..0.049 rows=158 loops=216)
Index Cond: (brand_id = 10 AND line_code = 'ABC'
AND model_year = 'YR' AND model_code = 'ABC'
AND version_code = 'VER-001')
Index Searches: 216
Buffers: shared hit=26820
-> Index Scan using ix_pricelist_options_1
on pricelist_options pl_check
(actual time=0.013..0.013 rows=0.49 loops=216)
Index Cond: (brand_id = option_rules.brand_id
AND line_code = option_rules.line_code
AND model_year = option_rules.model_year
AND model_code = option_rules.model_code
AND version_code = option_rules.version_code
AND pricelist_id = 100)
Filter: (option_type = option_rules.sec_option_type
AND option_code = option_rules.sec_option_code)
Rows Removed by Filter: 49
Index Searches: 105
Buffers: shared hit=637
Planning Time: 4.293 ms
Execution Time: 1440.238 ms
## Questions for the community
1. Was there a specific commit in the PG 16 cycle that changed how
the planner evaluates join orderings for LEFT JOINs, possibly
related to the Right Anti Join work or outer join commutation?
2. Given that extended statistics (ndistinct, mcv) on the composite
key didn't improve the row estimates, is there a planned
improvement to how the planner estimates multi-table join
cardinality with correlated composite keys?
3. Is there a way to improve the planner's estimate without
resorting to join_collapse_limit = 1?
Thank you for your time. Happy to provide any additional information.
Best regards,
Mauro Gatti
[email protected]
reproducer_pg16_join_regression.sql
Description: Binary data
