On Mon, Dec 15, 2025 at 9:06 PM Robert Haas <[email protected]> wrote:
>
> Here's v7.
[..]
OK, so I've tested today from Your's branch directly, so I hope that
was also v7. Given the following q20 query:
SELECT s_name, s_address
FROM supplier, nation
WHERE s_suppkey in
(SELECT ps_suppkey
FROM partsupp
WHERE ps_partkey in
(SELECT p_partkey
FROM part
WHERE p_name LIKE 'forest%' )
AND ps_availqty >
(SELECT 0.5 * sum(l_quantity)
FROM lineitem
WHERE l_partkey = ps_partkey
AND l_suppkey = ps_suppkey
AND l_shipdate >= DATE '1994-01-01'
AND l_shipdate < DATE '1994-01-01' + INTERVAL '1' year ) )
AND s_nationkey = n_nationkey
AND n_name = 'CANADA'
ORDER BY s_name;
in normal conditions (w/o advice) the above query generates:
Sort (cost=1010985030.44..1010985030.59 rows=61 width=51)
Sort Key: supplier.s_name
-> Nested Loop (cost=0.42..1010985028.63 rows=61 width=51)
Join Filter: (nation.n_nationkey = supplier.s_nationkey)
-> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4)
Filter: (n_name = 'CANADA'::bpchar)
-> Nested Loop Semi Join (cost=0.42..1010985008.29
rows=1522 width=55)
Join Filter: (partsupp.ps_suppkey = supplier.s_suppkey)
-> Seq Scan on supplier (cost=0.00..249.30 rows=7730 width=59)
-> Materialize (cost=0.42..1010755994.57 rows=1973 width=4)
-> Nested Loop (cost=0.42..1010755984.71
rows=1973 width=4)
-> Seq Scan on part (cost=0.00..4842.25
rows=1469 width=4)
Filter: ((p_name)::text ~~ 'forest%'::text)
-> Index Scan using pk_partsupp on
partsupp (cost=0.42..688053.87 rows=1 width=8)
Index Cond: (ps_partkey = part.p_partkey)
Filter: ((ps_availqty)::numeric >
(SubPlan expr_1))
SubPlan expr_1
-> Aggregate
(cost=172009.42..172009.44 rows=1 width=32)
-> Seq Scan on lineitem
(cost=0.00..172009.42 rows=1 width=5)
Filter: ((l_shipdate >=
'1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
without time zone) AND (l_partkey = partsupp.ps_partkey) AND
(l_suppkey = partsupp.ps_suppkey))
Generated Plan Advice:
JOIN_ORDER(nation (supplier (part partsupp)))
NESTED_LOOP_PLAIN(partsupp partsupp) <--- [X]
NESTED_LOOP_MATERIALIZE(partsupp)
SEQ_SCAN(nation supplier part lineitem@expr_1)
INDEX_SCAN(partsupp public.pk_partsupp)
SEMIJOIN_NON_UNIQUE((partsupp part))
NO_GATHER(supplier nation partsupp part lineitem@expr_1)
Please see the - I think it's confusing? -
NESTED_LOOP_MATERIALIZE(partsupp partsupp) - that's 2x the same
string? This causes it to turn into below plan -- I've marked the
problem with [X]
Sort (cost=50035755.50..50035755.66 rows=61 width=51)
Sort Key: supplier.s_name
-> Nested Loop (cost=12562154.32..50035753.70 rows=61 width=51)
Join Filter: (nation.n_nationkey = supplier.s_nationkey)
-> Seq Scan on nation (cost=0.00..1.31 rows=1 width=4)
Filter: (n_name = 'CANADA'::bpchar)
-> Nested Loop Semi Join (cost=12562154.32..50035733.36
rows=1522 width=55)
[X] -- missing Join Filter here
-> Seq Scan on supplier (cost=0.00..249.30 rows=7730 width=59)
[X] -- HJ instead of Materialize+Nested Loop below:
-> Hash Join (cost=12562154.32..12567002.09 rows=1 width=4)
Hash Cond: (part.p_partkey = partsupp.ps_partkey)
-> Seq Scan on part (cost=0.00..4842.25
rows=1469 width=4)
Filter: ((p_name)::text ~~ 'forest%'::text)
-> Hash (cost=12562154.02..12562154.02 rows=24 width=8)
-> Index Scan using pk_partsupp on
partsupp (cost=0.42..12562154.02 rows=24 width=8)
[X] -- wrong Index Cond below
(suppkey instead of partkey)
Index Cond: (ps_suppkey = supplier.s_suppkey)
Filter: ((ps_availqty)::numeric >
(SubPlan expr_1))
SubPlan expr_1
-> Aggregate
(cost=172009.42..172009.44 rows=1 width=32)
-> Seq Scan on lineitem
(cost=0.00..172009.42 rows=1 width=5)
Filter: ((l_shipdate >=
'1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp
without time zone) AND (l_partkey = partsupp.ps_partkey) AND
(l_suppkey = partsupp.ps_suppkey))
Supplied Plan Advice:
SEQ_SCAN(nation) /* matched */
SEQ_SCAN(supplier) /* matched */
SEQ_SCAN(part) /* matched */
SEQ_SCAN(lineitem@expr_1) /* matched */
INDEX_SCAN(partsupp public.pk_partsupp) /* matched */
JOIN_ORDER(nation (supplier (part partsupp))) /* matched, conflicting */
NESTED_LOOP_PLAIN(partsupp) /* matched, conflicting */
NESTED_LOOP_PLAIN(partsupp) /* matched, conflicting */
NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting, failed */
SEMIJOIN_NON_UNIQUE((partsupp part)) /* matched, conflicting */
NO_GATHER(supplier) /* matched */
NO_GATHER(nation) /* matched */
NO_GATHER(partsupp) /* matched */
NO_GATHER(part) /* matched */
NO_GATHER(lineitem@expr_1) /* matched */
So the difference is basically between:
set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(partsupp
partsupp) NESTED_LOOP_MATERIALIZE(partsupp) [..]';
which causes wrong plan and outcome:
NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting, failed */
and apparently proper advice like below which has better yield:
set pg_plan_advice.advice = '[..] NESTED_LOOP_PLAIN(part partsupp)
NESTED_LOOP_MATERIALIZE(partsupp) [..]';
which is not generated , but caused good plan, however it also prints:
NESTED_LOOP_PLAIN(part) /* matched, conflicting, failed */
NESTED_LOOP_MATERIALIZE(partsupp) /* matched, conflicting */
but that seems "failed" there, seems to be untrue?
Another idea is perhaps, we could have some elog(WARNING) - but not
Asserts() - in assert-only enabled build that could alert us in case
of duplicated entries being detected for the same ops in
pg_plan_advice_explain_feedback()?
-J.