On 27 Feb 2026, at 09:15, Andrei Lepikhov <[email protected]> wrote:
>
> -> Hash Right Join (cost=210369.25..210370.30 rows=8 width=99)
> (actual time=150.790..150.853 rows=44.56 loops=21798)
>
> Schema of this part of the query tree is as the following:
>
> Hash Right Join (loops=21798)
> │
> ├─ [Left/Probe] GroupAggregate (loops=14426)
> │ └─ Merge Right Anti Join
> │ └─ Merge Join
> │ └─ Index Only Scan on table_k gkal_2 (loops=14426)
> │
> └─ [Right/Build = Hash] Nested Loop (loops=21798)
> ├─ Index Scan on table_o goftr_1 (loops=21798)
> │ Index Cond: goftr_1.au_id = gauf_1.id <http://gauf_1.id/>
> └─ Index Scan on table_k gkal_1
> Index Cond: gkal_1.oo_id = goftr_1.id <http://goftr_1.id/>
>
> So, the hash table is rebuilt each rescan based on the changed 'gauf_1.id
> <http://gauf_1.id/>' external parameter.
> Without the query, it is hard to say exactly what the trigger of this problem
> is. Having a reproduction, we could use planner advising extensions and see
> how additional knowledge of true cardinalities rebuilds the query plan.
> Sometimes, additional LATERAL restriction, added by the planner to pull-up
> subplan, restricts the join search scope badly, but I doubt if we have this
> type of problem here.
I searched for the condition kal.dp_end_dat < current_date, then realized that
this part of the explain is misleading.
Index Scan using table_k_late_spec_dp_end_dat_key on schema1.table_k kal
(cost=0.28..122468.46 rows=196053 width=24) (actual time=0.039..0.614
rows=471.00 loops=1)
Output: kal.dp_rti_id, kal.art_dp_res, kal.oo_id
Index Cond: (kal.dp_end_dat < ('now'::cstring)::date)
Index Searches: 1
Buffers: shared hit=230 read=49
I/O Timings: shared read=0.142
The definiton of the index table_k_late_spec_dp_end_dat_key is:
CREATE INDEX table_k_late_spec_dp_end_dat_key
ON schema1.table_k
USING btree
(dp_end_dat)
WHERE dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT
NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) <
(COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));
This, because the where in index corresponds the where in query. so the
simplified query is:
SELECT * FROM schema1.table_k AS kal
WHERE dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT
NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND
COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j,
0.0000));
The surrounding query part of the view is below, where the part with
"dp_end_dat < current_date" is in the "with late as ()":
WITH late AS (
SELECT kal.dp_rti_id AS rti_id,
sum(COALESCE(kal.art_dp_res, 0.0000)) AS sum_art_dp_late
FROM schema1.table_k kal
WHERE kal.dp_status IS NOT NULL AND kal.dp_status > 0 AND
COALESCE(kal.art_rtd, 0.0000) < (COALESCE(kal.art_grt, 0.0000) +
COALESCE(kal.art_grt_j2j, 0.0000)) AND kal.dp_st_dat IS NOT NULL AND
kal.dp_end_dat IS NOT NULL AND kal.dp_end_dat < 'now'::text::date AND kal.oo_id
IS NOT NULL
AND NOT (EXISTS (
SELECT akdt_late.oo_id
FROM schema1.table_k_dtg akdt_late — ------ this is a view
WHERE akdt_late.dp_rti_id::text = kal.dp_rti_id::text AND
akdt_late.oo_id IS NOT NULL
AND (akdt_late.art_prov_res > 0.0000 OR akdt_late.dp_status > 0 AND
akdt_late.art_dp_res > 0.0000)
AND akdt_late.datum >= 'now'::text::date
AND (akdt_late.a_status::text = ANY (ARRAY['d'::character
varying::text, 'v'::character varying::text, 'i'::character varying::text]))
AND akdt_late.ih_flag AND kal.oo_id::text = akdt_late.oo_id::text
))
GROUP BY kal.dp_rti_id
)
SELECT akd.oo_id,
akd.dp_rti_id AS rti_id,
akd.datum,
akd.lgaagng AS auf_lgaagng,
akd.rueday_def,
akd.rettag_def,
COALESCE(min(COALESCE(sum_ast_per_day.sum_per_day, 0.0000)), 0.0000) AS
sum_ast_per_day,
COALESCE(max(COALESCE(sum_red_per_day.sum_per_day, 0.0000)), 0.0000) AS
sum_red_per_day,
CASE
WHEN akd.datum > 'now'::text::date THEN COALESCE(late.sum_art_dp_late,
0.0000)
ELSE 0.0000
END AS sum_art_dp_late
FROM schema1.table_k_future_dt akd — ------ this is a view
LEFT JOIN schema1.dd_ext ext_dd ON ext_dd.id::text = akd.ext::text
LEFT JOIN schema1.dp_epkt ext_dd_dpe ON ext_dd_dpe.id::text =
ext_dd.table_d_id::text
LEFT JOIN late ON late.rti_id::text = akd.dp_rti_id::text
LEFT JOIN LATERAL (
SELECT COALESCE(sum(COALESCE(stk.anz, 0.0000)), 0.0000) AS sum_per_day
FROM schema1.al_ast stk
WHERE stk.rti_id::text = akd.dp_rti_id::text AND stk.von <= akd.datum AND
stk.bis >= akd.datum
GROUP BY akd.datum
) sum_ast_per_day ON (
EXISTS (
SELECT al_ast.rti_id
FROM schema1.al_ast
WHERE al_ast.rti_id::text = akd.dp_rti_id::text
)
)
LEFT JOIN LATERAL (
SELECT COALESCE(sum(COALESCE(alred.anz, 0.0000)), 0.0000) AS sum_per_day
FROM schema1.al_red alred
WHERE alred.rti_id::text = akd.dp_rti_id::text AND alred.von <= akd.datum
AND alred.bis >= akd.datum
GROUP BY akd.datum
) sum_red_per_day ON (
EXISTS (
SELECT al_red.rti_id
FROM schema1.al_red
WHERE al_red.rti_id::text = akd.dp_rti_id::text
)
)
WHERE (ext_dd.table_d_id IS NULL OR ext_dd.table_d_id::text = 'schema1'::text
OR NOT COALESCE(ext_dd_dpe.enabled, false))
AND akd.rti_id::text !~~ 'P%'::text AND akd.dp_rti_id::text !~~ 'P%'::text
AND (akd.art_dp_res > 0.0000 OR akd.art_prov_res > 0.0000 OR akd.art_dp_zm >
0.0000)
AND (akd.lgaagng IS NULL OR akd.lgaagng::date >= 'now'::text::date AND
akd.lgaagng::date <= ('now'::text::date + '3 mons'::interval)::date)
GROUP BY akd.oo_id, akd.dp_rti_id, akd.datum, akd.lgaagng, akd.rueday_def,
akd.rettag_def, late.sum_art_dp_late
"gauf" is in one of the table_k_* views, and looks like below. There are
multiple variants, they differ mostly in "where" part.
SELECT gdt.datum,
gkal.rti_id,
gdt.au_id,
gkal.oo_id,
gkal.id AS kal_id,
gauf.status AS a_status,
goftr_1.token AS ih_flag,
gdt.prov,
gdt.def,
gkal.dp_status,
gkal.ext,
gdt.rueday_def,
gdt.rettag_def,
gdt.rueday_prov,
gdt.rettag_prov,
gauf.lgaagng,
gauf.lgaein,
gkal.art_dp_res,
gkal.art_prov_res,
gkal.art_dp_zm,
gkal.rti,
gkal.art_dp_extern,
gkal.dp_rti_id,
gkal.art_dp_lga,
gkal.set_fix_vkpt
FROM schema1.table_a_dtg gdt
LEFT JOIN schema1.table_a gauf ON gauf.id::text = gdt.au_id::text
LEFT JOIN schema1.auf_oos goftr_1 ON goftr_1.au_id::text = gauf.id::text
LEFT JOIN schema1.table_k gkal ON gkal.oo_id::text = goftr_1.id::text
WHERE gdt.datum >= ('now'::text::date - '7 days'::interval)::date;;
I tried to change the statistics of dp_end_dat and also all of the fields in
"where" from 10 to 1500 increased in increments of 10.
One field at once then all fields together. the estimate got not better, actual
rows 471, planned rows somewhere between 180000 and 195000.
then i checked the same query on pg 14, the estimate is the same as on pg 18.
ALTER TABLE scema1.table_k ALTER dp_end_dat SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER dp_status SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER oo_id SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER art_rtd SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER art_grt SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER art_grt_j2j SET STATISTICS 140;
ALTER TABLE
ANALYZE scema1.table_k;
pg14 at statistics 140:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM schema1.table_k AS kal WHERE
dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL
AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND
COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j,
0.0000));
Index Scan using table_k_late_spec_dp_end_dat_key on table_k kal
(cost=0.28..122750.89 rows=193091 width=614) (actual time=0.010..0.261 rows=471
loops=1)
Index Cond: (dp_end_dat < CURRENT_DATE)
Buffers: shared hit=279
Settings: hash_mem_multiplier = '2.5', jit = 'off', max_parallel_workers =
'4', max_parallel_workers_per_gather = '4', random_page_cost = '1.2',
temp_buffers = '512MB', work_mem = '768MB'
Planning:
Buffers: shared hit=1459
Planning Time: 3.101 ms
Execution Time: 0.325 ms
pg18 at statistics 140:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM schema1.table_k AS kal
WHERE dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT
NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND
COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j,
0.0000));
Index Scan using table_k_late_spec_dp_end_dat_key on table_k kal
(cost=0.28..122561.69 rows=195550 width=624) (actual time=0.021..0.514
rows=471.00 loops=1)
Index Cond: (dp_end_dat < CURRENT_DATE)
Index Searches: 1
Buffers: shared hit=279
Settings: temp_buffers = '512MB', work_mem = '768MB', hash_mem_multiplier =
'2.5', jit = 'off', max_parallel_workers_per_gather = '4', max_parallel_workers
= '4', random_page_cost = '1.2'
Planning:
Buffers: shared hit=1508
Planning Time: 3.123 ms
Execution Time: 0.639 ms
(9 rows)
I hope I have selected the correct parts of the query, as it is not really
possible to share the entire query with all its dependencies.
The first version of this query was written for PostgreSQL 8.3; since then, it
has been refactored and optimized a few times when necessary. I will check if
it is possible to reorder the query without rewriting everything.
I dont know the inner workings of analyze, is that normal that executing
analyze on unchanged data can flip the plan? Does analyze select a random set
of rows?
Thanks.
regards,
Attila