Hi,
Thanks for taking a look.
> On 30 Sep 2024, at 14:14, Ashutosh Bapat <[email protected]> wrote:
>
> Hi Michal,
> It is difficult to understand the exact problem from your description.
> Can you please provide EXPLAIN outputs showing the expected plan and
> the unexpected plan; plans on the node where the query is run and
> where the partitions are located.
The table structure is as follows:
CREATE TABLE tbl (…) PARTITION BY RANGE year(col02_date)
CREATE TABLE tbl_2015 PARTITION OF tbl FOR VALUES BETWEEN (2023) AND (2024)
PARTITION BY HASH (col01_no)
… subsequent years
CREATE TABLE tbl_2021_32_9 PARTITION OF tbl_2021 FOR VALUES WITH (MODULUS 32
REMAINDER 9)
…
CREATE FOREIGN TABLE tbl_2022_16_9 PARTITION OF tbl_2022 FOR VALUES WITH
(MODULUS 32 REMAINDER 9)
All tables are freshly ANALYZEd.
I have a function:
CREATE FUNCTION report(col01 text, year_from, month_from, year_to, month_to)
RETURNS … LANGUAGE sql
$$
SELECT
col01_no, year(col02_date), month(col02_date), sum(col03) FROM tbl WHERE
col02_no = col02 AND (col02_date conditions) GROUP BY 1, 2, 3
$$
EXPLAIN (…) SELECT * FROM report(…);
gives
Plan 1 (performs pushdown of aggregates):
Append (cost=9.33..76322501.41 rows=3 width=58) (actual time=5.051..6.414
rows=21 loops=1)
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001
rows=1 loops=1)
Output: '2021-01-01'::date
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001
rows=1 loops=1)
Output: '2023-12-31'::date
-> GroupAggregate (cost=9.31..9.85 rows=1 width=58) (actual
time=0.073..0.074 rows=0 loops=1)
Output: op.col01_no, (year(op.col02_date)), (month(op.col02_date)),
sum(op.col03) FILTER (WHERE (op.debit_flag = '0'::bpchar)), sum(op.col03)
FILTER (WHERE (op.debit_flag <> '0'::bpchar)), count(1)
Group Key: (year(op.col02_date)), (month(op.col02_date))
-> Sort (cost=9.31..9.32 rows=1 width=44) (actual time=0.072..0.073
rows=0 loops=1)
Output: (year(op.col02_date)), (month(op.col02_date)),
op.col01_no, op.col03, op.debit_flag
Sort Key: (year(op.col02_date)), (month(op.col02_date))
Sort Method: quicksort Memory: 25kB
-> Index Only Scan using
tbl_2021_32_9_universal_gist_idx_3a2df25af5bc48a on cbt.tbl_2021_32_9 op
(cost=0.28..9.30 rows=1 width=44) (actual time=0.063..0.063 rows=0 loops=1)
Output: year(op.col02_date), month(op.col02_date),
op.col01_no, op.col03, op.debit_flag
Index Cond: ((op.col01_no =
'22109020660000000110831697'::text) AND (op.col02_date >= $0) AND
(op.col02_date <= $1))
Filter: ((year(op.col02_date) >= 2021) AND
(year(op.col02_date) <= 2023))
Heap Fetches: 0
-> Async Foreign Scan (cost=100.02..76322480.36 rows=1 width=58) (actual
time=0.753..0.755 rows=11 loops=1)
Output: op_1.col01_no, (year(op_1.col02_date)),
(month(op_1.col02_date)), (sum(op_1.col03) FILTER (WHERE (op_1.debit_flag =
'0'::bpchar))), (sum(op_1.col03) FILTER (WHERE (op_1.debit_flag <>
'0'::bpchar))), ((count(1))::double precision)
Relations: Aggregate on
(cbt_c61d467d1b5fd1d218d9e6e7dd44a333.tbl_2022_16_9 op_1)
Remote SQL: SELECT col01_no, cbt.year(col02_date),
cbt.month(col02_date), sum(col03) FILTER (WHERE (debit_flag = '0')), sum(col03)
FILTER (WHERE (debit_flag <> '0')), count(1) FROM cbt.tbl_2022_16_9 WHERE
((cbt.year(col02_date) >= 2021)) AND ((cbt.year(col02_date) <= 2023)) AND
((col02_date >= $1::date)) AND ((col02_date <= $2::date)) AND ((col01_no =
'22109020660000000110831697')) GROUP BY 1, 2, 3
-> GroupAggregate (cost=10.63..11.17 rows=1 width=58) (actual
time=4.266..4.423 rows=10 loops=1)
Output: op_2.col01_no, (year(op_2.col02_date)),
(month(op_2.col02_date)), sum(op_2.col03) FILTER (WHERE (op_2.debit_flag =
'0'::bpchar)), sum(op_2.col03) FILTER (WHERE (op_2.debit_flag <> '0'::bpchar)),
count(1)
Group Key: (year(op_2.col02_date)), (month(op_2.col02_date))
-> Sort (cost=10.63..10.64 rows=1 width=44) (actual time=4.238..4.273
rows=735 loops=1)
Output: (year(op_2.col02_date)), (month(op_2.col02_date)),
op_2.col01_no, op_2.col03, op_2.debit_flag
Sort Key: (year(op_2.col02_date)), (month(op_2.col02_date))
Sort Method: quicksort Memory: 82kB
-> Index Only Scan using
tbl_2023_128_9_universal_gist_idx_3a2df25af5bc48a on cbt.tbl_2023_128_9 op_2
(cost=0.54..10.62 rows=1 width=44) (actual time=0.295..4.059 rows=735 loops=1)
Output: year(op_2.col02_date), month(op_2.col02_date),
op_2.col01_no, op_2.col03, op_2.debit_flag
Index Cond: ((op_2.col01_no =
'22109020660000000110831697'::text) AND (op_2.col02_date >= $0) AND
(op_2.col02_date <= $1))
Filter: ((year(op_2.col02_date) >= 2021) AND
(year(op_2.col02_date) <= 2023))
Heap Fetches: 0
BUT after I perform
CREATE OR REPLACE report(…)
STRICT TO … AS … (same code)
The plan (as reported by auto_explain) changes to:
Plan 2 (no pushdown):
GroupAggregate (cost=1781983216.68..1781983324.62 rows=200 width=58) (actual
time=16.065..16.432 rows=21 loops=1)
Output: op.col01_no, (year(op.col02_date)), (month(op.col02_date)),
sum(op.col03) FILTER (WHERE (op.debit_flag = '0'::bpchar)), sum(op.col03)
FILTER (WHERE (op.debit_flag <> '0'::bpchar)), count(1)
Group Key: (year(op.col02_date)), (month(op.col02_date))
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1 loops=1)
Output: make_date($2, $3, 1)
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.02 rows=1 width=4) (actual time=0.002..0.003
rows=1 loops=1)
Output: (((make_date($4, $5, 1) + '1 mon'::interval) - '1
day'::interval))::date
-> Sort (cost=1781983216.65..1781983217.33 rows=272 width=44) (actual
time=16.041..16.117 rows=1564 loops=1)
Output: (year(op.col02_date)), (month(op.col02_date)), op.col01_no,
op.col03, op.debit_flag
Sort Key: (year(op.col02_date)), (month(op.col02_date))
Sort Method: quicksort Memory: 171kB
-> Append (cost=0.55..1781983205.65 rows=272 width=44) (actual
time=1.013..15.445 rows=1564 loops=1)
Subplans Removed: 269
-> Index Only Scan using
accoper_2021_32_9_universal_gist_idx_3a2df25af5bc48a on cbt.accoper_2021_32_9
op_1 (cost=0.28..9.30 rows=1 width=44) (actual time=0.084..0.084 rows=0
loops=1)
Output: year(op_1.col02_date), month(op_1.col02_date),
op_1.col01_no, op_1.col03, op_1.debit_flag
Index Cond: ((op_1.col01_no = $1) AND (op_1.col02_date >=
$0) AND (op_1.col02_date <= $1))
Filter: ((year(op_1.col02_date) >= $2) AND
(year(op_1.col02_date) <= $4))
Heap Fetches: 0
-> Async Foreign Scan on
cbt_c61d467d1b5fd1d218d9e6e7dd44a333.accoper_2022_16_9 op_2
(cost=100.00..76322479.83 rows=1 width=44) (actual time=0.658..3.870 rows=829
loops=1)
Output: year(op_2.col02_date), month(op_2.col02_date),
op_2.col01_no, op_2.col03, op_2.debit_flag
Remote SQL: SELECT col03, col02_date, debit_flag, col01_no
FROM cbt.accoper_2022_16_9 WHERE ((col02_date >= $1::date)) AND ((col02_date <=
$2::date)) AND ((col01_no = $3::text)) AND ((cbt.year(col02_date) >=
$4::integer)) AND ((cbt.year(col02_date) <= $5::integer))
-> Index Only Scan using
accoper_2023_128_9_universal_gist_idx_3a2df25af5bc48a on cbt.accoper_2023_128_9
op_3 (cost=0.54..10.62 rows=1 width=44) (actual time=0.361..4.043 rows=735
loops=1)
Output: year(op_3.col02_date), month(op_3.col02_date),
op_3.col01_no, op_3.col03, op_3.debit_flag
Index Cond: ((op_3.col01_no = $1) AND (op_3.col02_date >=
$0) AND (op_3.col02_date <= $1))
Filter: ((year(op_3.col02_date) >= $2) AND
(year(op_3.col02_date) <= $4))
Heap Fetches: 0
I understand wrong rows estimates ( =1 ) are due to missing statistics on
expressions year() and month().
But why plans are different?
BTW. I tried to add extended statistics on the above expressions but ANALYZE
didn’t seem to update any values (as seen in pg_stat_ext) - ndistinct is NULL
for example.
Thanks,
--
Michal