On Tue, Mar 9, 2021 at 6:53 PM Tom Lane <[email protected]> wrote:
> Radoslav Nedyalkov <[email protected]> writes:
> > On Tue, Mar 9, 2021 at 6:03 PM Tom Lane <[email protected]> wrote:
> >> Perhaps the accumulation is happening on the client side? libpq doesn't
> >> have any provision for spilling a result set to disk.
>
> > Ah, I named it result set wrongly perhaps.
> > These are queries , part of a larger ETL function or statement which at
> the
> > end just write to a table.
> > The huge join is an intermediate step.
>
> Hm. What's the query plan look like?
>
> The only thing I'm aware of that could consume unbounded memory
> on the server side is hash aggregation. (v13 has improved that
> situation, but I'm guessing you are running some older version.)
> The planner won't choose hash aggregation if it estimates that
> the hash table would need to exceed work_mem, but its estimate
> could be wrong. If that's the scenario, you could back off
> work_mem until hash aggregation isn't used, or you could try to
> improve the planner's estimates. If your ETL process involves
> intermediate tables, you might need to explicitly ANALYZE those
> after you fill them so that the planner has a better idea how
> to plan the later steps.
>
> regards, tom lanea
Thanks Tom,
Query and plan attached.
Rado
db=# EXPLAIN
db-# CREATE TEMP TABLE gross_set_merchants AS (
db(# WITH gs_merchants as (
db(# SELECT
db(# u.merchant_id,
db(# uel.user_id
db(# FROM public.user_event_logs as uel
db(# JOIN users as u ON uel.user_id = u.id
db(# where
db(# --u.merchant_id = 1030616
db(# uel.created_at >= '2018-01-01'
db(# AND uel.category = 'settings'
db(# AND uel.description IN ('Gross settlement
enabled','Gross settlement disabled')
db(# GROUP BY 1,2
db(# ), setting_change as (
db(# SELECT
db(# gsm.merchant_id,
db(# cer.date,
db(# SUM(CASE
db(# WHEN uel.description = 'Gross
settlement enabled' THEN 1
db(# WHEN uel.description = 'Gross
settlement disabled' THEN -1
db(# ELSE 0
db(# END) change_setting
db(# FROM external.currency_exchange_rates as cer
db(# CROSS JOIN gs_merchants as gsm --full join, having
all days for all gs merchants
db(# LEFT JOIN public.user_event_logs as uel ON
uel.created_at::date = cer.date
db(# AND gsm.user_id = uel.user_id
db(# AND uel.created_at >= '2018-01-01'
db(# AND uel.category = 'settings'
db(# AND uel.description IN ('Gross settlement
enabled','Gross settlement disabled')
db(# WHERE cer.currency = 'GBP'
db(# AND cer.date >= '2018-01-01'
db(# GROUP BY 1,2
db(#
db(# )
db(#
db(# SELECT
db(# sc.merchant_id,
db(# sc.date,
db(# SUM(change_setting) OVER (PARTITION BY
sc.merchant_id ORDER BY date) as gs_on
db(# FROM setting_change as sc);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=183885316.97..183890372.97 rows=252800 width=40)
CTE gs_merchants
-> Group (cost=2475689.43..2534384.63 rows=567218 width=8)
Group Key: u.merchant_id, uel.user_id
-> Gather Merge (cost=2475689.43..2532021.22 rows=472682 width=8)
Workers Planned: 2
-> Group (cost=2474689.41..2476461.97 rows=236341 width=8)
Group Key: u.merchant_id, uel.user_id
-> Sort (cost=2474689.41..2475280.26 rows=236341
width=8)
Sort Key: u.merchant_id, uel.user_id
-> Parallel Hash Join
(cost=265272.50..2453595.37 rows=236341 width=8)
Hash Cond: (uel.user_id = u.id)
-> Parallel Seq Scan on user_event_logs uel
(cost=0.00..2187702.48 rows=236341 width=4)
Filter: ((created_at >= '2018-01-01
00:00:00'::timestamp without time zone) AND ((description)::text = ANY
('{"Gross settlement enabled","Gross settlement disabled"}'::text[])) AND
((category)::text = 'settings'::text))
-> Parallel Hash
(cost=209818.33..209818.33 rows=4436333 width=8)
-> Parallel Seq Scan on users u
(cost=0.00..209818.33 rows=4436333 width=8)
CTE setting_change
-> HashAggregate (cost=181320662.52..181323190.52 rows=252800 width=16)
Group Key: gsm.merchant_id, cer.date
-> Merge Left Join (cost=161250580.17..170174828.82 rows=891666696
width=71)
Merge Cond: ((gsm.user_id = uel_1.user_id) AND (cer.date =
((uel_1.created_at)::date)))
-> Sort (cost=158950948.16..161180114.90 rows=891666696
width=12)
Sort Key: gsm.user_id, cer.date
-> Nested Loop (cost=0.29..11157645.37 rows=891666696
width=12)
-> CTE Scan on gs_merchants gsm
(cost=0.00..11344.36 rows=567218 width=8)
-> Materialize (cost=0.29..471.24 rows=1572
width=4)
-> Index Only Scan using
currency_exchange_rates_pk on currency_exchange_rates cer (cost=0.29..463.38
rows=1572 width=4)
Index Cond: ((date >=
'2018-01-01'::date) AND (currency = 'GBP'::text))
-> Sort (cost=2299632.01..2301050.06 rows=567218 width=75)
Sort Key: uel_1.user_id, ((uel_1.created_at)::date)
-> Gather (cost=1000.00..2245424.28 rows=567218
width=75)
Workers Planned: 2
-> Parallel Seq Scan on user_event_logs uel_1
(cost=0.00..2187702.48 rows=236341 width=75)
Filter: ((created_at >= '2018-01-01
00:00:00'::timestamp without time zone) AND ((description)::text = ANY
('{"Gross settlement enabled","Gross settlement disabled"}'::text[])) AND
((category)::text = 'settings'::text))
-> Sort (cost=27741.81..28373.81 rows=252800 width=16)
Sort Key: sc.merchant_id, sc.date
-> CTE Scan on setting_change sc (cost=0.00..5056.00 rows=252800
width=16)
(37 rows)
Time: 15.080 ms
db=#