Daniel,
You have a commit [1] that MIGHT fix this.
I have a script that recreates the problem, using random data in pg_temp
and a nested cursor.
It took me a few days to reduce this from the actual code that was
experiencing it. If I turn off JIT, the problem goes away. (If I don't
FETCH the first row, the memory loss does not happen; maybe because
opening a cursor is mostly preparation.)
I don't have an easy way to test this script against the commit right now,
but I am hopeful that your fix resolves this.
This is the first OOM issue I have hit in PG in three years of working with it.
The problem goes away if the TABLE is analyzed, or JIT is disabled.
The current script, when run, consumes about 25% of my system memory
(10GB). Just call the function below until it dies, if that's what you
need. The only way to get the memory back down is to close the connection.
SELECT pg_temp.fx(497);
Surprisingly (to me), the report from pg_get_backend_memory_contexts()
doesn't really show the "missing" memory, which I thought it would. (FWIW,
we caught this through multiple rounds of testing our code: it slowed
down, then crashed... Is there ANY way to check that we are above X% of
system memory, so we know to let this backend go?)
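For reference, here is a minimal sketch of the kind of self-check I have
in mind (the 8GB budget is an invented example, and per the above this
view only sees memory PostgreSQL accounts for, so it would likely miss
the loss described here):

```sql
-- Crude per-backend budget check: sum all tracked memory contexts for
-- this backend and compare against an arbitrary 8GB ceiling.
SELECT SUM(total_bytes) > 8 * 1024^3 AS over_budget
FROM pg_get_backend_memory_contexts();
```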
It takes about 18 minutes to run on my 4 CPU VM.
For now, we are going to add some ANALYZE statements to our code.
We will consider disabling JIT.
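In case it helps anyone else hitting this, the two workarounds look like
this (these are our assumptions about mitigation, not a confirmed fix):

```sql
-- Workaround 1: better row estimates keep the plan below the JIT cost
-- thresholds, so JIT never kicks in for these queries.
ANALYZE pg_temp.parts;

-- Workaround 2: disable JIT for the session outright.
SET jit = off;
```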
Thanks,
Kirk
[1] = 2cf50585e54a7b0c6bc62a087c69043ae57e4252
<https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2cf50585e54a7b0c6bc62a087c69043ae57e4252>
CREATE TABLE pg_temp.parts
(
    seid           bigint,
    r_field_name_1 smallint,
    fr_field_name  smallint NOT NULL,
    p1_field_name  varchar(4),
    qty_field_name integer,
    p5_field_name  varchar(30),
    partnum        varchar(30),
    st_field_name  smallint DEFAULT 0 NOT NULL
); -- drop table pg_temp.parts;
INSERT INTO pg_temp.parts (seid, partnum, qty_field_name, fr_field_name, st_field_name)
SELECT (RANDOM() * 3821 + 1)::bigint AS seid,
       (RANDOM() * 123456789)::text AS partnum,
       CASE
           WHEN q.rnd BETWEEN 0 AND 0.45 THEN FLOOR(RANDOM() * 900) + 100 -- random number in the range [100, 999]
           WHEN q.rnd BETWEEN 0.46 AND 0.96 THEN LEAST(TRUNC(FLOOR(RANDOM() * 999999) + 1000)::int, 999999::int) -- random number in the range [1000, 999999]
           ELSE FLOOR(RANDOM() * 9000000) + 1000000 -- random number in the range [1000000, 9999999]
       END AS qty_field_name,
       CASE WHEN RANDOM() < 0.72 THEN 0::smallint ELSE 1::smallint END AS fr_field_name,
       CASE WHEN RANDOM() < 0.46 THEN 1::smallint ELSE 2::smallint END AS st_field_name
FROM (SELECT RANDOM() AS rnd, x FROM GENERATE_SERIES(1, 90_000_000) x) q;
CREATE INDEX idx_parts_supid ON pg_temp.parts USING btree
    (seid, p1_field_name, partnum, st_field_name, r_field_name_1, qty_field_name);
CREATE INDEX idx_parts_p5 ON pg_temp.parts USING btree
    (p5_field_name, seid, st_field_name, r_field_name_1, p1_field_name);
CREATE INDEX idx_parts_partnum ON pg_temp.parts USING btree
    (partnum, seid, st_field_name, r_field_name_1, p1_field_name);
CREATE OR REPLACE FUNCTION pg_temp.fx(asupplier bigint = 497 )
RETURNS void
LANGUAGE plpgsql
AS
$function$
DECLARE
    supplier_parts CURSOR (sid bigint) FOR
        -- Again, selecting with COUNT() would save one query per row!
        SELECT partnum, qty_field_name, st_field_name, SUM(qty_field_name) AS qty
        FROM pg_temp.parts
        WHERE seid = sid AND (st_field_name = 1)
        GROUP BY partnum, qty_field_name, st_field_name
        ORDER BY partnum, qty_field_name, st_field_name;
    supplier_part_qty_matches CURSOR (sid bigint, pnum varchar(30), pqty bigint) FOR
        SELECT DISTINCT seid, fr_field_name, partnum, st_field_name
        FROM pg_temp.parts
        WHERE seid <> sid AND partnum = pnum AND qty_field_name = pqty
        ORDER BY seid, partnum;
    a_partnum varchar(30);
    a_qty     integer;
    a_st      smallint;
    a_cnt     integer = 0;
    b_partnum varchar(30);
    b_fr      smallint;
    b_seid    bigint;
    b_st      smallint;
    b_cnt     bigint = 0;
BEGIN
    RAISE NOTICE '%', (SELECT (PG_SIZE_PRETTY(SUM(used_bytes)),
                               PG_SIZE_PRETTY(SUM(total_bytes)),
                               PG_SIZE_PRETTY(SUM(free_bytes)))
                       FROM pg_get_backend_memory_contexts());
OPEN supplier_parts (asupplier);
LOOP
    FETCH supplier_parts INTO a_partnum, a_qty, a_st, a_qty; -- note: a_qty appears twice; it ends up holding the aggregated qty
EXIT WHEN NOT FOUND;
a_cnt := a_cnt + 1;
    OPEN supplier_part_qty_matches (sid := asupplier, pnum := a_partnum, pqty := a_qty);
LOOP
FETCH supplier_part_qty_matches INTO b_seid, b_fr, b_partnum, b_st;
b_cnt := b_cnt + 1;
      EXIT WHEN TRUE; -- no need to loop here; one FETCH per query triggers the losses
END LOOP;
CLOSE supplier_part_qty_matches;
END LOOP;
CLOSE supplier_parts;
RAISE NOTICE '-----------after close, Count a: %, count b: %', a_cnt, b_cnt;
    RAISE NOTICE '%', (SELECT (PG_SIZE_PRETTY(SUM(used_bytes)),
                               PG_SIZE_PRETTY(SUM(total_bytes)),
                               PG_SIZE_PRETTY(SUM(free_bytes)))
                       FROM pg_get_backend_memory_contexts());
--perform meminfo();
END;
$function$;
-- Until the table is analyzed, this plan uses JIT, and JIT is what causes the problem
explain SELECT DISTINCT seid, fr_field_name, st_field_name
FROM pg_temp.parts
WHERE seid <> 497 AND partnum >= '1'
ORDER BY seid;
-- But running with JIT loses some bytes each time.
SELECT pg_temp.fx(497);