Hi hackers,,

I have a question regarding the behavior of the auto VACUUM in PostgreSQL in 
the context of using stored procedures with sub-transactions.


As I understand it, the parameters that control the initiation of VACUUM are 
set in the configuration file, such as autovacuum_vacuum_threshold, 
autovacuum_vacuum_scale_factor, and are stored in the system tables 
pg_stat_user_tables or pg_class (please correct me if I'm wrong). These system 
tables are updated after each completed transaction, and VACUUM analyzes them 
to determine whether to clean up dead rows, depending on the configured 
thresholds.


Here is the scenario: we have several stored procedures that modify or update 
table data. These procedures use sub-transactions, which are committed via 
COMMIT. However, from my understanding, the system table, which VACUUM checks, 
is not updated until the main (outermost) transaction completes. This means 
that during the execution of the procedures, a significant number of dead rows 
may accumulate, and only after the final COMMIT of the main transaction do 
these dead rows become visible for VACUUM.


As a result, there could be a sharp spike in CPU usage when VACUUM runs after 
the completion of the procedures, as it begins to clean up a large number of 
accumulated dead rows.

I would like to know if this behavior is expected and correct? Or could there 
be a potential issue or bug in this scenario?


To illustrate the issue, here's an example:

CREATE TABLE bloat
(
    id integer generated always as identity,
    d timestamptz
);

CREATE OR REPLACE PROCEDURE update_multiple_bloat()
LANGUAGE plpgsql
AS $$
DECLARE
    row_to_update RECORD;
BEGIN
    FOR row_to_update IN SELECT * FROM bloat
    LOOP
        UPDATE bloat SET d = CURRENT_TIMESTAMP WHERE d = row_to_update.d;
        COMMIT;
    END LOOP;
END;
$$;

CREATE OR REPLACE PROCEDURE insert_multiple_into_bloat(num_records integer)
LANGUAGE plpgsql
AS $$
DECLARE
    i integer := 1;
BEGIN
    LOOP
        EXIT WHEN i > num_records;
        INSERT INTO bloat (d) VALUES (CURRENT_TIMESTAMP);
        i := i + 1;
        COMMIT;
    END LOOP;
END;
$$;

DO $$
DECLARE
    row_data RECORD;
    counter INT := 0;
BEGIN

    BEGIN
        INSERT INTO bloat (d) VALUES (CURRENT_TIMESTAMP);
        COMMIT;
    END;?

    BEGIN
        call insert_multiple_into_bloat(100);
    END;

    BEGIN
        call update_multiple_bloat();
    END;

END $$;

Thank you in advance for your help!

With Regards,
Vyacheslav Kirillov!

Reply via email to