On Fri, Nov 18, 2016 at 1:11 PM, Robert Haas <robertmh...@gmail.com> wrote:
> So now I think that we probably need to make this logic a bit smarter.
> Add all of the OIDs that need to be dropped to a list.  Then have a
> loop prior to the main loop (where it says "Perform operations on
> collected tables.") which iterates over that list and drops those
> tables one by one, starting a transaction every (say) 100 tables or
> after an error.  For bonus points, if a transaction fails, put all of
> the OIDs except the one that provoked the failure back into the list
> of OIDs to be dropped, so that we still make a progress even if some
> DROPs are failing for some reason.


> That might sound adding unnecessary work just for the sake of
> paranoia, but I don't think it is.  Failures here won't be common, but
> since they are entirely automated there will be no human intelligence
> available to straighten things out.  Barring considerable caution,
> we'll just enter a flaming death spiral.

Thinking more paranoid, an extra way to enter in this flaming death
spiral is to not limit the maximum number of failures authorized when
dropping a set of orphaned tables and transactions fail multiple
times. This is basically not important as the relation on which the
drop failed gets dropped from the list but failing on each one of them
is a good way to slow down autovacuum, so putting a limit of say 10
transactions failing is I think really important.

I have played with what you suggested, and finished with the patch
attached. I have run some tests using this function to create some
temp tables with several backends to be sure that multiple backend IDs
are used:

CREATE FUNCTION create_temp_tables(i int) RETURNS void
AS $$
FOR i IN 1..i LOOP
  EXECUTE 'CREATE TEMP TABLE aa' || i || ' (a int);';
$$ LANGUAGE plpgsql;

Then I killed the instance. At restart I could see a bunch of temp
tables in pg_class, and I let autovacuum do the cleanup after restart.
I have tested as well the error code path in the PG_TRY() block by
enforcing manually a elog(ERROR) to be sure that the maximum number of
failures is correctly handled, better safe than sorry.

Attachment: autovacuum-orphan-cleanup-v3.patch
Description: application/download

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to