You might have index fragmentation, and on PG12 you can rebuild those indexes
concurrently; you should do that. As everyone agreed, most of the space will be
marked for re-use later for the table segments, but the indexes could be the
problem in your case. On previous versions you can use pg_repack to do the
index rebuilds, which will help reduce the space consumed.
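Roughly something like this - the index, table and database names here are just
placeholders:

    -- PostgreSQL 12+: rebuild without holding a long exclusive lock
    REINDEX INDEX CONCURRENTLY my_index;
    REINDEX TABLE CONCURRENTLY my_table;  -- all indexes on one table

    # pre-12: pg_repack can rebuild only the indexes of a table online
    pg_repack --only-indexes --table=my_table mydb

Note pg_repack needs its extension created in the target database first
(CREATE EXTENSION pg_repack).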
Regards,
Virendra
On Thursday, May 14, 2020, 1:20:41 PM PDT, Eduard Rozenberg
<[email protected]> wrote:
@Adrian thanks again.
I read the Postgres docs the same way - that previously used space is marked as
available, so there should be no need for VACUUM FULL. Previously used = now
available space, which gets re-used, in theory.
And yet this same DB is growing at 30-50 GB weekly, despite the fact that 2.4
TB of previously used space is clearly available ("clearly available" as proven
by vacuum full shrinking the DB space usage by 2.4 TB). I did verify
postgresql.conf has always been properly configured re: autovacuum:
'autovacuum = on' and 'track_counts = on'.
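A rough way to check whether autovacuum is actually keeping up with the churn,
for what it's worth (just a sketch against the stats views):

    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;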
I'm not planning on running VACUUM FULL regularly, just "this one time". And I
was trying to parallelize VACUUM FULL and minimize downtime of the
production DB caused by table locking. And then I found the option of using
"vacuumdb --jobs" which sounded like the perfect solution except for "well you
can't actually use --jobs because you'll run into a deadlock and everybody
knows that and nobody has a (good) solution for it" :).
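The workaround I keep coming back to is generating the -t list rather than
typing it out by hand - a rough sketch, with "mydb" as a placeholder database
name:

    # build "-t schema.table" flags for all user tables, then hand them to
    # vacuumdb so the parallel jobs never touch the shared catalogs
    psql -At -d mydb -c "
      SELECT format('-t %I.%I', schemaname, tablename)
      FROM pg_tables
      WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    " | xargs vacuumdb --full --jobs=4 -d mydb

That way only user tables are processed in parallel, which is supposed to avoid
the catalog deadlock.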
--Ed
> On May 14, 2020, at 11:46, Adrian Klaver <[email protected]> wrote:
>
> On 5/14/20 11:27 AM, Eduard Rozenberg wrote:
>> @Adrian thanks.
>> I did a test vacuum full the "normal" non-parallel way (VACUUM FULL sql)
>> previously on a test db copy and saw the DB size (postgres 9.6) shrink from
>> 6.8 TB to 4.4 TB of actual, real disk space usage ($ df -d 1).
>> I don't know the reason so much space was "locked up" (other than there is a
>> lot of data "churn" from daily updates). But I definitely do need to do the
>> vac full on the production db to get down to the smaller size - cannot
>> afford the 2.4 TB of "wasted" space on an ongoing basis.
>
> It may not be wasted space. A regular VACUUM marks space within a table
> available for reuse (and not returned to the OS) when it removes unneeded tuples.
> It then fills that space up with new tuples, roughly speaking. So if the
> vacuuming is working properly you will reach a steady state where space is
> reused and the database on disk size grows slowly as reusable space is
> occupied. I would monitor the database size on a regular basis. My guess is
> that the VACUUM FULL is dropping the OS used space and then it fills up again
> as the database does those updates.
>
>> Based on your email it sounds like the vacuuming of those pg_catalog.*
>> tables is done regardless, as a normal part of doing vac full on my own
>> database.
>> Unfortunately I still don't see an ideal way to run vacuum full in parallel
>> via vacuumdb without running into the expected and documented deadlock. The only
>> method I'm aware of is to list each table individually with "-t table1 -t
>> table2..." for "vacuumdb --jobs", which is not pleasant and not exceedingly
>> beautiful.
>> Thanks.
>
>
> --
> Adrian Klaver
> [email protected]