On Thu, Oct 6, 2016 at 12:11 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Oct 5, 2016 at 7:28 AM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
>> Hi all,
>> I found the kind of strange behaviour of the autovacuum launcher
>> process when XID anti-wraparound vacuum.
>> Suppose that a database (say test_db) whose datfrozenxid age is about
>> to reach autovacuum_freeze_max_age has two tables, T1 and T2.
>> T1 is very large and frequently updated, so vacuuming it takes a long
>> time.
>> T2 is a static, already frozen table, so vacuum can skip the whole
>> table.
>> And anti-wraparound vacuum has already been executed on the other
>> databases.
>> Once the age of test_db's datfrozenxid exceeds
>> autovacuum_freeze_max_age, the autovacuum launcher launches worker
>> processes to do anti-wraparound vacuum on test_db.
>> A worker process assigned to test_db begins to vacuum T1, which takes a
>> long time.
>> Meanwhile another worker process is assigned to test_db, finishes
>> vacuuming T2, and exits.
>> After a while, the autovacuum launcher launches a new worker again and
>> that worker is assigned to test_db again.
>> But that worker exits quickly because there is no table left to
>> vacuum (T1 is being vacuumed by another worker process).
>> When a new worker process starts, it sends a SIGUSR2 signal to the
>> launcher process to wake it up.
>> Although the launcher process executes WaitLatch() after launching a
>> new worker, it is woken up and soon launches yet another new worker
>> process.
> See also this thread, which was never resolved:
> https://www.postgresql.org/message-id/flat/CAMkU%3D1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta%3DYPyFPQ%40mail.gmail.com#CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com
>> As a result, the launcher process launches new worker processes at an
>> extremely high frequency regardless of autovacuum_naptime, which
>> increases CPU usage.
>> Why does an autovacuum worker need to wake up the launcher process
>> after it has started?
>> autovacuum.c:L1604
>>         /* wake up the launcher */
>>         if (AutoVacuumShmem->av_launcherpid != 0)
>>             kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
> I think that that is so that the launcher can launch multiple workers in
> quick succession if it has fallen behind schedule. It can't launch them in a
> tight loop, because its signals to the postmaster would get merged into one
> signal, so it has to wait for one to get mostly set-up before launching the
> next.
> But it doesn't make any real difference to your scenario, as the short-lived
> worker will wake the launcher up a few microseconds later anyway, when it
> realizes it has no work to do and so exits.

Thank you for the reply.

I also thought it would be better to have information about how many
tables in each database have not been vacuumed yet.
But I'm not sure how to implement that, and the current optimistic
logic is safer in most situations.
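A minimal sketch of the kind of per-database information I have in mind. This is only an assumption about one possible shape: PostgreSQL keeps no such counters, and ToyDbWork and the toy_* functions are hypothetical names. The launcher would only send a new worker to a database that still has tables no other worker has claimed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-database counters; PostgreSQL does not keep these.
 * tables_to_vacuum:   tables found to need (anti-wraparound) vacuum
 * tables_in_progress: of those, tables some worker is vacuuming right now
 */
typedef struct ToyDbWork
{
    int tables_to_vacuum;
    int tables_in_progress;
} ToyDbWork;

/* Would a newly launched worker find anything to do in this database? */
static bool
toy_db_has_unclaimed_work(const ToyDbWork *db)
{
    assert(db->tables_in_progress >= 0 &&
           db->tables_in_progress <= db->tables_to_vacuum);
    return db->tables_to_vacuum > db->tables_in_progress;
}

/* A worker claims one unclaimed table before vacuuming it. */
static void
toy_claim_table(ToyDbWork *db)
{
    assert(toy_db_has_unclaimed_work(db));
    db->tables_in_progress++;
}

/* A worker finished vacuuming one of the tables it claimed. */
static void
toy_finish_table(ToyDbWork *db)
{
    assert(db->tables_in_progress > 0);
    db->tables_in_progress--;
    db->tables_to_vacuum--;
}
```

In the T1/T2 scenario, test_db starts at {2, 0}; after one worker claims T1 and another claims and finishes T2, the counters are {1, 1}, so the launcher would not start a third worker for test_db while T1 is still being vacuumed. The sketch ignores shared-memory locking, workers that crash without finishing, and tables that become eligible between checks, which is exactly where the current optimistic approach is safer.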


Masahiko Sawada
NTT Open Source Software Center

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)