On Wed, Oct 5, 2016 at 7:28 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> Hi all,
>
> I found some strange behaviour of the autovacuum launcher process
> during XID anti-wraparound vacuum.
>
> Suppose that a database (say test_db) whose datfrozenxid age is about
> to reach max_autovacuum_max_age has two tables, T1 and T2.
> T1 is very large and frequently updated, so vacuuming it takes a long
> time.
> T2 is a static, already-frozen table, so vacuum can skip the whole
> table.
> Anti-wraparound vacuum has already been executed on the other databases.
>
> Once the age of datfrozenxid of test_db exceeds
> max_autovacuum_max_age, the autovacuum launcher launches a worker
> process to do anti-wraparound vacuum on test_db.
> A worker process assigned to test_db begins to vacuum T1, which takes a
> long time.
> Meanwhile, another worker process is assigned to test_db, completes
> the vacuum of T2, and exits.
>
> After a while, the autovacuum launcher launches a new worker again, and
> that worker is assigned to test_db again.
> But that worker exits quickly because there is no table that needs
> vacuuming (T1 is being vacuumed by another worker process).
> When a new worker process starts, it sends a SIGUSR2 signal to the
> launcher process to wake it up.
> Although the launcher process executes WaitLatch() after launching the
> new worker, it is woken up and soon launches another new worker
> process.
>

See also this thread, which was never resolved:

https://www.postgresql.org/message-id/flat/CAMkU%3D1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta%3DYPyFPQ%40mail.gmail.com#CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com

> As a result, the launcher process launches new worker processes at an
> extremely high frequency regardless of autovacuum_naptime, which
> increases CPU usage.
>
> Why does an autovacuum worker need to wake up the launcher process
> after it has started?
>
> autovacuum.c:L1604
>     /* wake up the launcher */
>     if (AutoVacuumShmem->av_launcherpid != 0)
>         kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);
>

I think that that is so that the launcher can launch multiple workers in
quick succession if it has fallen behind schedule. It can't launch them in
a tight loop, because its signals to the postmaster would get merged into
one signal, so it has to wait for one to get mostly set-up before launching
the next.

But it doesn't make any real difference to your scenario, as the
short-lived worker will wake the launcher up a few microseconds later
anyway, when it realizes it has no work to do and so exits.

Cheers,

Jeff
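
For anyone who wants to see the "merged into one signal" effect in isolation, here is a minimal standalone sketch. It is not PostgreSQL code: count_deliveries() and on_signal() are hypothetical names, and it sends SIGUSR1 to its own process instead of signalling a postmaster. It assumes the usual behaviour of standard (non-realtime) signals on Linux and most POSIX systems, where a signal that is already pending is not queued a second time.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Number of times the handler has actually run (hypothetical demo state). */
static volatile sig_atomic_t deliveries = 0;

static void
on_signal(int signo)
{
    (void) signo;
    deliveries++;
}

/*
 * Send SIGUSR1 to this process `sends` times while the signal is blocked,
 * then unblock it and return how many times the handler ran.
 * Returns -1 if the handler or the signal mask could not be set up.
 */
int
count_deliveries(int sends)
{
    struct sigaction sa;
    struct sigaction old_sa;
    sigset_t    block;
    int         i;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* Hold the signal pending, like a receiver that is busy elsewhere. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, NULL) != 0)
        return -1;

    deliveries = 0;
    for (i = 0; i < sends; i++)
        raise(SIGUSR1);         /* each send only marks SIGUSR1 as pending */

    /* The pending signal is delivered before sigprocmask() returns. */
    if (sigprocmask(SIG_UNBLOCK, &block, NULL) != 0)
        return -1;

    /* Put back whatever handler was installed before the demo. */
    sigaction(SIGUSR1, &old_sa, NULL);

    return (int) deliveries;
}
```

Calling count_deliveries(3) from a small main() should report a single handler run on such systems, however many sends were made. That is why a sender that needs one action per request has to wait for some acknowledgement before sending the next signal.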