Re: POC: Parallel processing of indexes in autovacuum

Masahiko Sawada Thu, 02 Apr 2026 16:31:38 -0700

On Thu, Apr 2, 2026 at 4:02 AM Alexander Korotkov <[email protected]> wrote:
>
> Hi!
>
> On Wed, Apr 1, 2026 at 9:55 PM Masahiko Sawada <[email protected]> wrote:
> >
> > On Mon, Mar 30, 2026 at 5:14 PM SATYANARAYANA NARLAPURAM
> > <[email protected]> wrote:
> > >
> > > Hi
> > >
> > > On Mon, Mar 30, 2026 at 1:44 AM Daniil Davydov <[email protected]> 
> > > wrote:
> > >>
> > >> Hi,
> > >>
> > >> On Mon, Mar 30, 2026 at 7:17 AM SATYANARAYANA NARLAPURAM
> > >> <[email protected]> wrote:
> > >> >
> > >> > Thank you for working on this, very useful feature. Sharing a few 
> > >> > thoughts:
> > >> >
> > >> > 1. Shouldn't we also cap by max_parallel_workers to avoid wasting DSM 
> > >> > resources in parallel_vacuum_compute_workers?
> > >>
> > >> Actually, autovacuum_max_parallel_workers is already limited by
> > >> max_parallel_workers. It is not clear to me why we allow setting this
> > >> GUC higher than max_parallel_workers, but if that happens, I think it
> > >> is a misconfiguration on the user's part.
> > >>
> > >> > 2. Is it intentional that other autovacuum workers do not yield cost
> > >> > limits to the parallel autovacuum workers? Cost limits are first
> > >> > distributed equally among the autovacuum workers, and then the
> > >> > parallel workers share that. Therefore, parallel workers will be
> > >> > heavily throttled. IIUC, this problem doesn't exist with manual
> > >> > vacuum. If we don't fix this, we should at least document it.
> > >>
> > >> Parallel a/v workers inherit cost-based parameters (including
> > >> vacuum_cost_limit) from the leader worker. Do you mean that this value
> > >> can be too low for a parallel operation? If so, the user can manually
> > >> increase the vacuum_cost_limit reloption for the tables where parallel
> > >> a/v sleeps too much (due to cost delay).
> > >>
> > >> BTW, describing the cost limit propagation to the parallel a/v workers is
> > >> worth mentioning in the documentation. I'll add it in the next patch 
> > >> version.
> > >>
> > >> > 3. Additionally, is there a point where, based on the cost limits, 
> > >> > launching additional workers becomes counterproductive compared to 
> > >> > running fewer workers and preventing it?
> > >>
> > >> I don't think we can find a universal limit that is appropriate for
> > >> all possible configurations. For now we use a pretty simple formula to
> > >> calculate the parallel degree. Since users have several ways to affect
> > >> this formula, I guess there will be no problems with it (except for my
> > >> concerns about the opt-out style).
> > >>
> > >> > 4. Would it make sense to add a table level override to disable 
> > >> > parallelism or set parallel worker count?
> > >>
> > >> We already have the "autovacuum_parallel_workers" reloption, which is
> > >> used as an additional limit on the number of parallel workers. In
> > >> particular, this reloption can be used to disable parallelism entirely.
> > >>
> > >> >
> > >> > I ran some perf tests to show the improvements with parallel vacuum 
> > >> > and shared below.
> > >>
> > >> Thank you very much!
> > >>
> > >> > Observations:
> > >> >
> > >> > 1. Parallel autovacuum provides consistent speedup. With 
> > >> > cost_limit=200 and
> > >> >    7 workers, vacuum completes 1.41x faster (71s -> 50s). With 
> > >> > cost_limit=60,
> > >> >    the speedup is 1.25x (194s -> 154s).
> > >> > 2. I see the benefit comes from parallelizing index vacuum. With 8 
> > >> > indexes totaling
> > >> >    ~530 MB, parallel workers scan indexes concurrently instead of the 
> > >> > leader
> > >> >    scanning them one by one. The leader's CPU user time drops from ~3s 
> > >> > to
> > >> >    ~0.8s as index work is offloaded
> > >> >
> > >>
> > >> A 1.41x speedup with 7 parallel workers may not seem like a great win,
> > >> but it is measured over the whole autovacuum operation (not only index
> > >> bulkdel/cleanup), and with pretty small indexes.
> > >>
> > >> May I ask you to run the same test with a larger table (several dozen
> > >> gigabytes)? I think the results will be more "expressive".
> > >
> > >
> > > I ran it with a billion rows in a table with 8 indexes. The improvement
> > > with 7 workers is 1.8x.
> > > Please note that there is a fixed overhead in other vacuum steps, for
> > > example the heap scan.
> > > In environments where cost-based delay is used (the default), benefits
> > > will be modest unless vacuum_cost_delay is set to a sufficiently large value.
> > >
> > > Hardware:
> > >   CPU:     Intel Xeon Platinum 8573C, 1 socket × 8 cores × 2 threads = 16 
> > > vCPUs
> > >   RAM:     128 GB (131,900 MB)
> > >   Swap:    None
> > >
> > > Workload Description
> > >
> > > Table Schema:
> > >   CREATE TABLE avtest (
> > >       id       bigint PRIMARY KEY,
> > >       col1     int,           -- random()*1e9
> > >       col2     int,           -- random()*1e9
> > >       col3     int,           -- random()*1e9
> > >       col4     int,           -- random()*1e9
> > >       col5     int,           -- random()*1e9
> > >       col6     text,          -- 'text_' || random()*1e6  (short text ~10 chars)
> > >       col7     timestamp,     -- now() - random()*365 days
> > >       padding  text           -- repeat('x', 50)
> > >   ) WITH (fillfactor = 90);
> > >
> > > Indexes (8 total):
> > >   avtest_pkey   — btree on (id)        bigint
> > >   idx_av_col1   — btree on (col1)      int
> > >   idx_av_col2   — btree on (col2)      int
> > >   idx_av_col3   — btree on (col3)      int
> > >   idx_av_col4   — btree on (col4)      int
> > >   idx_av_col5   — btree on (col5)      int
> > >   idx_av_col6   — btree on (col6)      text
> > >   idx_av_col7   — btree on (col7)      timestamp
> > >
> > > Dead Tuple Generation:
> > >   DELETE FROM avtest WHERE id % 5 IN (1, 2);
> > >   This deletes exactly 40% of rows, uniformly distributed across all 
> > > pages.
> > >
> > > Vacuum Trigger:
> > >   Autovacuum is triggered naturally by lowering the threshold to 0 and 
> > > setting
> > >   scale_factor to a value that causes immediate launch after the DELETE.
> > >
> > > Worker Configurations Tested:
> > >   0 workers  — leader-only vacuum (baseline, no parallelism)
> > >   2 workers  — leader + 2 parallel workers (3 processes total)
> > >   4 workers  — leader + 4 parallel workers (5 processes total)
> > >   7 workers  — leader + 7 parallel workers (8 processes total, 1 per 
> > > index)
> > >
> > > Dataset:
> > >   Rows:         1,000,000,000
> > >   Heap size:    139 GB
> > >   Total size:   279 GB (heap + 8 indexes)
> > >   Dead tuples:  400,000,000 (40%)
> > >
> > > Index Sizes:
> > >   avtest_pkey    21 GB   (bigint)
> > >   idx_av_col7    21 GB   (timestamp)
> > >   idx_av_col1    18 GB   (int)
> > >   idx_av_col2    18 GB   (int)
> > >   idx_av_col3    18 GB   (int)
> > >   idx_av_col4    18 GB   (int)
> > >   idx_av_col5    18 GB   (int)
> > >   idx_av_col6     7 GB   (text — shorter keys, smaller index)
> > >   Total indexes: 139 GB
> > >
> > > Server Settings:
> > >   shared_buffers                = 96GB
> > >   maintenance_work_mem          = 1GB
> > >   max_wal_size                  = 100GB
> > >   checkpoint_timeout            = 1h
> > >   autovacuum_vacuum_cost_delay  = 0ms (NO throttling)
> > >   autovacuum_vacuum_cost_limit  = 1000
> > >
> > >
> > > Summary:
> > >
> > > Workers  Avg(s)    Min(s)    Max(s)    Speedup   Time Saved
> > > -------  ------    ------    ------    -------   ----------
> > > 0        1645.93   1645.01   1646.84    1.00x          —
> > > 2        1276.35   1275.64   1277.05    1.29x     369.58s (6.2 min)
> > > 4        1052.62   1048.92   1056.32    1.56x     593.31s (9.9 min)
> > > 7         892.23    886.59    897.86    1.84x     753.70s (12.6 min)
> > >
> >
> > Thank you for sharing the performance test results!
> >
> > While the benchmark results look good to me, have you compared the
> > performance differences between parallel vacuum in the VACUUM command
> > (with the PARALLEL option) and parallel vacuum in autovacuum? Since
> > parallel autovacuum introduces some logic to check for delay parameter
> > updates, I thought it was worth verifying if this adds any overhead.
> >
> > BTW, in my view, the most challenging part of this patch is the
> > propagation logic for vacuum delay parameters. This propagation is
> > necessary because, unlike manual VACUUM, autovacuum workers can reload
> > their configuration during operation. We must ensure that parallel
> > workers stay synchronized with these updated parameters.
> >
> > The current patch implements this in vacuumparallel.c: the leader
> > shares delay parameters in DSM and updates them (if any vacuum delay
> > parameters are updated) after a config reload, while workers poll for
> > updates at every vacuum_delay_point() call to refresh their local
> > variables.
> >
> > Another possible approach would be an event-driven model where the
> > leader notifies workers after updating shared parameters—for example,
> > by adding a shm_mq between the leader (as the sender) and each worker
> > (as the receiver).
> >
> > I've compared these two ideas and opted for the former (polling).
> > While a polling approach could theoretically be costly, the current
> > implementation is self-contained within the parallel vacuum logic and
> > does not touch the core parallel query infrastructure. The
> > notification approach might look more elegant, but I'm concerned it
> > adds unnecessary complexity just for the autovacuum case. Since the
> > polling is essentially just checking an atomic variable, the overhead
> > should be negligible.
> >
> > To verify this, I conducted benchmarks comparing the whole execution
> > time and index vacuuming duration.
> >
> > Setup:
> >
> > - Disabled (auto) vacuum delays and buffer usage limits.
> > - Parallel autovacuum with 1 worker on a table with 2 indexes (approx.
> > 4 GB each).
> > - 5 runs.
> >
> > Case 1: The latest patch (with polling)
> >
> > Average: 3.95s (Index: 1.54s)
> > Median: 3.62s (Index: 1.37s)
> >
> > Case 2: The latest patch without polling
> >
> > Average: 3.98s (Index: 1.56s)
> > Median: 3.70s (Index: 1.40s)
> >
> > Note that in order to simulate the code that doesn't have the polling,
> > I reverted the following change:
> >
> > -   if (InterruptPending ||
> > -       (!VacuumCostActive && !ConfigReloadPending))
> > +   if (InterruptPending)
> > +       return;
> > +
> > +   if (IsParallelWorker())
> > +   {
> > +       /*
> > +        * Update cost-based vacuum delay parameters for a parallel autovacuum
> > +        * worker if any changes are detected.
> > +        */
> > +       parallel_vacuum_update_shared_delay_params();
> > +   }
> > +
> > +   if (!VacuumCostActive && !ConfigReloadPending)
> >
> > The parallel vacuum workers don't check the shared vacuum delay
> > parameter at all, which is still fine as I disabled vacuum delays.
> >
> > Overall, the results show no noticeable overhead from the polling approach.
>
> I would say this polling approach is very cheap.  When there are no
> updates, it only has to check a single 32-bit value from shared
> memory.  And that value doesn't get updated frequently; it's good for
> caching.  No wonder we see no measurable overhead.


Thank you for the comments!

>
> Regarding the event-driven approach, given that the parallel worker
> process is busy with other jobs (doing actual vacuuming), it would
> anyway have to poll for new events from time to time.  Thus, I don't
> think it's possible to organize polling for new events any cheaper
> than the current approach of polling for updates in shmem.

What do you think about the idea of using proc signals, as in the patch
I sent recently [1]? With that approach, workers only have to check a
local variable. It seems slightly cheaper and can reuse the existing
logic.
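
To make the comparison concrete, here is a rough standalone sketch of
the two styles of check. It is not code from either patch: the struct,
the function names, and the use of C11 atomics and SIGUSR1 are all
assumptions made for illustration, and protection of the payload
against concurrent updates is left out.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the delay parameters kept in shared memory. */
typedef struct SharedDelayParams
{
    _Atomic uint32_t generation;    /* bumped by the leader after each update */
    int              cost_limit;    /* payload; locking is elided in this sketch */
} SharedDelayParams;

/* Leader side: store the new value, then publish it by bumping the counter. */
static void
leader_publish(SharedDelayParams *shared, int new_cost_limit)
{
    assert(shared != NULL);
    shared->cost_limit = new_cost_limit;
    atomic_fetch_add_explicit(&shared->generation, 1, memory_order_release);
}

/* Polling model: the worker compares the shared counter with its last-seen copy. */
static uint32_t local_generation = 0;

static bool
poll_shared_params(SharedDelayParams *shared)
{
    uint32_t gen = atomic_load_explicit(&shared->generation,
                                        memory_order_acquire);

    if (gen == local_generation)
        return false;               /* common case: one shared-memory read */
    local_generation = gen;
    return true;                    /* caller would refresh its local parameters */
}

/* Signal model: a handler sets a process-local flag that the worker tests. */
static volatile sig_atomic_t params_update_pending = 0;

static void
handle_params_update(int signo)
{
    (void) signo;
    params_update_pending = 1;
}

static bool
check_local_flag(void)
{
    if (!params_update_pending)
        return false;               /* common case: one local read */
    params_update_pending = 0;
    return true;                    /* caller would refresh its local parameters */
}
```

In both variants the common no-update path is a single load and a
branch; the only difference in this sketch is whether the load targets
shared memory or a process-local variable.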

[1] https://www.postgresql.org/message-id/CAD21AoBm0cxQjtWuY0f7%2BaT4UiRV%2B%2BaFKkzjj6vmERTj_UFnxA%40mail.gmail.com

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
