On Wed, Feb 11, 2026 at 12:04 AM Daniil Davydov <[email protected]> wrote:
>
> On Thu, Jan 22, 2026 at 5:29 AM Masahiko Sawada <[email protected]> wrote:
> >
> > On Sat, Jan 17, 2026 at 6:52 AM Daniil Davydov <[email protected]> wrote:
> > >
> > > I will keep the 'max_worker_processes' limit, so autovacuum will not
> > > waste time initializing a parallel context if there is no chance that
> > > the request will succeed.
> > > But it's worth remembering that actually the
> > > 'autovacuum_max_parallel_workers' parameter will always be implicitly
> > > capped by 'max_parallel_workers'.
> >
> > It doesn't make sense to me that we limit
> > autovacuum_max_parallel_workers by max_worker_processes TBH. When
> > users want to have more parallel vacuum workers for autovacuum and the
> > VACUUM command, they would have to consider max_worker_processes,
> > max_parallel_workers, and max_parallel_maintenance_workers separately.
> > Given that max_parallel_workers is controlling the number of
> > max_worker_processes that can be used in parallel operations, I
> > believe that parallel vacuum workers for autovacuum should also be
> > taken from that pool.
>
> Maybe I don't quite understand the meaning of "limited by". For example,
> we have a max_parallel_workers_per_gather parameter, which is limited
> by max_parallel_workers. But actually we can set this parameter to a value
> that is higher than max_parallel_workers. The limitation is that for Gather
> node we cannot request more workers than are available in bgworkers pool
> (where number of free workers is always <= max_parallel_workers). Thus,
> limitation actually exists only for the bgworkers pool, on which other
> parallel operations depend. In particular, whatever value we set for the
> autovacuum_max_parallel_workers parameter, it will always depend only
> on the bgworkers pool.
Right, parallel workers are actually taken from the bgworkers pool.
>
> I'll give in to your opinion and add a limitation by max_parallel_workers.
> But I still don't understand where the point is in explicit limitation by
> max_parallel_workers, if we already have this limitation implicitly?
> It seems a bit redundant for me. I hope I've conveyed my point correctly.
max_worker_processes controls the number of available bgworkers in the
database cluster, and bgworkers are used for parallel queries, logical
replication, and other extensions as well. Also, it requires a
server restart to change. max_parallel_workers controls "how many
bgworkers can be used for parallel queries in total?" and is a
PGC_USERSET parameter. I think it's easier for users to tune parallel
query related parameters since all bgworkers for parallel queries
(i.e., parallel workers) are taken from the max_parallel_workers pool. For
example, if users want to disable all parallel queries, they can do
that by setting max_parallel_workers to 0. If parallel vacuum workers
for autovacuum were taken from the max_worker_processes pool (i.e.,
without the max_parallel_workers limit), users would need to set both
max_parallel_workers and autovacuum_max_parallel_workers to 0.
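The pool relationship described above can be sketched as a postgresql.conf fragment; the values are illustrative only, and autovacuum_max_parallel_workers is the parameter proposed by this patch set:

```
# Cluster-wide bgworker pool, shared by parallel queries, logical
# replication, and extensions; changing it requires a server restart.
max_worker_processes = 16

# How many bgworkers may be used for parallel operations in total;
# PGC_USERSET.  Setting it to 0 disables all parallel queries.
max_parallel_workers = 8

# Per-operation caps, each effectively limited by max_parallel_workers.
max_parallel_workers_per_gather = 2
max_parallel_maintenance_workers = 2
autovacuum_max_parallel_workers = 4
```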
> --
>
> > + /*
> > + * If 'true' then we are running parallel autovacuum. Otherwise, we are
> > + * running parallel maintenance VACUUM.
> > + */
> > + bool am_parallel_autovacuum;
> >
> > How about renaming it to use_shared_delay_params? I think it conveys
> > better what the field is used for.
>
> I think that we should leave this name, because in the future some other
> behavior differences may occur between manual VACUUM and autovacuum.
> If so, we will already have an "am_autovacuum" field which we can use in
> the code.
> The existing logic with the "am_autovacuum" name also looks good to me - we
> should use shared delay params only because we are running parallel autovacuum.
It may occur but we can change the field name when it really comes.
I'm slightly concerned that we've been using am_xxx variables in a
different way. For instance, am_walsender is a global variable that is
set to true only in wal sender processes. Also, we have a bunch of
AmXXProcess() macros that check the global variable MyBackendType to
determine the kind of the current process. That is, the subject of 'am'
is typically the process, I guess. On the other hand,
am_parallel_autovacuum is stored in DSM space and indicates whether a
parallel vacuum was invoked by manual VACUUM or autovacuum.
>
> > Truncating all logs after every test would decrease the debuggability
> > much. We can pass the offset as the start point to wait for the
> > contents.
> >
>
> I've combined two of your above comments purposely. I agree that truncating
> all logs is a bad decision and we need to solve this in a different way.
> But the problem will occur if we want to 1) use logging instead of a
> test-only function and 2) use offsets as the start point to wait for the
> contents in the logfile.
>
> Imagine that we (using the described approach) need to wait until the end of
> parallel index processing and determine the current number of free parallel
> workers.
>
> IIUC, it'll look like this:
> wait_for_av_log("autovacuum processing finished");
> wait_for_av_log("number of free workers = N");
>
> But when we call wait_for_av_log first time, we will advance "offset" to the
> end of logfile and thus we will miss the "number of free workers = N". The
> only way to avoid it is to write a function that will determine the exact
> position of "autovacuum processing finished" in the logfile. Isn't it too
> much?
>
> I think that using wait_for_av_log("autovacuum processing finished"); +
> SELECT get_parallel_autovacuum_free_workers(); will be much more
> demonstrative and simple.
>
> Moreover, the AutoVacuumGetFreeParallelWorkers function doesn't
> seem completely useless in isolation from tests. I suggest leaving
> this function and its usage in the tests. I can remove the "For testing
> purpose only!" comment, so everyone will be free to use this function
> in the future.
Agreed. The updated test scenario looks reasonable to me.
>
> 1)
> Test 5 can be stabilized as follows:
> We can attach to the "autovacuum-start-parallel-vacuum" injection point in
> the "wait" mode. Thereby when we terminate the first a/v leader, we are
> guaranteed that no other a/v leader will reach release/reserve functions.
> And then we are free to call the get_parallel_autovacuum_free_workers
> function. I'll additionally describe this logic in the test.
>
> 2)
> In test 4 I found another problem: when the a/v leader errors out, it will
> exit() pretty soon. And during exit() it will call the before_shmem_exit hook.
> Thus, we cannot be sure that parallel workers have been released exactly
> in the try/catch block. In order to guarantee it, I think that we should log
> something inside the try/catch block. I added pretty controversial logging
> code for it, but it is the best I came up with.
>
> In test 4 the above idea will look something like this:
> $log_start = $node->wait_for_log(
> qr/error triggered for injection point / .
> qr/autovacuum-leader-before-indexes-processing/,
> $log_start
> );
> $log_start = $node->wait_for_log(
> qr/2 parallel autovacuum workers have been released after an error occurred/,
> $log_start
> );
>
> Above I described a problem that may occur when we advance
> "logfile offset" too far after the first wait_for_log call. Here, this problem
> doesn't occur because the autovacuum launcher retries vacuuming the
> table indefinitely, so more "N workers released" messages will appear.
If we write the log "%d parallel autovacuum workers have been
released" in AutoVacuumReleaseParallelWorkers(), can we simplify both
tests (4 and 5) further?
I've reviewed all patches. The 0001 patch looks good to me.
0002 patch:
+    /* Worker usage stats for parallel autovacuum. */
+    appendStringInfo(&buf,
+                     _("parallel index vacuum: %d workers were planned, %d workers were reserved and %d workers were launched in total\n"),
+                     vacrel->workers_usage.vacuum.nplanned,
+                     vacrel->workers_usage.vacuum.nreserved,
+                     vacrel->workers_usage.vacuum.nlaunched);
These log messages need to take care of plural forms but it seems to
be too long if we use errmsg_plural() for each number. So how about
something like:
parallel workers: index: %d planned, %d reserved, %d launched in total
parallel workers: cleanup: %d planned, %d reserved, %d launched
(Index cleanup is executed at most once so we don't need "in total" in
the message.)
0003 patch:
+typedef struct CostParamsData
+{
+ double cost_delay;
+ int cost_limit;
+ int cost_page_dirty;
+ int cost_page_hit;
+ int cost_page_miss;
+} CostParamsData;
The name CostParamsData sounds too generic and I guess it could
conflict with optimizer-related struct names in the future. How about
renaming it to VacuumDelayParams?
---
+ SpinLockAcquire(&pv_shared_cost_params->mutex);
+
+ shared_params_data = pv_shared_cost_params->params_data;
+
+ VacuumCostDelay = shared_params_data.cost_delay;
+ VacuumCostLimit = shared_params_data.cost_limit;
+ VacuumCostPageDirty = shared_params_data.cost_page_dirty;
+ VacuumCostPageHit = shared_params_data.cost_page_hit;
+ VacuumCostPageMiss = shared_params_data.cost_page_miss;
+
+ SpinLockRelease(&pv_shared_cost_params->mutex);
If we copy the shared values in pv_shared_cost_params, we should
release the spinlock earlier, i.e., before updating the VacuumCostXXX
variables. But I don't think we would even need to copy these values
into a local struct in this case, as updating five scalar variables
under the spinlock is fairly cheap.
---
+ FillCostParamsData(&local_params_data);
+ SpinLockAcquire(&pv_shared_cost_params->mutex);
+
+ if (CostParamsDataEqual(pv_shared_cost_params->params_data,
+ local_params_data))
+ {
IIUC it stores the cost-based vacuum delay parameters into
local_params_data only to use the CostParamsDataEqual() macro. I think
it's better to directly compare the values in pv_shared_cost_params
with the cost-based vacuum delay parameters.
0004 patch:
+    if (nworkers > 0)
+        INJECTION_POINT("autovacuum-leader-before-indexes-processing", NULL);
I think it's better to use #ifdef USE_INJECTION_POINTS here.
---
+#ifdef USE_INJECTION_POINTS
+/*
+ * Log values of the related to cost-based delay parameters. It is used for
s/values of the related to/values related to/
---
+ * testing purpose.
+ */
+static void
+parallel_vacuum_report_cost_based_params(void)
+{
+ StringInfoData buf;
+
+ /* Simulate config reload during normal processing */
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+ vacuum_delay_point(false);
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
Calling vacuum_delay_point() here feels a bit arbitrary to me. Since
parallel vacuum workers call
parallel_vacuum_report_cost_based_params() after
parallel_vacuum_process_safe_indexes(), I don't think we necessarily
need to call vacuum_delay_point() here.
---
+    appendStringInfo(&buf, "Vacuum cost-based delay parameters of parallel worker:\n");
+    appendStringInfo(&buf, "vacuum_cost_limit = %d\n", vacuum_cost_limit);
+    appendStringInfo(&buf, "vacuum_cost_delay = %g\n", vacuum_cost_delay);
+    appendStringInfo(&buf, "vacuum_cost_page_miss = %d\n", VacuumCostPageMiss);
+    appendStringInfo(&buf, "vacuum_cost_page_dirty = %d\n", VacuumCostPageDirty);
+    appendStringInfo(&buf, "vacuum_cost_page_hit = %d\n", VacuumCostPageHit);
I'd write these messages directly in elog() instead of using
StringInfoData, which is simpler and can save palloc()/pfree().
---
+ ereport(DEBUG2, errmsg("%s", buf.data));
Let's use elog() instead of ereport().
---
+# Create role with pg_signal_autovacuum_worker for terminating autovacuum worker.
+$node->safe_psql('postgres', qq{
+ CREATE ROLE regress_worker_role;
+ GRANT pg_signal_autovacuum_worker TO regress_worker_role;
+ SET ROLE regress_worker_role;
+});
+
+$node->safe_psql('postgres', qq{
+ SELECT pg_terminate_backend('$av_pid');
+});
These two safe_psql calls use separate connections, meaning that
pg_terminate_backend() is executed by the superuser rather than by
regress_worker_role. I think we don't need to create
regress_worker_role and can use the superuser in this test case.
---
We will likely add more autovacuum related tests to the test_autovacuum
directory in the future. Given that 001_basic.pl focuses on
parallel vacuum tests, how about renaming it to 001_parallel_vacuum.pl
or something?
> This time I'll try something experimental - besides the patches I'll also
> post differences between corresponding patches from v20 and v21.
> I.e. you can apply v20--v21-diff-for-0001 on the v20-0001 patch and
> get the v21-0001 patch. There are a lot of changes, so I guess it will
> help you during review. Please let me know whether it is useful for you.
It was helpful to easily see the changes from the previous version. Thank you!
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com