On Sun, Mar 1, 2026 at 6:46 AM Daniil Davydov <[email protected]> wrote:
>
> Hi,
>
> On Sat, Feb 28, 2026 at 8:57 AM Masahiko Sawada <[email protected]> wrote:
> >
> > IIUC earlier patches defined autovacuum_max_parallel_workers with the
> > limit by max_worker_processes. Suppose we set:
> >
> > - max_worker_processes = 8
> > - autovacuum_max_parallel_workers = 4
> > - max_parallel_workers = 4
> >
> > If we want to disable all parallel operations, we would need to set
> > max_parallel_workers to 0 as well as either
> > autovacuum_max_parallel_workers to 0, no? This is because if we set
> > only max_parallel_workers to 0, autovacuum workers still can take
> > parallel vacuum workers from the max_worker_processes pool. I might be
> > missing something though.
> >
>
> Even if av_max_parallel_workers is limited by max_worker_processes,
> it is enough to set max_parallel_workers to 0 to disable parallel
> autovacuum.
>
> When a/v leader wants to create supportive workers, it calls
> "RegisterDynamicBackgroundWorker" function, which contains the
> following logic:
> /*
> * If this is a parallel worker, check whether there are already too many
> * parallel workers; if so, don't register another one.
> */
> if (parallel && (BackgroundWorkerData->parallel_register_count -
> BackgroundWorkerData->parallel_terminate_count) >=
> max_parallel_workers)
> {
> ....
> }
>
> Thus, a/v leader cannot launch any workers if max_parallel_workers is set to
> 0.
Right. But this fact would actually support the idea that limiting
autovacuum_max_parallel_workers by max_parallel_workers is more
appropriate, no?
>
> > > > If we write the log "%d parallel autovacuum workers have been
> > > > released" in AutoVacuumReleaseParallelWorkers(), can we simplify both
> > > > tests (4 and 5) further?
> > > >
> > >
> > > It won't help the 4th test, because ReleaseParallelWorkers is called
> > > due to both ERROR and shmem_exit, but we want to be sure that
> > > workers are released in the try/catch block (i.e. before the shmem_exit).
> >
> > We already call AutoVacuumReleaseAllParallelWorkers() in the PG_CATCH()
> > block in do_autovacuum(). If we write the log in
> > AutoVacuumReleaseParallelWorkers(), the tap test is able to check the
> > log, no?
> >
>
> Not quite. Assume that we add "%d workers have been released" log to the
> ReleaseAllParallelWorkers. Then we trigger an error for a/v leader and wait
> for this log (we are expecting that workers will be released inside the
> try/catch block).
>
> Even if there is a bug in the code and the a/v leader cannot release
> parallel workers due to an error, one day it will finish vacuuming and call
> "proc_exit". During "proc_exit" the "before_shmem_exit_hook" along with
> the "ReleaseAllParallelWorkers" will be called.
What bugs are you concerned about in this case? I'm not sure what you
meant by "a/v leader cannot release parallel workers due to an
error". It sounds like you are describing a case where there is a bug
in AutoVacuumReleaseParallelWorkers(), but if such a bug exists and the
leader fails to release parallel workers, we would end up not writing
these elogs in either case.
>
> > > Also, I don't know whether the 5th test needs this log at all, because in
> > > the end we are checking the number of free parallel workers. If a killed
> > > a/v leader doesn't release parallel workers, we'll notice it.
> >
> > If we can check the log written at process shutdown time, I think we
> > can somewhat simplify the test 5 logic by not attaching
> > 'autovacuum-start-parallel-vacuum' injection point.
> >
> > 1. attach 'autovacuum-leader-before-indexes-processing' injection point.
> > 2. wait for an av worker to stop at the injection point.
> > 3. terminate the av worker.
> > 4. verify from the log if the workers have been released.
> > 5. disable parallel autovacuum.
> > 6. check the free workers (should be 10).
> >
> > Step 5 and 6 seems to be optional though.
>
> OK, I see your point. But I'm afraid that the "%d released" log can't help
> us here for the reason I described above :
> "%d released" can be emitted from several places, and we cannot be sure
> which one has produced this log.
>
> I propose doing the same as we did for the try/catch block - add logging inside
> the "autovacuum_worker_before_shmem_exit" with some unique message.
> Thus, we will be sure that the workers are released precisely in the
> "before_shmem_exit_hook".
>
> The alternative is to pass some additional information to the
> "ReleaseAllParallelWorkers" function (to supplement the log it emits), but it
> doesn't seem like a good solution to me.
I'm not sure if it's important to check how
AutoVacuumReleaseAllParallelWorkers() has been called (either in
PG_CATCH() block or by autovacuum_worker_before_shmem_exit()). We
would end up having to add a unique message to each caller of
AutoVacuumReleaseAllParallelWorkers() in the future. I guess it's more
important to make sure that all workers have been released in the end.
In that sense, it would make more sense to check that all workers have
actually been released (i.e., checking by
get_parallel_autovacuum_free_workers()) after a parallel vacuum
instead of checking workers being released by debug logs. That is, we
can check at each test end if get_parallel_autovacuum_free_workers()
returns the expected number after disabling parallel autovacuum.
>
> > + if (AmAutoVacuumWorkerProcess())
> > + {
> > + /* Worker usage stats for parallel autovacuum. */
> > + appendStringInfo(&buf,
> > + _("parallel workers: index vacuum: %d planned, %d reserved, %d launched in total\n"),
> > + vacrel->workers_usage.vacuum.nplanned,
> > + vacrel->workers_usage.vacuum.nreserved,
> > + vacrel->workers_usage.vacuum.nlaunched);
> > + }
> > + else
> > + {
> > + /* Worker usage stats for manual VACUUM (PARALLEL). */
> > + appendStringInfo(&buf,
> > + _("parallel workers: index vacuum: %d planned, %d launched in total\n"),
> > + vacrel->workers_usage.vacuum.nplanned,
> > + vacrel->workers_usage.vacuum.nlaunched);
> > + }
> > + }
> >
> > These comments are very obvious so I don't think we need them.
>
> I agree.
>
> > Instead, I think it would be good to explain why we don't need to
> > report "reserved" numbers in the manual vacuum cases.
> >
>
> I think that we can clarify somewhere why the "reserved" statistic
> is collected only for autovacuum. PVWorkersStats is an appropriate
> place for it. Thus, there will be no need to explain it while
> constructing the log.
On second thought about the "planned" and "reserved" numbers, can we consider
what the patch implemented as "reserved" as the "planned" in
autovacuum cases? That is, in autovacuum cases, the "planned" number
considers the number of parallel degrees based on the number of
indexes (or autovacuum_parallel_workers value) as well as the number
of workers that have actually been reserved. In cases of
autovacuum_max_parallel_workers shortage, users would notice by seeing
logs that enough workers are not planned in the first place against
the number of indexes on the table. That might be less confusing for
users rather than introducing a new "reserved" concept in the vacuum
logs. Also, it slightly helps simplify the code.
>
> **Comments on the 0003 patch**
>
> >
> > +#define VacCostParamsEquals(params) \
> > + (vacuum_cost_delay == (params).cost_delay && \
> > + vacuum_cost_limit == (params).cost_limit && \
> > + VacuumCostPageDirty == (params).cost_page_dirty && \
> > + VacuumCostPageHit == (params).cost_page_hit && \
> > + VacuumCostPageMiss == (params).cost_page_miss)
> >
> > I'm not sure this macro helps reduce lines of code or improve
> > readability as it's used only once and it's slightly unnatural to me
> > that *Equals macro takes only one argument.
> >
>
> I agree, it looks a bit odd. I'll remove it.
> Moreover, this shmem state can be updated only by the a/v leader worker,
> so I'll allow it to read shared variables without holding a spinlock.
> It seems pretty reliable, what do you think?
Right. It's safe for the leader to read these fields without locks.
> > ---
> > + ereport(DEBUG2,
> > + (errmsg("number of free parallel autovacuum workers is set to %u due to config reload",
> > + AutoVacuumShmem->av_freeParallelWorkers),
> > + errhidecontext(true)));
> >
> > Why do we need to add errhidecontext(true) here?
> >
>
> I thought we don't need to write redundant info to the logfile. But I
> don't see that other DEBUG2 messages are hiding context, so
> I'll remove it.
>
> BTW, do we want to use "elog" here too?
+1
Here are some comments:
* 0001 patch:

@@ -299,6 +308,8 @@ typedef struct
  * of the worker list (see above).
  WorkerInfo av_startingWorker;
  AutoVacuumWorkItem av_workItems[NUM_WORKITEMS];
  pg_atomic_uint32 av_nworkersForBalance;
+ uint32 av_freeParallelWorkers;
+ uint32 av_maxParallelWorkers;
 } AutoVacuumShmemStruct;
We should use int32 instead of uint32.
* 0003 patch:
I've attached the proposed changes to the 0003 patch, which includes:
- removal of VacuumCostParams as it's not necessary.
- comment updates.
- other cosmetic updates.
* 0004 patch:
+#ifdef USE_INJECTION_POINTS
+ /*
+ * If we are a parallel autovacuum worker, we can consume delay parameters
+ * during index processing (via vacuum_delay_point call). This logging
+ * allows tests to ensure this.
+ */
+ if (shared->is_autovacuum)
+ elog(DEBUG2,
+ "parallel autovacuum worker cost params: cost_limit=%d, cost_delay=%g, cost_page_miss=%d, cost_page_dirty=%d, cost_page_hit=%d",
+ vacuum_cost_limit,
+ vacuum_cost_delay,
+ VacuumCostPageMiss,
+ VacuumCostPageDirty,
+ VacuumCostPageHit);
+#endif
While it's true that we use these logs only during the regression
tests that are enabled only when injection points are also enabled,
these logs themselves are not related to the injection points. I'd
recommend writing these logs when the worker refreshes its local delay
parameters (i.e., in parallel_vacuum_update_shared_delay_params()).
---
+$node->append_conf('postgresql.conf', qq{
+ max_worker_processes = 20
+ max_parallel_workers = 20
+ max_parallel_maintenance_workers = 20
+ autovacuum_max_parallel_workers = 20
+ log_min_messages = debug2
+ log_autovacuum_min_duration = 0
+ autovacuum_naptime = '1s'
+ min_parallel_index_scan_size = 0
+ shared_preload_libraries=test_autovacuum
+});
It would be better to set log_autovacuum_min_duration = 0 on the
specific table instead of setting it globally.
---
+ uint32 nfree_workers;
+
+#ifndef USE_INJECTION_POINTS
+ ereport(ERROR, errmsg("injection points not supported"));
+#endif
+
+ nfree_workers = AutoVacuumGetFreeParallelWorkers();
+
+ PG_RETURN_UINT32(nfree_workers);
+}
As I commented above, I think we should use int32 for the number of
free parallel workers. So let's change it here too.
---
+PG_FUNCTION_INFO_V1(get_parallel_autovacuum_free_workers);
+Datum
+get_parallel_autovacuum_free_workers(PG_FUNCTION_ARGS)
+{
+ uint32 nfree_workers;
+
+#ifndef USE_INJECTION_POINTS
+ ereport(ERROR, errmsg("injection points not supported"));
+#endif
+
I think we don't necessarily need to check the USE_INJECTION_POINTS in
this function as we already have the check in the tap tests. The
function itself is actually workable even without injection points.
---
+# Copyright (c) 2024-2025, PostgreSQL Global Development Group
+
Please update the copyright year here too.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
commit b269c5eb3ca039f3a9b7b59878b1a575a97ba607
Author: Masahiko Sawada <[email protected]>
Date: Mon Mar 2 14:05:02 2026 -0800
Proposed changes for 0003 patch.
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 70882544d05..644739483c8 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2436,10 +2436,8 @@ vacuum_delay_point(bool is_analyze)
if (IsParallelWorker())
{
/*
- * Possibly update cost-based delay parameters.
- *
- * Do it before checking VacuumCostActive, because its value might be
- * changed after calling this function.
+ * Update cost-based delay parameters for a parallel autovacuum worker
+ * if any changes are detected.
*/
parallel_vacuum_update_shared_delay_params();
}
@@ -2460,8 +2458,8 @@ vacuum_delay_point(bool is_analyze)
VacuumUpdateCosts();
/*
- * If we are parallel autovacuum leader and some of cost-based
- * parameters had changed, let other parallel workers know.
+ * Propagate cost-based parameters to shared memory if any of them
+ * have changed during the config reload.
*/
parallel_vacuum_propagate_shared_delay_params();
}
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 80b57bf9da3..c411ded2e7f 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -18,6 +18,13 @@
* the parallel context is re-initialized so that the same DSM can be used for
* multiple passes of index bulk-deletion and index cleanup.
*
+ * For parallel autovacuum, we need to propagate cost-based delay parameters
+ * from the leader to its workers, as the leader's parameters can change
+ * even while processing a table (e.g., due to a config reload).
+ * The PVSharedCostParams struct manages these parameters using a
+ * generation counter. Each parallel worker polls this shared state and
+ * refreshes its local delay parameters whenever a change is detected.
+ *
* Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
@@ -54,26 +61,6 @@
#define PARALLEL_VACUUM_KEY_WAL_USAGE 4
#define PARALLEL_VACUUM_KEY_INDEX_STATS 5
-/*
- * Helper for the PVSharedCostParams structure (see below), to avoid
- * repetition.
- */
-typedef struct VacuumCostParams
-{
- double cost_delay;
- int cost_limit;
- int cost_page_dirty;
- int cost_page_hit;
- int cost_page_miss;
-} VacuumCostParams;
-
-#define FillVacCostParams(cost_params) \
- (cost_params)->cost_delay = vacuum_cost_delay; \
- (cost_params)->cost_limit = vacuum_cost_limit; \
- (cost_params)->cost_page_dirty = VacuumCostPageDirty; \
- (cost_params)->cost_page_hit = VacuumCostPageHit; \
- (cost_params)->cost_page_miss = VacuumCostPageMiss
-
/*
* Struct for cost-based vacuum delay related parameters to share among an
* autovacuum worker and its parallel vacuum workers.
@@ -81,23 +68,22 @@ typedef struct VacuumCostParams
typedef struct PVSharedCostParams
{
/*
- * Each time leader worker updates its parameters, it must increase
- * generation. Every parallel worker keeps the generation
- * (shared_params_local_generation) at which it had last time received
- * parameters from the leader.
- *
- * It is enough for worker to compare it's local_generation with the field
- * below to determine whether it needs to receive new parameters' values.
+ * The generation counter is incremented by the leader process each time
+ * it updates the shared cost-based parameters. Parallel vacuum workers
+ * compare this with their local generation,
+ * shared_params_generation_local, to detect whether they need to refresh
+ * their local parameter copies.
*/
pg_atomic_uint32 generation;
slock_t mutex; /* protects all fields below */
- /*
- * Copies of the corresponding cost-based vacuum delay parameters from
- * autovacuum leader process.
- */
- VacuumCostParams params_data;
+ /* Parameters to share with parallel workers */
+ double cost_delay;
+ int cost_limit;
+ int cost_page_dirty;
+ int cost_page_hit;
+ int cost_page_miss;
} PVSharedCostParams;
/*
@@ -285,7 +271,7 @@ struct ParallelVacuumState
static PVSharedCostParams *pv_shared_cost_params = NULL;
-/* See comments for the PVSharedCostParams structure for the explanation. */
+/* See comments in PVSharedCostParams for details */
static uint32 shared_params_generation_local = 0;
static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
@@ -299,6 +285,7 @@ static void parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation
static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
bool vacuum);
static void parallel_vacuum_error_callback(void *arg);
+static inline void parallel_vacuum_set_cost_parameters(PVSharedCostParams *params);
/*
* Try to enter parallel mode and create a parallel context. Then initialize
@@ -461,9 +448,10 @@ parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
shared->is_autovacuum = AmAutoVacuumWorkerProcess();
+ /* Initialize shared cost-based parameters if it's for autovacuum */
if (shared->is_autovacuum)
{
- FillVacCostParams(&shared->cost_params.params_data);
+ parallel_vacuum_set_cost_parameters(&shared->cost_params);
pg_atomic_init_u32(&shared->cost_params.generation, 0);
SpinLockInit(&shared->cost_params.mutex);
@@ -615,10 +603,21 @@ parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tup
}
/*
- * If we are parallel *autovacuum* worker, check whether related to cost-based
- * vacuum delay parameters had changed in the leader worker. If so,
- * corresponding parameters will be updated to the values which leader worker
- * is operating on.
+ * Set cost-based delay parameter values to the given 'params'.
+ */
+static inline void
+parallel_vacuum_set_cost_parameters(PVSharedCostParams *params)
+{
+ params->cost_delay = vacuum_cost_delay;
+ params->cost_limit = vacuum_cost_limit;
+ params->cost_page_dirty = VacuumCostPageDirty;
+ params->cost_page_hit = VacuumCostPageHit;
+ params->cost_page_miss = VacuumCostPageMiss;
+}
+
+/*
+ * Update the cost-based vacuum delay parameters for parallel vacuum workers
+ * launched by an autovacuum worker.
*
* For non-autovacuum parallel worker this function will have no effect.
*/
@@ -629,7 +628,7 @@ parallel_vacuum_update_shared_delay_params(void)
Assert(IsParallelWorker());
- /* Check whether we are running parallel autovacuum */
+ /* Quick return if the worker is not running for autovacuum */
if (pv_shared_cost_params == NULL)
return;
@@ -641,13 +640,11 @@ parallel_vacuum_update_shared_delay_params(void)
return;
SpinLockAcquire(&pv_shared_cost_params->mutex);
-
- VacuumCostDelay = pv_shared_cost_params->params_data.cost_delay;
- VacuumCostLimit = pv_shared_cost_params->params_data.cost_limit;
- VacuumCostPageDirty = pv_shared_cost_params->params_data.cost_page_dirty;
- VacuumCostPageHit = pv_shared_cost_params->params_data.cost_page_hit;
- VacuumCostPageMiss = pv_shared_cost_params->params_data.cost_page_miss;
-
+ VacuumCostDelay = pv_shared_cost_params->cost_delay;
+ VacuumCostLimit = pv_shared_cost_params->cost_limit;
+ VacuumCostPageDirty = pv_shared_cost_params->cost_page_dirty;
+ VacuumCostPageHit = pv_shared_cost_params->cost_page_hit;
+ VacuumCostPageMiss = pv_shared_cost_params->cost_page_miss;
SpinLockRelease(&pv_shared_cost_params->mutex);
VacuumUpdateCosts();
@@ -656,46 +653,41 @@ parallel_vacuum_update_shared_delay_params(void)
}
/*
- * Function to be called from parallel autovacuum leader in order to propagate
- * some cost-based vacuum delay parameters to the supportive workers.
+ * Store the cost-based vacuum delay parameters in shared memory so that
+ * parallel vacuum workers can reflect them (see
+ * parallel_vacuum_update_shared_delay_params()).
*/
void
parallel_vacuum_propagate_shared_delay_params(void)
{
- VacuumCostParams *params_data;
-
Assert(AmAutoVacuumWorkerProcess());
- /* Check whether we are running parallel autovacuum */
+ /*
+ * Quick return if the leader process is not sharing the delay
+ * parameters.
+ */
if (pv_shared_cost_params == NULL)
return;
/*
- * Only leader worker can modify this shared structure, so we can read it
- * without acquiring a lock.
+ * Check if any delay parameters have changed. We can read them without
+ * locks as only the leader can modify them.
*/
- params_data = &pv_shared_cost_params->params_data;
-
- if (vacuum_cost_delay == params_data->cost_delay &&
- vacuum_cost_limit == params_data->cost_limit &&
- VacuumCostPageDirty == params_data->cost_page_dirty &&
- VacuumCostPageHit == params_data->cost_page_hit &&
- VacuumCostPageMiss == params_data->cost_page_miss)
- {
- /*
- * We don't need to update shared cost-based vacuum delay params if
- * they haven't changed.
- */
+ if (vacuum_cost_delay == pv_shared_cost_params->cost_delay &&
+ vacuum_cost_limit == pv_shared_cost_params->cost_limit &&
+ VacuumCostPageDirty == pv_shared_cost_params->cost_page_dirty &&
+ VacuumCostPageHit == pv_shared_cost_params->cost_page_hit &&
+ VacuumCostPageMiss == pv_shared_cost_params->cost_page_miss)
return;
- }
+ /* Update the shared delay parameters */
SpinLockAcquire(&pv_shared_cost_params->mutex);
- FillVacCostParams(&pv_shared_cost_params->params_data);
+ parallel_vacuum_set_cost_parameters(pv_shared_cost_params);
SpinLockRelease(&pv_shared_cost_params->mutex);
/*
- * Increase generation of the parameters, i.e. let parallel workers know
- * that they should re-read shared cost params.
+ * Increment the generation of the parameters, i.e. let parallel workers
+ * know that they should re-read shared cost params.
*/
pg_atomic_fetch_add_u32(&pv_shared_cost_params->generation, 1);
}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index de9f576e0f3..1120646f2c8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3250,7 +3250,6 @@ VacAttrStatsP
VacDeadItemsInfo
VacErrPhase
VacOptValue
-VacuumCostParams
VacuumParams
VacuumRelation
VacuumStmt