On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:
> On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <[email protected]> wrote:
> > On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <[email protected]> wrote:
> > >
> > > Original, long thread
> > > https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
> > >
> >
> > I have comments/questions on the patches:
> > doc changes
> > -------------------------
> > 1.
> > <para>
> > - Perform vacuum index and cleanup index phases of
> > <command>VACUUM</command>
> > + Perform vacuum index and index cleanup phases of
> > <command>VACUUM</command>
> >
> > Why is the proposed text improvement over what is already there?
>
> I have kept the existing text as it is for this case.
Probably it should say "index vacuum and index cleanup", which is also what the
comment in vacuumlazy.c says.
> > 2.
> > If the
> > - <literal>PARALLEL</literal> option is omitted, then
> > - <command>VACUUM</command> decides the number of workers based on
> > number
> > - of indexes that support parallel vacuum operation on the relation
> > which
> > - is further limited by <xref
> > linkend="guc-max-parallel-workers-maintenance"/>.
> > - The index can participate in a parallel vacuum if and only if the
> > size
> > + <literal>PARALLEL</literal> option is omitted, then the number of
> > workers
> > + is determined based on the number of indexes that support parallel
> > vacuum
> > + operation on the relation, and is further limited by <xref
> > + linkend="guc-max-parallel-workers-maintenance"/>.
> > + An index can participate in parallel vacuum if and only if the size
> > of the index is more than <xref
> > linkend="guc-min-parallel-index-scan-size"/>.
> >
> > Here, it is not clear to me why the proposed text is better than what
> > we already have?
> Changed as per your suggestion.
To answer your question:
"VACUUM decides" doesn't sound consistent with the rest of the docs.
"based on {+the+} number"
=> here, you're definitely missing an article...
"relation which" technically means that the *relation* "is further limited
by"...
> > Patch changes
> > -------------------------
> > 1.
> > - * and index cleanup with parallel workers unless the parallel vacuum is
> > + * and index cleanup with parallel workers unless parallel vacuum is
> >
> > As per my understanding 'parallel vacuum' is a noun phrase, so we
> > should have a determiner before it.
>
> Changed as per your suggestion.
Thanks (I think you meant an "article").
> > - * We allow each worker to update it as and when it has incurred any cost
> > and
> > + * We allow each worker to update it AS AND WHEN it has incurred any cost
> > and
> >
> > I don't see why it is necessary to make this part bold? We are using
> > it at one other place in the code in the way it is used here.
> >
>
> Kept the existing text as it is.
I meant this as a question.  I'm not sure what "as and when" means?  "If and
when"?
> I have combined both of your patches. Let me know if you have any
> more suggestions, otherwise, I am planning to push this tomorrow.
In the meantime, I found some more issues, so I rebased on top of your patch so
you can review it.
--
Justin
>From 3551aba7445b46d60d2199d068c0e06a5828c5c3 Mon Sep 17 00:00:00 2001
From: Amit Kapila <[email protected]>
Date: Tue, 7 Apr 2020 09:48:28 +0530
Subject: [PATCH v3 1/2] Comments and doc fixes for commit 40d964ec99.
Reported-by: Justin Pryzby
Author: Justin Pryzby, with few changes by me
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/ref/vacuum.sgml | 16 +++++++-------
src/backend/access/heap/vacuumlazy.c | 30 +++++++++++++--------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/vacuum.c | 20 +++++++++---------
src/include/commands/vacuum.h | 2 +-
5 files changed, 35 insertions(+), 35 deletions(-)
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 846056a353..b7e0a8af6b 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -234,13 +234,13 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<para>
Perform vacuum index and cleanup index phases of <command>VACUUM</command>
in parallel using <replaceable class="parameter">integer</replaceable>
- background workers (for the detail of each vacuum phases, please
+ background workers (for the detail of each vacuum phase, please
refer to <xref linkend="vacuum-phases"/>). If the
- <literal>PARALLEL</literal> option is omitted, then
- <command>VACUUM</command> decides the number of workers based on number
- of indexes that support parallel vacuum operation on the relation which
- is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
- The index can participate in a parallel vacuum if and only if the size
+ <literal>PARALLEL</literal> option is omitted, then the number of workers
+ is determined based on the number of indexes that support parallel vacuum
+ operation on the relation, and is further limited by <xref
+ linkend="guc-max-parallel-workers-maintenance"/>.
+ An index can participate in parallel vacuum if and only if the size
of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
Please note that it is not guaranteed that the number of parallel workers
specified in <replaceable class="parameter">integer</replaceable> will
@@ -248,7 +248,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
workers than specified, or even with no workers at all. Only one worker
can be used per index. So parallel workers are launched only when there
are at least <literal>2</literal> indexes in the table. Workers for
- vacuum launches before starting each phase and exit at the end of
+ vacuum are launched before the start of each phase and exit at the end of
the phase. These behaviors might change in a future release. This
option can't be used with the <literal>FULL</literal> option.
</para>
@@ -367,7 +367,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<command>VACUUM</command> causes a substantial increase in I/O traffic,
which might cause poor performance for other active sessions. Therefore,
it is sometimes advisable to use the cost-based vacuum delay feature. For
- parallel vacuum, each worker sleeps proportional to the work done by that
+ parallel vacuum, each worker sleeps proportionally to the work done by that
worker. See <xref linkend="runtime-config-resource-vacuum-cost"/> for
details.
</para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f3382d37a4..a757560996 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -208,7 +208,7 @@ typedef struct LVShared
* live tuples in the index vacuum case or the new live tuples in the
* index cleanup case.
*
- * estimated_count is true if the reltuples is an estimated value.
+ * estimated_count is true if reltuples is an estimated value.
*/
double reltuples;
bool estimated_count;
@@ -732,7 +732,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
* to reclaim dead line pointers.
*
* If the table has at least two indexes, we execute both index vacuum
- * and index cleanup with parallel workers unless the parallel vacuum is
+ * and index cleanup with parallel workers unless parallel vacuum is
* disabled. In a parallel vacuum, we enter parallel mode and then
* create both the parallel context and the DSM segment before starting
* heap scan so that we can record dead tuples to the DSM segment. All
@@ -809,8 +809,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
vacrelstats->latestRemovedXid = InvalidTransactionId;
/*
- * Initialize the state for a parallel vacuum. As of now, only one worker
- * can be used for an index, so we invoke parallelism only if there are at
+ * Initialize state for a parallel vacuum. As of now, only one worker can
+ * be used for an index, so we invoke parallelism only if there are at
* least two indexes on a table.
*/
if (params->nworkers >= 0 && vacrelstats->useindex && nindexes > 1)
@@ -837,7 +837,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
}
/*
- * Allocate the space for dead tuples in case the parallel vacuum is not
+ * Allocate the space for dead tuples in case parallel vacuum is not
* initialized.
*/
if (!ParallelVacuumIsActive(lps))
@@ -2215,7 +2215,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
shared_indstats = get_indstats(lvshared, idx);
/*
- * Skip processing indexes that doesn't participate in parallel
+ * Skip processing indexes that don't participate in parallel
* operation
*/
if (shared_indstats == NULL ||
@@ -2328,8 +2328,8 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
shared_indstats->updated = true;
/*
- * Now that the stats[idx] points to the DSM segment, we don't need
- * the locally allocated results.
+ * Now that stats[idx] points to the DSM segment, we don't need the
+ * locally allocated results.
*/
pfree(*stats);
*stats = bulkdelete_res;
@@ -2449,7 +2449,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
* lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
*
* reltuples is the number of heap tuples and estimated_count is true
- * if the reltuples is an estimated value.
+ * if reltuples is an estimated value.
*/
static void
lazy_cleanup_index(Relation indrel,
@@ -3050,9 +3050,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
/*
* Compute the number of parallel worker processes to request. Both index
* vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
* min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
*
* nrequested is the number of parallel workers that user requested. If
* nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -3071,7 +3071,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
int i;
/*
- * We don't allow to perform parallel operation in standalone backend or
+ * We don't allow performing parallel operation in standalone backend or
* when parallelism is disabled.
*/
if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3144,7 +3144,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
}
/*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
*/
static void
update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3345,7 +3345,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
/*
* Destroy the parallel context, and end parallel mode.
*
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
* updated index statistics from DSM in local memory and then later use that
* to update the index statistics. One might think that we can exit from
* parallel mode, update the index statistics and then destroy parallel
@@ -3452,7 +3452,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
* Perform work within a launched parallel process.
*
* Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
*/
void
parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 29057f389e..14a8690019 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -505,7 +505,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
/*
* Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers. This is required for cases where
+ * launch a different number of workers. This is required for cases where
* we need to reuse the same DSM segment, but the number of workers can
* vary from run-to-run.
*/
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3a89f8fe1e..495ac23a26 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2036,23 +2036,23 @@ vacuum_delay_point(void)
/*
* Computes the vacuum delay for parallel workers.
*
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it. We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to allow each
+ * worker to sleep in proportion to the share of work it's done. We achieve this
* by allowing all parallel vacuum workers including the leader process to
* have a shared view of cost related parameters (mainly VacuumCostBalance).
* We allow each worker to update it as and when it has incurred any cost and
* then based on that decide whether it needs to sleep. We compute the time
* to sleep for a worker based on the cost it has incurred
* (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount. This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount. This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
*
- * We allow any worker to sleep only if it has performed the I/O above a
- * certain threshold, which is calculated based on the number of active
- * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system. The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * We allow a worker to sleep only if it has performed I/O above a certain
+ * threshold, which is calculated based on the number of active workers
+ * (VacuumActiveNWorkers), and the overall cost balance is more than
+ * VacuumCostLimit set by the system. Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
* of its share of work to sleep.
*/
static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea5c9..a4cd721400 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
/*
* The number of parallel vacuum workers. 0 by default which means choose
- * based on the number of indexes. -1 indicates a parallel vacuum is
+ * based on the number of indexes. -1 indicates parallel vacuum is
* disabled.
*/
int nworkers;
--
2.17.0
>From 3c6314b72d9323dd1554deadedf1c57fe105ad6b Mon Sep 17 00:00:00 2001
From: Justin Pryzby <[email protected]>
Date: Sun, 19 Jan 2020 22:35:01 -0600
Subject: [PATCH v3 2/2] docs/comments review for parallel vacuum
---
doc/src/sgml/ref/vacuum.sgml | 12 ++++++------
src/backend/access/heap/vacuumlazy.c | 22 +++++++++++-----------
src/backend/commands/vacuum.c | 2 +-
3 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index b7e0a8af6b..5dc0787459 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -232,9 +232,9 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
<term><literal>PARALLEL</literal></term>
<listitem>
<para>
- Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+ Perform index vacuum and index cleanup phases of <command>VACUUM</command>
in parallel using <replaceable class="parameter">integer</replaceable>
- background workers (for the detail of each vacuum phase, please
+ background workers (for the details of each vacuum phase, please
refer to <xref linkend="vacuum-phases"/>). If the
<literal>PARALLEL</literal> option is omitted, then the number of workers
is determined based on the number of indexes that support parallel vacuum
@@ -358,16 +358,16 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
</para>
<para>
- The <option>PARALLEL</option> option is used only for vacuum purpose.
- Even if this option is specified with <option>ANALYZE</option> option
- it does not affect <option>ANALYZE</option>.
+ The <option>PARALLEL</option> option is used only for vacuum operations.
+ If specified along with <option>ANALYZE</option>, the behavior during
+ <literal>ANALYZE</literal> is unchanged.
</para>
<para>
<command>VACUUM</command> causes a substantial increase in I/O traffic,
which might cause poor performance for other active sessions. Therefore,
it is sometimes advisable to use the cost-based vacuum delay feature. For
- parallel vacuum, each worker sleeps proportionally to the work done by that
+ parallel vacuum, each worker sleeps in proportion to the work done by that
worker. See <xref linkend="runtime-config-resource-vacuum-cost"/> for
details.
</para>
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a757560996..dfa57df7b6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -232,8 +232,8 @@ typedef struct LVShared
/*
* Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance for a worker to go for the
- * delay.
+ * minimum threshold of the vacuum cost balance before a worker sleeps
+ * for cost-based delay.
*/
pg_atomic_uint32 active_nworkers;
@@ -2312,12 +2312,12 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
/*
* Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first time to get it
- * from them, because they allocate it locally and it's possible that an
- * index will be vacuumed by the different vacuum process at the next
- * time. The copying of the result normally happens only after the first
- * time of index vacuuming. From the second time, we pass the result on
- * the DSM segment so that they then update it directly.
+ * amvacuumcleanup to the DSM segment if it's the first time to get a result
+ * from a worker, because workers allocate BulkDeleteResults locally, and it's possible that an
+ * index will be vacuumed by a different vacuum process the next
+ * time. Copying the result normally happens only the first
+ * time an index is vacuumed. For any additional vacuums, we pass the result on
+ * the DSM segment so that workers then update it directly.
*
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
@@ -3138,7 +3138,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
if (!can_parallel_vacuum[i])
continue;
- /* Set NOT NULL as this index do support parallelism */
+ /* Set NOT NULL as this index does support parallelism */
lvshared->bitmap[i >> 3] |= 1 << (i & 0x07);
}
}
@@ -3174,7 +3174,7 @@ update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
/*
* This function prepares and returns parallel vacuum state if we can launch
- * even one worker. This function is responsible to enter parallel mode,
+ * even one worker. This function is responsible for entering parallel mode,
* create a parallel context, and then initialize the DSM segment.
*/
static LVParallelState *
@@ -3346,7 +3346,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
* Destroy the parallel context, and end parallel mode.
*
* Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM in local memory and then later use that
+ * updated index statistics from DSM into local memory and then later use that
* to update the index statistics. One might think that we can exit from
* parallel mode, update the index statistics and then destroy parallel
* context, but that won't be safe (see ExitParallelMode).
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 495ac23a26..351d5215a9 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2040,7 +2040,7 @@ vacuum_delay_point(void)
* worker to sleep in proportion to the share of work it's done. We achieve this
* by allowing all parallel vacuum workers including the leader process to
* have a shared view of cost related parameters (mainly VacuumCostBalance).
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and
* then based on that decide whether it needs to sleep. We compute the time
* to sleep for a worker based on the cost it has incurred
* (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
--
2.17.0