On 14.01.2025 22:51, Melanie Plageman wrote:
On Mon, Jan 13, 2025 at 5:37 PM Alena Rybakina
<a.rybak...@postgrespro.ru> wrote:
Thank you for working on this patch. Without this explanation it is difficult
to understand what is happening, to put it mildly.
Thanks for the review! I've incorporated most of your suggestions into the attached v7.
You are welcome! Thank you too)
The first of them is related to the fact that vacuum will not clean tuples
referenced in indexes, since it was previously unable to take a cleanup lock on
the index. You can look at the increment of missed_dead_tuples and
vacrel->missed_dead_pages in the lazy_scan_noprune function. That is, these
tuples are definitely dead, but vacuum simply could not remove them.
I had mentioned that if a (non-aggressive) vacuum cannot get a cleanup
lock on a page, it will skip pruning and freezing. I have expanded the
note to mention that this means it will not remove those dead tuples
or index entries.
Secondly, I think it is worth mentioning the case where vacuum urgently starts
cleaning the heap relation because there is a threat of wraparound. At this
point, it skips the index processing phase and heap relation truncation.
I've added failsafe to the list of reasons why we might skip phase II and III.
Yes, I agree with you. After reviewing the patch again, I noticed them.
Thirdly, the FreeSpaceMap is updated each time index and table cleaning
completes (after the lazy_vacuum function) and after the heap pruning stage
(the lazy_scan_prune function). Maybe you should add it.
I've added a sentence about this. It looks a bit awkward by itself,
but it doesn't really go with the other paragraphs. Anyway, I think it
is probably fine.
I think it is fine.
I think it is possible to add more information about parallel vacuum.
Firstly, workers are launched for the indexes and perform their cleaning.
Some indexes are considered by vacuum to be unsafe for processing by a parallel
worker and can be processed only by the leader process. These are indexes
that do not support parallel bulk-deletion or parallel cleanup (see the
parallel_vacuum_index_is_parallel_safe function).
I hesitated to add too much about parallel index vacuuming to
vacuumlazy.c. I have added a line which mentions that manual vacuums
may vacuum indexes in parallel and to look at vacuumparallel.c for
more info.
In my opinion, there is enough information about it. Additionally, the
code for parallel vacuums is located there. I believe a brief
description of what a parallel vacuum is and where to find more
information about it is sufficient, so it is fine as it is now.
I noticed an interesting point, though I don't know whether it is worth writing
about. It was not obvious to me, and I found it informative, that the buffer and
WAL statistics for indexes processed by workers are reported separately (in
pvs->buffer_usage and pvs->wal_usage).
This is interesting, but I think it might belong as commentary in
vacuumparallel.c instead.
I added a description of it; I hope it is fine. I have attached
vacuum_description.diff.
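To illustrate the point about per-worker statistics, here is a minimal sketch in C. It assumes each worker publishes its buffer and WAL counters into its own slot of a shared array and the leader later folds those slots into its own totals. All Toy* types and toy_* functions are hypothetical stand-ins for illustration, not PostgreSQL's actual structures or API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified counters (not PostgreSQL's real structs). */
typedef struct ToyBufferUsage
{
    int64_t shared_blks_hit;
    int64_t shared_blks_read;
} ToyBufferUsage;

typedef struct ToyWalUsage
{
    int64_t  wal_records;
    uint64_t wal_bytes;
} ToyWalUsage;

/* Hypothetical stand-in for the shared state: one slot per worker. */
typedef struct ToyParallelState
{
    int             nworkers;
    ToyBufferUsage *buffer_usage;   /* array of nworkers slots */
    ToyWalUsage    *wal_usage;      /* array of nworkers slots */
} ToyParallelState;

/* Worker side: publish this worker's counters in its own slot. */
void
toy_worker_report(ToyParallelState *ps, int worker,
                  ToyBufferUsage buf, ToyWalUsage wal)
{
    assert(worker >= 0 && worker < ps->nworkers);
    ps->buffer_usage[worker] = buf;
    ps->wal_usage[worker] = wal;
}

/* Leader side: add every worker's slot to the leader's running totals. */
void
toy_leader_accumulate(const ToyParallelState *ps,
                      ToyBufferUsage *buf_total, ToyWalUsage *wal_total)
{
    for (int i = 0; i < ps->nworkers; i++)
    {
        buf_total->shared_blks_hit += ps->buffer_usage[i].shared_blks_hit;
        buf_total->shared_blks_read += ps->buffer_usage[i].shared_blks_read;
        wal_total->wal_records += ps->wal_usage[i].wal_records;
        wal_total->wal_bytes += ps->wal_usage[i].wal_bytes;
    }
}
```

Keeping one slot per worker means workers never write to the same memory, so in this sketch no locking is needed until the leader sums the slots after the workers have finished.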
Thanks again for your close reading and detailed thoughts!
Thank you for your contribution!)
--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 0d92e694d6a..6ac4909794a 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -9,8 +9,12 @@
* In a parallel vacuum, we perform both index bulk deletion and index cleanup
* with parallel worker processes. Individual indexes are processed by one
* vacuum process. ParallelVacuumState contains shared information as well as
- * the memory space for storing dead items allocated in the DSA area. We
- * launch parallel worker processes at the start of parallel index
+ * the memory space for storing dead items allocated in the DSA area. Furthermore,
+ * the primary statistical information about indexes gathered during vacuuming
+ * is stored in the IndexBulkDeleteResult structure. In addition, the buffer and
+ * WAL statistics for indexes processed by parallel workers are stored in the
+ * buffer_usage and wal_usage fields of the ParallelVacuumState.
+ * We launch parallel worker processes at the start of parallel index
* bulk-deletion and index cleanup and once all indexes are processed, the
* parallel worker processes exit. Each time we process indexes in parallel,
* the parallel context is re-initialized so that the same DSM can be used for