Re: Eagerly scan all-visible pages to amortize aggressive vacuum

Alena Rybakina Wed, 15 Jan 2025 09:09:10 -0800

On 14.01.2025 22:51, Melanie Plageman wrote:

On Mon, Jan 13, 2025 at 5:37 PM Alena Rybakina
<a.rybak...@postgrespro.ru>  wrote:

Thank you for working on this patch. Without this explanation it is difficult
to understand what is happening, to put it mildly.

Thanks for the review! I've incorporated most of them into attached v7.

You are welcome! Thank you too)

The first of them is related to the fact that vacuum will not clean tuples 
referenced in indexes, since it was previously unable to take a cleanup lock on 
the index. You can look at the increment of missed_dead_tuples and 
vacrel->missed_dead_pages in the lazy_scan_noprune function. That is, these are 
absolutely dead tuples for vacuum that it simply could not clean.

I had mentioned that if a (non-aggressive) vacuum cannot get a cleanup
lock on a page, it will skip pruning and freezing. I have expanded the
note to mention that this means it will not remove those dead tuples
or index entries.
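
The behaviour described above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyPage, ToyVacStats, toy_scan_page and toy_scan_heap are hypothetical names invented here, not PostgreSQL structures or functions, and only the two counters mirror the missed_dead_tuples and missed_dead_pages names mentioned in the thread. It models a non-aggressive vacuum only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's real structures. */
typedef struct ToyPage
{
    bool cleanup_lock_available; /* could vacuum get a cleanup lock? */
    int  dead_tuples;            /* dead tuples currently on the page */
} ToyPage;

typedef struct ToyVacStats
{
    long tuples_removed;     /* dead tuples actually cleaned */
    long missed_dead_tuples; /* dead tuples left behind */
    long missed_dead_pages;  /* pages with at least one missed dead tuple */
} ToyVacStats;

/*
 * Process one page the way the thread describes a non-aggressive vacuum:
 * with the cleanup lock, dead tuples are removed; without it, pruning is
 * skipped and the dead tuples are only counted as missed.
 */
static void
toy_scan_page(ToyPage *page, ToyVacStats *stats)
{
    assert(page != NULL && stats != NULL);

    if (page->cleanup_lock_available)
    {
        stats->tuples_removed += page->dead_tuples;
        page->dead_tuples = 0;
    }
    else if (page->dead_tuples > 0)
    {
        stats->missed_dead_tuples += page->dead_tuples;
        stats->missed_dead_pages++;
    }
}

/* Scan every page of the toy heap and return the accumulated counters. */
static ToyVacStats
toy_scan_heap(ToyPage *pages, size_t npages)
{
    ToyVacStats stats = {0, 0, 0};

    for (size_t i = 0; i < npages; i++)
        toy_scan_page(&pages[i], &stats);
    return stats;
}
```

Calling toy_scan_heap on an array where some pages have cleanup_lock_available set to false leaves their dead_tuples untouched and reports them in the missed counters, which is the situation the expanded note covers.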

Secondly, I think it is worth mentioning the moment when vacuum urgently starts
cleaning the heap relation because there is a threat of wraparound. At this
point, it skips the index processing phase and heap relation truncation.

I've added failsafe to the list of reasons why we might skip phase II and III.

Yes, I agree with you. After reviewing the patch again, I noticed them.

Thirdly, the FreeSpaceMap is updated each time index and table cleaning
completes (after the lazy_vacuum function) and after the table heap pruning
stage (the lazy_scan_prune function). Maybe you should add it.

I've added a sentence about this. It looks a bit awkward by itself,
but it doesn't really go with the other paragraphs. Anyway, I think it
is probably fine.

I think it is fine.

I think it is possible to add additional information about parallel vacuum.
Firstly, workers are launched to process the indexes, each index being cleaned
by one process. Some indexes are considered by vacuum to be unsafe for
processing by a parallel worker and can be processed only by the leader
process. These are indexes that do not support parallel bulk-deletion or
parallel cleanup (see the parallel_vacuum_index_is_parallel_safe function).

I hesitated to add too much about parallel index vacuuming to
vacuumlazy.c. I have added a line which mentions that manual vacuums
may vacuum indexes in parallel and to look at vacuumparallel.c for
more info.

In my opinion, there is enough information about it. Additionally, the code
for parallel vacuums is located there. I believe a brief description of what a
parallel vacuum is and where to find more information about it is sufficient,
so it is fine as it is now.

I noticed an interesting point. I don’t know if it is necessary to write about
it, but for me it was not obvious, and it was informative, that the buffer and
WAL statistics for the indexes processed by workers are collected separately
(in pvs->buffer_usage and pvs->wal_usage).
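
To make the point concrete, here is a minimal sketch of per-worker usage counters being folded into the leader's totals. It is a toy model, not the code in vacuumparallel.c: every type and function name below (ToyBufferUsage, ToyWalUsage, ToyParallelState, toy_accumulate_worker_usage) and the choice of counter fields are assumptions made for illustration. Only the buffer_usage and wal_usage field names come from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced counters; the real structures have more fields. */
typedef struct ToyBufferUsage
{
    long shared_blks_hit;
    long shared_blks_read;
    long shared_blks_dirtied;
} ToyBufferUsage;

typedef struct ToyWalUsage
{
    long               wal_records;
    long               wal_fpi;
    unsigned long long wal_bytes;
} ToyWalUsage;

/* Toy stand-in for the shared state: one usage slot per worker. */
typedef struct ToyParallelState
{
    int             nworkers;
    ToyBufferUsage *buffer_usage; /* array of nworkers entries */
    ToyWalUsage    *wal_usage;    /* array of nworkers entries */
} ToyParallelState;

/*
 * Add every worker's counters to the leader's running totals, so the
 * work done on indexes by workers is not lost from the final report.
 */
static void
toy_accumulate_worker_usage(const ToyParallelState *pvs,
                            ToyBufferUsage *leader_buf,
                            ToyWalUsage *leader_wal)
{
    assert(pvs != NULL && leader_buf != NULL && leader_wal != NULL);

    for (int i = 0; i < pvs->nworkers; i++)
    {
        leader_buf->shared_blks_hit += pvs->buffer_usage[i].shared_blks_hit;
        leader_buf->shared_blks_read += pvs->buffer_usage[i].shared_blks_read;
        leader_buf->shared_blks_dirtied += pvs->buffer_usage[i].shared_blks_dirtied;

        leader_wal->wal_records += pvs->wal_usage[i].wal_records;
        leader_wal->wal_fpi += pvs->wal_usage[i].wal_fpi;
        leader_wal->wal_bytes += pvs->wal_usage[i].wal_bytes;
    }
}
```

The design point the sketch tries to show is that each worker writes only to its own slot, so no locking is needed while indexes are being processed, and the leader sums the slots once the workers have finished.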

This is interesting, but I think it might belong as commentary in
vacuumparallel.c instead.

I added some description about it, I hope it is fine. I attached
vacuum_description.diff

Thanks again for your close reading and detailed thoughts!

Thank you for your contribution!)

--
Regards,
Alena Rybakina
Postgres Professional

diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 0d92e694d6a..6ac4909794a 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -9,8 +9,12 @@
  * In a parallel vacuum, we perform both index bulk deletion and index cleanup
  * with parallel worker processes.  Individual indexes are processed by one
  * vacuum process.  ParallelVacuumState contains shared information as well as
- * the memory space for storing dead items allocated in the DSA area.  We
- * launch parallel worker processes at the start of parallel index
+ * the memory space for storing dead items allocated in the DSA area. Furthermore
+ * the primary statistical information about indexes gathered during vacuuming
+ * is stored in the IndexBulkDeleteResult structure. In addition, the buffer and
+ * WAL statistics for indexes processed by parallel workers are stored in the
+ * buffer_usage and wal_usage fields of the ParallelVacuumState.
+ * We launch parallel worker processes at the start of parallel index
  * bulk-deletion and index cleanup and once all indexes are processed, the
  * parallel worker processes exit.  Each time we process indexes in parallel,
  * the parallel context is re-initialized so that the same DSM can be used for
