On Thu, May 15, 2025 at 10:10 PM Daniil Davydov <3daniss...@gmail.com> wrote: > > Hi, > > On Fri, May 16, 2025 at 4:06 AM Matheus Alcantara > <matheusssil...@gmail.com> wrote: > > I've reviewed the v1-0001 patch, the build on MacOS using meson+ninja is > > failing: > > ❯❯❯ ninja -C build install > > ninja: Entering directory `build' > > [1/126] Compiling C object > > src/backend/postgres_lib.a.p/utils_misc_guc_tables.c.o > > FAILED: src/backend/postgres_lib.a.p/utils_misc_guc_tables.c.o > > ../src/backend/utils/misc/guc_tables.c:3613:4: error: incompatible > > pointer to integer conversion initializing 'int' with an expression of > > type 'void *' [-Wint-conversion] > > 3613 | NULL, > > | ^~~~ > > > > Thank you for reviewing this patch! > > > It seems that the "autovacuum_reserved_workers_num" declaration on > > guc_tables.c has an extra gettext_noop() call? > > Good catch, I fixed this warning in the v2 version. > > > > > One other point is that as you've added TAP tests for the autovacuum I > > think you also need to create a meson.build file as you already create > > the Makefile. > > > > You also need to update the src/test/modules/meson.build and > > src/test/modules/Makefile to include the new test/modules/autovacuum > > path. > > > > OK, I should clarify this moment : modules/autovacuum is not a normal > test but a sandbox - just an example of how we can trigger parallel > index autovacuum. Also it may be used for debugging purposes. > In fact, 001_autovac_parallel.pl is not verifying anything. > I'll do as you asked (add all meson and Make stuff), but please don't > focus on it. The creation of the real test is still in progress. (I'll > try to complete it as soon as possible). > > In this letter I will divide the patch into 2 parts : implementation > and sandbox. What do you think about implementation?
Thank you for updating the patches. I have some comments on v2-0001 patch: + { + {"autovacuum_reserved_workers_num", PGC_USERSET, RESOURCES_WORKER_PROCESSES, + gettext_noop("Number of worker processes, reserved for participation in parallel index processing during autovacuum."), + gettext_noop("This parameter is depending on \"max_worker_processes\" (not on \"autovacuum_max_workers\"). " + "*Only* autovacuum workers can use these additional processes. " + "Also, these processes are taken into account in \"max_parallel_workers\"."), + }, + &av_reserved_workers_num, + 0, 0, MAX_BACKENDS, + check_autovacuum_reserved_workers_num, NULL, NULL + }, I find that the name "autovacuum_reserved_workers_num" is generic. It would be better to have a more specific name for parallel vacuum such as autovacuum_max_parallel_workers. This parameter is related to neither autovacuum_worker_slots nor autovacuum_max_workers, which seems fine to me. Also, max_parallel_maintenance_workers doesn't affect this parameter. Which number does this parameter mean to specify: the maximum number of parallel vacuum workers that can be used during autovacuum or the maximum number of parallel vacuum workers that each autovacuum can use? --- The patch includes the changes to bgworker.c so that we can reserve some slots for autovacuums. I guess that this change is not necessarily necessary because if the user sets the related GUC parameters correctly the autovacuum workers can use parallel vacuum as expected. Even if we need this change, I would suggest implementing it as a separate patch. --- + { + { + "parallel_idx_autovac_enabled", + "Allows autovacuum to process indexes of this table in parallel mode", + RELOPT_KIND_HEAP, + ShareUpdateExclusiveLock + }, + false + }, The proposed reloption name doesn't align with our naming conventions. Looking at our existing reloptions, we typically write out full words rather than using abbreviations like 'autovac' or 'idx'. The new reloption name seems not to follow the conventional naming style for existing reloption. For instance, we don't use abbreviations such as 'autovac' and 'idx'. I guess we can implement this parameter as an integer parameter so that the user can specify the number of parallel vacuum workers for the table. For example, we can have a reloption autovacuum_parallel_workers. Setting 0 (by default) means to disable parallel vacuum during autovacuum, and setting special value -1 means to let PostgreSQL calculate the parallel degree for the table (same as the default VACUUM command behavior). I've also considered some alternative names. If we were to use parallel_maintenance_workers, it sounds like it controls the parallel degree for all operations using max_parallel_maintenance_workers, including CREATE INDEX. Similarly, vacuum_parallel_workers could be interpreted as affecting both autovacuum and manual VACUUM commands, suggesting that when users run "VACUUM (PARALLEL) t", the system would use their specified value for the parallel degree. I prefer autovacuum_parallel_workers or vacuum_parallel_workers. --- + /* + * If we are running autovacuum - decide whether we need to process indexes + * of table with given oid in parallel. + */ + if (AmAutoVacuumWorkerProcess() && + params->index_cleanup != VACOPTVALUE_DISABLED && + RelationAllowsParallelIdxAutovac(rel)) I think that this should be done in autovacuum code. --- +/* + * Minimum number of dead tuples required for the table's indexes to be + * processed in parallel during autovacuum. + */ +#define AV_PARALLEL_DEADTUP_THRESHOLD 1024 + +/* + * How many indexes should process each parallel worker during autovacuum. + */ +#define NUM_INDEXES_PER_PARALLEL_WORKER 30 These fixed values really useful in common cases? I think we already have an optimization where we skip vacuum indexes if the table has fewer dead tuples (see BYPASS_THRESHOLD_PAGES). Given that we rely on users' heuristics which table needs to use parallel vacuum during autovacuum, I think we don't need to apply these conditions. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com