On Wed, Mar 9, 2022 at 12:41 AM Imseih (AWS), Sami <sims...@amazon.com> wrote:
>
> +    +/*
> +    + * vacuum_worker_init --- initialize this module's shared memory hash
> +    + * to track the progress of a vacuum worker
> +    + */
> +    +void
> +    +vacuum_worker_init(void)
> +    +{
> +    +       HASHCTL     info;
> +    +       long        max_table_size = GetMaxBackends();
> +    +
> +    +       VacuumWorkerProgressHash = NULL;
> +    +
> +    +       info.keysize = sizeof(pid_t);
> +    +       info.entrysize = sizeof(VacProgressEntry);
> +    +
> +    +       VacuumWorkerProgressHash = ShmemInitHash("Vacuum Progress Hash",
> +    +
> +                      max_table_size,
> +    +
> +                      max_table_size,
> +    +
> +                      &info,
> +    +
> +                      HASH_ELEM | HASH_BLOBS);
> +    +}
>
> It seems to me that creating a shmem hash with max_table_size entries
> for parallel vacuum process tracking is too much. IIRC an old patch
> had parallel vacuum workers advertise their progress and changed the
> pg_stat_progress_vacuum view so that it aggregates the results,
> including the workers' stats. I think that approach is better than
> the current one. Why did you change it?
>
> Regards,
>
> I was trying to avoid using shared memory to track completed indexes,
> but aggregating stats does not work with parallel vacuum. This is
> because a parallel worker exits before the vacuum completes, causing
> the aggregated total to be wrong.
>
> For example:
>
> The leader advertises that it has completed 2 indexes.
> A parallel worker advertises that it has completed 2 indexes.
>
> While both are running, the aggregation shows 4 indexes completed.
>
> After the parallel worker exits, the aggregation shows only 2 indexes
> completed.

Indeed.
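
To spell that failure mode out in code form, the aggregation approach
amounts to something like the sketch below, loosely modeled on
pg_stat_get_progress_info(). PROGRESS_VACUUM_INDEXES_COMPLETED is a
made-up parameter index, not an existing one; the point is only that
the sum is built from per-backend status entries:

#include "postgres.h"

#include "utils/backend_progress.h"
#include "utils/backend_status.h"

/* hypothetical progress slot advertised by each vacuum backend */
#define PROGRESS_VACUUM_INDEXES_COMPLETED 7

static int64
sum_vacuum_indexes_completed(void)
{
	int			num_backends = pgstat_fetch_stat_numbackends();
	int64		total = 0;

	for (int i = 1; i <= num_backends; i++)
	{
		LocalPgBackendStatus *local_beentry = pgstat_fetch_stat_local_beentry(i);
		PgBackendStatus *beentry;

		if (!local_beentry)
			continue;

		beentry = &local_beentry->backendStatus;

		if (beentry->st_progress_command != PROGRESS_COMMAND_VACUUM)
			continue;

		total += beentry->st_progress_param[PROGRESS_VACUUM_INDEXES_COMPLETED];
	}

	/*
	 * The sum is correct only while every worker's status entry still
	 * exists; once a worker exits, its contribution silently vanishes
	 * and the reported total goes backwards.
	 */
	return total;
}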

It might have already been discussed, but rather than using a new shmem
hash for parallel vacuum, I wonder if we could allow workers to update
the leader's progress information. That would break the assumption that
a backend status entry is modified only by its own backend, but it
might also help with progress reporting for other parallel operations.
This essentially does the same thing as the current patch without
requiring a new shmem hash.
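
Something like the following (untested; the helper and the way the
worker obtains the leader's entry are made up) is what I have in mind.
Each worker bumps a counter directly in the leader's entry, which is
exactly why it conflicts with the single-writer assumption:

#include "postgres.h"

#include "utils/backend_status.h"

/*
 * Hypothetical helper: a parallel vacuum worker increments a progress
 * counter in the *leader's* backend status entry.  The leader's entry
 * would have to be passed down through the parallel DSM, and some
 * interlock (e.g. a spinlock in the DSM) would be needed because the
 * changecount protocol assumes a single writer per entry.
 */
static void
vacuum_worker_incr_leader_progress(volatile PgBackendStatus *leader_beentry,
								   int index, int64 incr)
{
	PGSTAT_BEGIN_WRITE_ACTIVITY(leader_beentry);
	leader_beentry->st_progress_param[index] += incr;
	PGSTAT_END_WRITE_ACTIVITY(leader_beentry);
}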

Another idea that comes to mind is to have the parallel vacuum leader
check PVIndStats.status and update the number of processed indexes in
its own progress information. The leader can check it and update the
progress information before and after index vacuuming, and possibly we
could add a callback to the main loop of an index AM's bulkdelete and
vacuumcleanup so that the leader can keep it up to date periodically.
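
A rough sketch of that idea is below. It assumes a new progress column
(the PROGRESS_VACUUM_INDEXES_PROCESSED index is made up here) and a
helper in vacuumparallel.c; the leader, or the callback mentioned
above, would simply recount the shared PVIndStats array:

/*
 * Sketch only -- this would live in vacuumparallel.c, where
 * ParallelVacuumState, PVIndStats and PARALLEL_INDVAC_STATUS_COMPLETED
 * are defined.  PROGRESS_VACUUM_INDEXES_PROCESSED does not exist yet.
 */
static void
parallel_vacuum_update_progress(ParallelVacuumState *pvs)
{
	int			nindexes_completed = 0;

	/* Count indexes whose bulkdelete/cleanup pass has finished. */
	for (int i = 0; i < pvs->nindexes; i++)
	{
		if (pvs->indstats[i].status == PARALLEL_INDVAC_STATUS_COMPLETED)
			nindexes_completed++;
	}

	/* Report it in the leader's own pg_stat_progress_vacuum row. */
	pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_PROCESSED,
								 nindexes_completed);
}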

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

