Re: parallel vacuum comments

Masahiko Sawada Thu, 04 Nov 2021 22:29:05 -0700

On Fri, Nov 5, 2021 at 4:42 AM Peter Geoghegan <p...@bowt.ie> wrote:
>
> On Wed, Nov 3, 2021 at 10:25 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > I've attached a draft patch. The patch incorporated all comments from
> > Andres except for the last comment that moves parallel related code to
> > another file. I'd like to discuss how we split vacuumlazy.c.
>
> This looks great!
>
> I wonder if this is okay, though:
>
> >     /* Process the indexes that can be processed by only leader process */
> > -   do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
> > +   lazy_serial_process_indexes(vacrel);
> >
> >     /*
> > -    * Join as a parallel worker.  The leader process alone processes all 
> > the
> > -    * indexes in the case where no workers are launched.
> > +    * Join as a parallel worker.  The leader process alone processes all
> > +    * parallel-safe indexes in the case where no workers are launched.
> >      */
> > -   do_parallel_processing(vacrel, lps->lvshared);
> > +   lazy_parallel_process_indexes(vacrel, lps->lvshared, 
> > vacrel->lps->lvsharedindstats);
> >
> >     /*
> >      * Next, accumulate buffer and WAL usage.  (This must wait for the 
> > workers
> > @@ -2786,6 +2734,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, 
> > int nworkers)
> >             InstrAccumParallelQuery(&lps->buffer_usage[i], 
> > &lps->wal_usage[i]);
> >     }
>
> Since "The leader process alone processes all parallel-safe indexes in
> the case where no workers are launched" (no change there), I wonder:
> how does the leader *know* that it's the leader (and so can always
> process any indexes) inside its call to
> lazy_parallel_process_indexes()? Or, does the leader actually process
> all indexes inside lazy_serial_process_indexes() in this edge case?
> (The edge case where a parallel VACUUM has no workers at all, because
> they couldn't be launched by the core parallel infrastructure.)


lazy_serial_process_indexes() handles only parallel-unsafe (i.g.,
non-parallel-supported or too small indexes) indexes whereas
lazy_parallel_process_indexes() does that only parallel-safe indexes.
Therefore in the edge case, the leader process all indexes by using
both functions.

>
> One small thing: the new "LVSharedIndStats.parallel_safe" field seems
> to be slightly misnamed. Isn't it more like
> "LVSharedIndStats.parallel_workers_can_process"? The index might
> actually be parallel safe *in principle*, while nevertheless being
> deliberately skipped over by workers due to a deliberate up-front
> choice made earlier, in compute_parallel_vacuum_workers(). Most
> obvious example of this is the choice to not use parallelism for a
> smaller index (say a partial index whose size is <
> min_parallel_index_scan_size).

Agreed.

>
> Another small thing, which is closely related to the last one:
>
> >  typedef struct LVSharedIndStats
> >  {
> > -   bool        updated;        /* are the stats updated? */
> > +   LVIndVacStatus status;
> > +
> > +   /*
> > +    * True if both leader and worker can process the index, otherwise only
> > +    * leader can process it.
> > +    */
> > +   bool    parallel_safe;
> > +
> > +   bool    istat_updated;      /* are the stats updated? */
> >     IndexBulkDeleteResult istat;
> >  } LVSharedIndStats;
>
> It would be nice to point out that the new
> "LVSharedIndStats.parallel_safe" field (or whatever we end up calling
> it) had comments that pointed out that it isn't a fixed thing for the
> entire VACUUM operation -- it's only fixed for an individual call to
> parallel_lazy_vacuum_or_cleanup_all_indexes() (i.e., it's only fixed
> for the "ambulkdelete portion" or the "amvacuumcleanup portion" of the
> entire VACUUM).

Agreed.

>
> Alternatively, you could just have two booleans, I think. You know,
> one for the "ambulkdelete portion", another for the "amvacuumcleanup
> portion". As I've said before, it would be nice if we only called
> parallel_vacuum_main() once per VACUUM operation (and not once per
> "portion"), but that's a harder and more invasive change. But I don't
> think you necessarily have to go that far for it to make sense to have
> two bools. Having two might allow you to make both of them immutable,
> which is useful.

If we want to make booleans immutable, we need three booleans since
parallel index cleanup behaves differently depending on whether
bulk-deletion has been called once. Anyway, if I understand your
suggestion correctly, it probably means to have booleans corresponding
to VACUUM_OPTION_PARALLEL_XXX flags. Does the worker itself need to
decide whether to skip conditionally-parallel-index-cleanup-safe
indexes?

>
> > Regarding tests, I’d like to add tests to check if a vacuum with
> > multiple index scans (i.g., due to small maintenance_work_mem) works
> > fine. But a problem is that we need at least about 200,000 garbage
> > tuples to perform index scan twice even with the minimum
> > maintenance_work_mem. Which takes a time to load tuples.
>
> Maybe that's okay. Do you notice that it takes a lot longer now? I did
> try to keep the runtime down when I committed the fixup to the
> parallel VACUUM related bug.

It took 6s on my laptop (was 400ms).

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Re: parallel vacuum comments

Reply via email to