Re: Add index scan progress to pg_stat_progress_vacuum

Imseih (AWS), Sami Wed, 23 Feb 2022 10:02:35 -0800

    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>indexes_total</structfield> <type>bigint</type>
    +      </para>
    +      <para>
    +       The number of indexes to be processed in the
    +       <literal>vacuuming indexes</literal> or <literal>cleaning up 
indexes</literal> phase
    +       of the vacuum.
    +      </para></entry>
    +     </row>
    +
    +     <row>
    +      <entry role="catalog_table_entry"><para role="column_definition">
    +       <structfield>indexes_processed</structfield> <type>bigint</type>
    +      </para>
    +      <para>
    +       The number of indexes processed in the
    +       <literal>vacuuming indexes</literal> or <literal>cleaning up 
indexes</literal> phase.
    +       At the start of an index vacuum cycle, this value is set to 
<literal>0</literal>.
    +      </para></entry>
    +     </row>


    > Will these be set to 0 for failsafe vacuums and vacuums with INDEX_CLEANUP
    > turned off?

If the failsafe kicks in midway through a vacuum, the number indexes_total will 
not be reset to 0. If INDEX_CLEANUP is turned off, then the value will be 0 at 
the start of the vacuum.

    +typedef struct VacWorkerProgressInfo
    +{
    +    int                     num_vacuums;    /* number of active VACUUMS 
with parallel workers */
    +    int                     max_vacuums;    /* max number of VACUUMS with 
parallel workers */
    +    slock_t         mutex;
    +    VacOneWorkerProgressInfo vacuums[FLEXIBLE_ARRAY_MEMBER];
    +} VacWorkerProgressInfo;

    > max_vacuums appears to just be a local copy of MaxBackends.  Does this
    > information really need to be stored here?  Also, is there a strong reason
    > for using a spinlock instead of an LWLock?

First, The BTVacInfo code in backend/access/nbtree/nbtutils.c inspired this, so 
I wanted to follow this pattern. With that said, I do see max_vacuums being 
redundant here, and I am inclined to replace it with a MaxBackends() call. 

Second, There is no strong reason to use spinlock here except I incorrectly 
assumed it will be better for this case. After reading more about this and 
reading up src/backend/storage/lmgr/README, an LWLock will be better.

    +void
    +vacuum_worker_end(int leader_pid)
    +{
    +    SpinLockAcquire(&vacworkerprogress->mutex);
    +    for (int i = 0; i < vacworkerprogress->num_vacuums; i++)
    +    {
    +        VacOneWorkerProgressInfo *vac = &vacworkerprogress->vacuums[i];
    +
    +        if (vac->leader_pid == leader_pid)
    +        {
    +            *vac = 
vacworkerprogress->vacuums[vacworkerprogress->num_vacuums - 1];
    +            vacworkerprogress->num_vacuums--;
    +            SpinLockRelease(&vacworkerprogress->mutex);
    +            break;
    +        }
    +    }
    +    SpinLockRelease(&vacworkerprogress->mutex);
    +}

  >  I see this loop pattern in a couple of places, and it makes me wonder if
  >  this information would fit more naturally in a hash table.

Followed the pattern in backend/access/nbtree/nbtutils.c for this as well. 
Using dynahash may make sense here if it simplifies the code. Will look.

    +        if (callback)
    +            callback(values, 3);

  >  Why does this need to be set up as a callback function?  Could we just call
  >  the function if cmdtype == PROGRESS_COMMAND_VACUUM?  ISTM that is pretty
  >  much all this is doing.

The intention will be for the caller to set the callback early on in the 
function using the existing " if (pg_strcasecmp(cmd, "VACUUM") == 0), etc." 
statement. This way we avoid having to add another if/else block before 
tuplestore_putvalues is called.

--
Sami Imseih
Amazon Web Services

Re: Add index scan progress to pg_stat_progress_vacuum

Reply via email to