Mihail Nikalayeu <[email protected]> wrote:
> On Tue, Dec 9, 2025 at 7:52 PM Antonin Houska <[email protected]> wrote:
> > Worker makes more sense to me - the initial implementation is in 0005.
>
> Comments for 0005, so far:
Thanks!
> ---
> > export_initial_snapshot
>
> Hm, should we use ExportSnapshot instead? And ImportSnapshot to import it.
There is at least one thing that I don't want: ImportSnapshot calls
SetTransactionSnapshot() at the end. I chose the way the leader process uses
to serialize and pass a snapshot to background workers.
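(For reference, that mechanism is roughly the following -- a minimal sketch
based on the snapmgr.c functions, not the exact 0005 code; "toc" stands for
whatever shared buffer the patch actually uses. Unlike ImportSnapshot,
RestoreSnapshot does not install the result as the transaction snapshot.)

    /* Leader: serialize the snapshot into shared memory. */
    Size        size = EstimateSnapshotSpace(snapshot);
    char       *buf = shm_toc_allocate(toc, size);

    SerializeSnapshot(snapshot, buf);

    /* Worker: reconstruct the snapshot and make it the active one. */
    Snapshot    snap = RestoreSnapshot(buf);

    PushActiveSnapshot(snap);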
> ---
> > get_initial_snapshot
>
> Should we check if a worker is still alive while waiting? Also in
> "process_concurrent_changes".
ConditionVariableSleep() should handle that - see the WL_EXIT_ON_PM_DEATH flag
in ConditionVariableTimedSleep().
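The wait then follows the usual condition-variable pattern, roughly (a
sketch; the flag and the wait event name are hypothetical):

    ConditionVariablePrepareToSleep(&shared->cv);
    while (!shared->snapshot_ready)     /* hypothetical flag */
        ConditionVariableSleep(&shared->cv,
                               WAIT_EVENT_REPACK_SNAPSHOT); /* hypothetical */
    ConditionVariableCancelSleep();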
> And AFAIU RegisterDynamicBackgroundWorker() does not guarantee that new
> workers will be started (e.g. in case of fork-related issues).
Yes, the user will get an ERROR in such a case. This is different from
parallel workers in query processing: if a parallel worker cannot be started,
the leader (AFAICS) still executes the query. I'm not sure though whether we
should implement REPACK (CONCURRENTLY) in such a way that it works even w/o
the worker. The code would be more complex and the behaviour quite different
(I mean the possibly huge amount of unprocessed WAL that you pointed out
earlier).
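The error path follows the usual bgworker pattern, roughly (a sketch; the
error messages are illustrative):

    BackgroundWorkerHandle *handle;
    pid_t       pid;

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                 errmsg("could not register repack worker")));

    if (WaitForBackgroundWorkerStartup(handle, &pid) != BGWH_STARTED)
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                 errmsg("could not start repack worker")));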
> ---
> > Assert(res = SHM_MQ_DETACHED);
>
> ==
Thanks!
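For the record, the fixed line:

    Assert(res == SHM_MQ_DETACHED);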
> ---
> > /* Wait a bit before we retry reading WAL. */
> > (void) WaitLatch(MyLatch,
> > WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> > 1000L,
> > WAIT_EVENT_REPACK_WORKER_MAIN);
>
> Looks like we need ResetLatch(MyLatch); here.
You seem to be right.
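Something like this, I suppose (a sketch; a CHECK_FOR_INTERRUPTS() may
already exist elsewhere in the loop):

    /* Wait a bit before we retry reading WAL. */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     1000L,
                     WAIT_EVENT_REPACK_WORKER_MAIN);
    ResetLatch(MyLatch);
    CHECK_FOR_INTERRUPTS();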
> ---
> > * - decoding_ctx - logical decoding context, to capture concurrent data
>
> Needs to be removed together with the parameters.
Do you mean in 0005? (It'd help if you pasted the hunk headers.) This should
be fixed in v28 [1].
> ---
> > hpm_context = AllocSetContextCreate(TopMemoryContext,
> > "ProcessParallelMessages",
> > ALLOCSET_DEFAULT_SIZES);
>
> "ProcessRepacklMessages"
OK, the copy-and-pasting is a problem that needs to be addressed (mentioned
in the last paragraph of the commit message of 0005).
> ---
> > if (XLogRecPtrIsInvalid(lsn_upto))
> > {
> > SpinLockAcquire(&shared->mutex);
> > lsn_upto = shared->lsn_upto;
> > /* 'done' should be set at the same time as 'lsn_upto' */
> > done = shared->done;
> > SpinLockRelease(&shared->mutex);
> >
> > /* Check if the work happens to be complete. */
> > continue;
> > }
>
> May be moved to the start of the loop to avoid duplication.
I found more problems in this part when working on v28; please check that
version.
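(If the suggestion still applies there, the deduplicated shape would be
roughly the following; this is hypothetical, the rest of the loop is only
guessed from the quoted hunk.)

    for (;;)
    {
        SpinLockAcquire(&shared->mutex);
        lsn_upto = shared->lsn_upto;
        /* 'done' should be set at the same time as 'lsn_upto' */
        done = shared->done;
        SpinLockRelease(&shared->mutex);

        if (done)
            break;
        if (XLogRecPtrIsInvalid(lsn_upto))
            continue;       /* presumably after a WaitLatch() elsewhere */

        /* ... decode and apply WAL up to lsn_upto ... */
    }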
> ---
> > SpinLockAcquire(&shared->mutex);
> > valid = shared->sfs_valid;
> > SpinLockRelease(&shared->mutex);
>
> Better to remember last_exported here to avoid any races/misses.
What races/misses exactly?
> ---
> > shared->lsn_upto = InvalidXLogRecPtr;
>
> I think it is better to clear it once it is read (after removing duplication).
Maybe, I'll think about it.
> ---
> > bool done;
>
> bool exit_after_lsn_upto?
Not sure.
> ---
> > bool sfs_valid;
>
> Do we really need it? I think it is better to leave only last_exported,
> add an argument (last_processed_file) to process_concurrent_changes,
> and wait for last_exported to become higher.
I'll consider that. (The variable is replaced in the 0006 part of v28, but
the idea should still be applicable.)
> ---
> What if we reverse roles of leader-worker?
>
> Leader gets a snapshot, transfers it to workers (probably multiple, for a
> parallel scan) using the existing mechanics - the workers process the scan
> of the table in parallel. The leader decodes the WAL.
Insertion into a table by multiple workers is a special thing, though maybe
it'd be doable in this case, but ...
> Also, workers could each be assigned a list of indexes they need to build.
>
> Feels like it reuses more of the current infrastructure and also needs
> less custom synchronization logic. But I'm not sure about the
> indexes phase - maybe it is not so easy to do.
... my feeling was the opposite, i.e. I thought it would require a larger
amount of code rearrangement. Moreover, part 0006 of v28 (snapshot switching)
would be trickier: it processes one range of blocks after another, and
parallelism would make that more difficult.
> ---
> Also, should we add some kind of backpressure between building the
> indexes/new heap and the amount of WAL we have accumulated?
> But probably that is out of scope for this patch.
Do you mean that the decoding worker should be less active if the amount of
WAL doesn't grow too fast?
> ---
> To build N indexes we need to scan the table N times. What about
> building multiple indexes during a single heap scan?
That sounds like a separate feature, about as difficult as enhancing
CREATE INDEX so that it can create multiple indexes at a time.
> ---
> Just a gentle reminder about the XMIN_COMMITTED flag and WAL storm
> after the switch.
OK, I have it in my notes and moved it closer to the top :-)
[1] https://www.postgresql.org/message-id/210036.1765651719%40localhost
--
Antonin Houska
Web: https://www.cybertec-postgresql.com