Matthias van de Meent <[email protected]> wrote:
> On Thu, 4 Dec 2025 at 09:34, Antonin Houska <[email protected]> wrote:
> >
> > ISTM that what you consider a problem is copying the table using
> > PGPROC-based snapshot and applying logically decoded commits to the
> > result - is that what you mean?
>
> Correct.
>
> > In fact, LR (and also REPACK) uses snapshots generated by the logical
> > decoding system. The information on running/committed transactions is
> > based here on replaying WAL, not on PGPROC.
>
> OK, that's good to know. For reference, do you know where this is
> documented, explained, or implemented?

All my knowledge of these things is from the source code.

> I'm asking, because the code that I could find didn't seem to use any
> special snapshot (tablesync.c uses
> `PushActiveSnapshot(GetTransactionSnapshot())`),

My understanding is that this is what happens on the subscription side. A few
lines above that, however, walrcv_create_slot(..., CRS_USE_SNAPSHOT, ...) is
called, which in turn calls CreateReplicationSlot(..., CRS_USE_SNAPSHOT, ...)
on the publication side, and that sets the snapshot for the transaction on the
remote (publication) side:

    else if (snapshot_action == CRS_USE_SNAPSHOT)
    {
        Snapshot    snap;

        snap = SnapBuildInitialSnapshot(ctx->snapshot_builder);
        RestoreTransactionSnapshot(snap, MyProc);
    }

> and the other
> reference to LR's snapshots (snapbuild.c, and inside
> `GetTransactionSnapshot()`) explicitly said that its snapshots are
> only to be used for catalog lookups, never for general-purpose
> queries.

I think the reason is that snapbuild.c only maintains snapshots for catalog
scans, because in logical decoding you only need to scan catalog tables. This
is needed especially to find out which tuple descriptor was valid when a
particular data change (INSERT / UPDATE / DELETE) was WAL-logged: the output
plugin needs the correct version of the tuple descriptor to deform each tuple.
However, there is no need to scan non-catalog tables: as long as
wal_level=logical, the WAL records contain all the information needed for
logical replication (including key values). So snapbuild.c only keeps track of
transactions that modify the system catalog and uses this information to
create the snapshots.

A special case is if you pass need_full_snapshot=true to
CreateInitDecodingContext(). In this case the snapshot builder tracks commits
of all transactions, but only does so until the SNAPBUILD_CONSISTENT state is
reached. Thus, just before the actual decoding starts, you can get a snapshot
to scan even non-catalog tables (SnapBuildInitialSnapshot() creates that, like
in the code above). (For REPACK, I'm trying to teach snapbuild.c to recognize
that a transaction changed one particular non-catalog table, so it can build
snapshots to scan this one table anytime.)
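
To make the tracking rule above concrete, here is a toy model in C. It is only
a sketch of the rule as I described it: ToyBuilder, toy_should_track() and
toy_on_commit() are made-up names, not the snapbuild.c API, and the real code
has more states and conditions than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these names mimic, but are not, the snapbuild.c API. */
typedef uint32_t ToyXid;

typedef enum { TOY_BUILDING, TOY_CONSISTENT } ToyState;

#define TOY_MAX_COMMITTED 64

typedef struct ToyBuilder
{
    ToyState state;
    bool     need_full_snapshot;    /* hypothetical mirror of the real flag */
    ToyXid   committed[TOY_MAX_COMMITTED];
    size_t   ncommitted;
} ToyBuilder;

static void
toy_builder_init(ToyBuilder *b, bool need_full_snapshot)
{
    b->state = TOY_BUILDING;
    b->need_full_snapshot = need_full_snapshot;
    b->ncommitted = 0;
}

/* Decide whether a commit has to be remembered for later snapshots. */
static bool
toy_should_track(const ToyBuilder *b, bool modified_catalog)
{
    /* Catalog-modifying transactions are always tracked. */
    if (modified_catalog)
        return true;

    /* Other transactions only matter while a full snapshot is being built. */
    return b->need_full_snapshot && b->state != TOY_CONSISTENT;
}

/* Process one decoded commit; returns true if the XID was recorded. */
static bool
toy_on_commit(ToyBuilder *b, ToyXid xid, bool modified_catalog)
{
    if (!toy_should_track(b, modified_catalog))
        return false;

    assert(b->ncommitted < TOY_MAX_COMMITTED);
    b->committed[b->ncommitted++] = xid;
    return true;
}
```

With need_full_snapshot set, a commit that touched only ordinary tables is
recorded while the builder is still in TOY_BUILDING, and ignored once the state
has been switched to TOY_CONSISTENT. Catalog-modifying commits are recorded in
both cases.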

Another reason not to use those snapshots for non-catalog tables is that
snapbuild.c creates snapshots of the kind SNAPSHOT_HISTORIC_MVCC. If you used
one for a non-catalog table, HeapTupleSatisfiesHistoricMVCC() would be used
for visibility checks instead of HeapTupleSatisfiesMVCC(). The latter can
handle tuples surviving from older versions of PostgreSQL, but the former
cannot:

    /* Used by pre-9.0 binary upgrades */
    if (tuple->t_infomask & HEAP_MOVED_OFF)

No such tuples should appear in the catalog because initdb always creates it
from scratch.

For LR, SnapBuildInitialSnapshot() takes care of the conversion from
SNAPSHOT_HISTORIC_MVCC to SNAPSHOT_MVCC.
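
A toy sketch of the idea behind that conversion, under the assumption that the
historic snapshot lists the committed XIDs between xmin and xmax, while an MVCC
snapshot lists the XIDs still treated as running. toy_historic_to_mvcc() and
toy_xid_in() are made-up names, and XID wraparound is ignored, so this is not
the actual snapbuild.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/* Returns true if xid is among the n entries (linear scan for brevity). */
static bool
toy_xid_in(const ToyXid *xids, size_t n, ToyXid xid)
{
    for (size_t i = 0; i < n; i++)
    {
        if (xids[i] == xid)
            return true;
    }
    return false;
}

/*
 * Invert the XID list: every XID in [xmin, xmax) that is NOT listed as
 * committed is written to "running".
 *
 * Returns the number of entries written, or (size_t) -1 if "running" is
 * too small. Simplification: plain integer comparison, no wraparound.
 */
static size_t
toy_historic_to_mvcc(ToyXid xmin, ToyXid xmax,
                     const ToyXid *committed, size_t ncommitted,
                     ToyXid *running, size_t running_cap)
{
    size_t n = 0;

    assert(xmin <= xmax);

    for (ToyXid xid = xmin; xid < xmax; xid++)
    {
        if (toy_xid_in(committed, ncommitted, xid))
            continue;           /* committed, so visible to the snapshot */

        if (n >= running_cap)
            return (size_t) -1; /* caller's array cannot hold the result */

        running[n++] = xid;
    }
    return n;
}
```

For example, with xmin = 10, xmax = 15 and committed XIDs {11, 13}, the
resulting list of running XIDs is {10, 12, 14}.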
--
Antonin Houska
Web: https://www.cybertec-postgresql.com