Hannu Krosing <[email protected]> wrote: > On Fri, Nov 28, 2025 at 6:58 PM Hannu Krosing <[email protected]> wrote: > > > > On Fri, Nov 28, 2025 at 5:58 PM Matthias van de Meent > > <[email protected]> wrote: > > > > > ... > > > I'm a bit worried, though, that LR may lose updates due to commit > > > order differences between WAL and PGPROC. I don't know how that's > > > handled in logical decoding, and can't find much literature about it > > > in the repo either. > > > > Now the reference to logical decoding made me think that maybe to real > > fix for CIC would be to leverage logical decoding for the 2nd pass of > > CIC and not worry about in-page visibilities at all. > > And if we are concerned about having possibly to scan more WAL than we > would have had to scan the table, we can start a > tuple-to-index-collector immediately after starting the CIC. > > For extra efficiency gains the collector itself should have two phases > > 1. While the first pass of CIC is collecting the visible tuple for > index the logical decoding collector also collects any new tuples > added after the CIC start. > 2. When the first pass collection finishes, it also gets the indexes > collected so far by the logical decoding collectoir and adds them to > the first set before the sorting and creating the index. > > 3. once the initial index is created, the CIC just gets whatever else > was collected after 2. and adds these to the index
The core problem here is that the snapshot you need for the first pass restricts VACUUM on all tables in the database. The same problem exists for REPACK (CONCURRENTLY) and we haven't resolved it yet. With logical replication, we cannot really use multiple snapshots as Mihail is proposing elsewhere in the thread, because the logical decoding system only generates the snapshot for non-catalog tables once (LR uses that snapshot for the initial table synchronization). Only snapshots for system catalog tables are then built as the WAL decoding progresses. It can be worked around by considering regular table as catalog during the processing, but it currently introduces quite some overhead: https://www.postgresql.org/message-id/178741.1743514291%40localhost Perhaps we could enhance the logical decoding so that it gathers the information needed to build snapshots (AFAICS it's mostly about the XLOG_HEAP2_NEW_CID record) not only for catalog tables, but also for particular non-catalog table(s). However, for these non-catalog tables, the actual snapshot build should only take place when the snapshot is actually needed. (For catalog tables, each data change triggers the build of a new snapshot.) So in general I agree with what you say elsewhere in the thread that it might be worth to enhance the logical decoding a bit. Transient enabling of the decoding, only for specific tables (i.e. not requiring wal_level=logical), is another problem. I proposed a patch for that, but not sure it has been reviewed yet: https://www.postgresql.org/message-id/152010.1751307725%40localhost (See the 0007 part.) -- Antonin Houska Web: https://www.cybertec-postgresql.com
