Hello, Heikki!

On Tue, Dec 16, 2025 at 2:43 PM Heikki Linnakangas <[email protected]> wrote:
> Firstly, I think the STIR approach is the right approach at the high
> level. I don't like the logical decoding idea, for the reasons Matthias
> and Mikhail already mentioned. Maybe there's some synergy with REPACK,
> but it feels different enough that I doubt it. Let's focus on the STIR
> approach.
Thanks for checking that thread.

> In the first transaction that inserts the catalog entry with
> indisready=false, also create a shmem struct. In that struct, we can
> store information about what state the build is in, and whether
> insertions should go to the STIR or to the real index.

Yes, it might look simpler, but from another point of view:
* we need to check that shmem struct on every index insert (whether or not we are building something),
* or we need to put something into the index list with the information "write into that shmem instead of into that index";
* currently we have proven mechanics around transactions, catalog snapshots, relcache, invalidation, etc. Some tricky synchronization may be required here (to avoid any drift between how a transaction sees shmem and how it sees relcache).

> As one small incremental improvement, we could use the shmem struct to
> avoid one of the "wait for all transactions" steps in the current
> implementation. In validate_index(), after we mark the index as
> 'indisready' we have to wait for all transactions to finish, to ensure
> that all subsequent insertions have seen the indisready=true change. We
> could avoid that by setting a flag in the shmem struct instead, so that
> all backends would see instantly that the flag is flipped.

That may be tricky. If I set a flag, what if someone checked it 1 ns ago and decided it was not necessary to write anything into the index? How do we ensure that everyone really knows about it now, without heavy locking? In all current maintenance operations we ensure in some way (by locking/unlocking a relation or by waiting for transactions) that everyone has a fresh enough relcache. I don't think we should introduce anything special for the CIC scenario here. But some universal solution (like ensuring that every other transaction that had an outdated relcache has ended) may benefit all related scenarios.

> Improved STIR approach
>
> Here's another proposal using the STIR approach. It's a little different
> from the patches so far:
> ....
> 7. Retail insert all the tuples from the STIR to the index.

Hm, that's a clever idea... At the same time, my tests show that the index scan is lightweight compared to the heap scans (especially the second one, which is not parallelized).

> Snapshot refreshing
> -------------------
> - In step 4, while we are building the index, we can periodically get a
> new snapshot, update the cutoff in the shmem struct, and drain the STIR
> of the tuples that are already in it.

But together with snapshot resetting, such an approach is still more efficient (in terms of the index scan), though it feels much more complex, including some complicated locking. I need to think about it a bit more.

Best regards,
Mikhail.
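To make the "someone checked the flag 1 ns ago" concern about the shmem-flag idea concrete, here is a minimal toy sketch in Python. It is a single-threaded model under simplifying assumptions, not PostgreSQL code: every name (BuildShmem, insert_begin, validate_scan, and so on) is hypothetical, an insert is assumed to happen in two phases (read the flag, then write the tuple), and the validation scan simply indexes every heap tuple that exists when it runs. It shows that flipping a flag alone can lose a tuple, while waiting for inserters that may have seen the old value (the role "wait for all transactions" plays today) closes the gap.

```python
from dataclasses import dataclass, field

# Toy model only: all names below are hypothetical, not PostgreSQL APIs.

@dataclass
class BuildShmem:
    index_ready: bool = False   # the proposed flag in the shmem struct
    inflight: int = 0           # inserters that read the flag but have not finished

@dataclass
class Table:
    in_index: list = field(default_factory=list)  # per heap tuple: indexed?

@dataclass
class Inserter:
    saw_ready: bool = False     # flag value observed when the insert started
    pending: bool = False

def insert_begin(sh, ins):
    """Phase 1 of an insert: read the flag and decide whether to index."""
    ins.saw_ready = sh.index_ready
    ins.pending = True
    sh.inflight += 1

def insert_finish(sh, table, ins):
    """Phase 2: add the heap tuple, acting on the possibly stale decision."""
    table.in_index.append(ins.saw_ready)
    ins.pending = False
    sh.inflight -= 1

def validate_scan(table):
    """Validation scan: index every heap tuple that exists at scan time."""
    table.in_index = [True] * len(table.in_index)

def run_scenario(wait_for_inflight):
    """Return how many heap tuples end up missing from the new index."""
    sh, table, backend = BuildShmem(), Table(), Inserter()
    insert_begin(sh, backend)      # backend reads index_ready == False ...
    sh.index_ready = True          # ... and "1 ns later" the flag flips

    if wait_for_inflight:
        # Stand-in for "wait for all transactions": every inserter that may
        # have seen the old flag value finishes before the scan starts.
        # In this single-backend model that means letting `backend` finish.
        while sh.inflight > 0:
            insert_finish(sh, table, backend)

    validate_scan(table)
    if backend.pending:            # the stale inserter finishes after the scan
        insert_finish(sh, table, backend)
    return table.in_index.count(False)

print("flag only:  ", run_scenario(False), "missing")   # 1
print("flag + wait:", run_scenario(True), "missing")    # 0
```

The model deliberately ignores snapshots, locking, and memory ordering; it only illustrates why a flag flip needs some barrier (waiting, locking, or an equivalent) before the validation scan can trust that every later insert has seen the new value.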
