Matthias van de Meent <[email protected]> wrote: > On Thu, 27 Nov 2025 at 17:56, Antonin Houska <[email protected]> wrote: > > > > Michail Nikolaev <[email protected]> wrote: > > > > > I think about revisiting (1) ({CREATE INDEX, REINDEX} CONCURRENTLY > > > improvements) in some lighter way. > > > > I haven't read the whole thread yet, but the effort to minimize the impact > > of > > C/RIC on VACUUM seems to prevail. Following is one more proposal. The core > > idea is that C/RIC should avoid indexing dead tuples, however snapshot is > > not > > necessary to distinguish dead tuple from a live one. And w/o snapshot, the > > backend executing C/RIC does not restrict VACUUM on other tables. > > > > Concurrent (re)build of unique index appears to be another topic of this > > thread, but I think this approach should handle the problem too. The > > workflow > > is: > > > > 1. Create an empty index. > > > > 2. Wait until all transactions are aware of the index, so they take the new > > index into account when deciding on new HOT chains. (This is already > > implemented.) > > > > 3. Set the 'indisready' flag so the index is ready for insertions. > > > > 4. While other transactions can insert their tuples into the index now, > > process the table one page at a time this way: > > > > 4.1 Acquire (shared) content lock on the buffer. > > > > 4.3 Collect the root tuples of HOT chains - these and only these need to > > be > > inserted into the index. > > > > 4.4 Unlock the buffer. > > > > 5. Once the whole table is processed, insert the collected tuples into the > > index. > > > > To avoid insertions of tuples that concurrent transactions have just > > inserted, we'd need something like index.c:validate_index() (i.e. insert > > into the index only the tuples that it does not contain yet), but w/o > > snapshot because we already have the heap tuples collected. > > > > Also it'd make sense to wait for completion of all the transactions that > > currently have the table locked for INSERT/UPDATE: some of these might > > have > > inserted their tuples into the heap, but not yet into the index. If we > > included some of those tuples into our collection and insert them into > > the > > index first, the other transactions could end up with ERROR when > > inserting > > those tuples again. > > > > 6. Set the 'indisvalid' flag so that the index can be used by queries. > > > > Note on pruning: As we only deal with the root tuples of HOT chains (4.3), > > page pruning triggered by queries (heap_page_prune_opt) should not be > > disruptive. Actually C/RIC can do the pruning itself it it appears to be > > useful. For example, if whole HOT chain should be considered DEAD by the > > next > > VACUUM, pruning is likely (depending on the OldestXid) to remove it so that > > we > > do not insert TID of the root tuple into the index unnecessarily. > [...] > > Of course, I could have missed some important point, so please explain why > > this concept is broken :-) Or let me know if something needs to be explained > > more in detail. Thanks. > > 1. When do you select and insert tuples that aren't part of a hot > chain into the index, i.e. tuples that were never updated after they > got inserted into the table? Or is every tuple "part of a hot chain" > even if the tuple wasn't ever updated?
Right, I considered "standalone tuple" a HOT chain of length 1. So it'll be picked too. > 2. HOT chains can be created while the index wasn't yet present, and > thus the indexed attributes of the root tuples can be different from > the most current tuple of a chain. If you only gather root tuples, we > could index incorrect data for that HOT chain. The correct approach > here is to index only the visible tuples, as those won't have been > updated in a non-HOT manner without all indexed attributes being > unchanged. Good point. > 3. Having the index marked indisready before it contains any data is > going to slow down the indexing process significantly: > a. The main index build now must go through shared memory and buffer > locking, instead of being able to use backend-local memory > b. The tuple-wise insertion path (IndexAmRoutine->aminsert) can have a > significantly higher overhead than the bulk insertion logic in > ambuild(); in metrics of WAL, pages accessed (IO), and CPU cycles > spent. > > So, I don't think moving away from ambuild() as basis for initially > building the index this is such a great idea. > > (However, I do think that having an _option_ to build the index using > ambuildempty()+aminsert() instead of ambuild() might be useful, if > only to more easily compare "natural grown" indexes vs freshly built > ones, but that's completely orthogonal to CIC snapshotting > improvements.) The retail insertions are not something this proposal depends on. I think it'd be possible to build a separate index locally and then "merge" it with the regular one. I just tried to propose a solution that does not need snapshots. -- Antonin Houska Web: https://www.cybertec-postgresql.com
