Matthias van de Meent <[email protected]> wrote:

> On Thu, 27 Nov 2025 at 17:56, Antonin Houska <[email protected]> wrote:
> >
> > Michail Nikolaev <[email protected]> wrote:
> >
> > > I think about revisiting (1) ({CREATE INDEX, REINDEX} CONCURRENTLY
> > > improvements) in some lighter way.
> >
> > I haven't read the whole thread yet, but the effort to minimize the impact 
> > of
> > C/RIC on VACUUM seems to prevail. Following is one more proposal. The core
> > idea is that C/RIC should avoid indexing dead tuples, however snapshot is 
> > not
> > necessary to distinguish dead tuple from a live one. And w/o snapshot, the
> > backend executing C/RIC does not restrict VACUUM on other tables.
> >
> > Concurrent (re)build of unique index appears to be another topic of this
> > thread, but I think this approach should handle the problem too. The 
> > workflow
> > is:
> >
> > 1. Create an empty index.
> >
> > 2. Wait until all transactions are aware of the index, so they take the new
> >    index into account when deciding on new HOT chains. (This is already
> >    implemented.)
> >
> > 3. Set the 'indisready' flag so the index is ready for insertions.
> >
> > 4. While other transactions can insert their tuples into the index now,
> >    process the table one page at a time this way:
> >
> >    4.1 Acquire (shared) content lock on the buffer.
> >
> >    4.3 Collect the root tuples of HOT chains - these and only these need to 
> > be
> >        inserted into the index.
> >
> >    4.4 Unlock the buffer.
> 
> 
> > 5. Once the whole table is processed, insert the collected tuples into the
> >    index.
> >
> >    To avoid insertions of tuples that concurrent transactions have just
> >    inserted, we'd need something like index.c:validate_index() (i.e. insert
> >    into the index only the tuples that it does not contain yet), but w/o
> >    snapshot because we already have the heap tuples collected.
> >
> >    Also it'd make sense to wait for completion of all the transactions that
> >    currently have the table locked for INSERT/UPDATE: some of these might 
> > have
> >    inserted their tuples into the heap, but not yet into the index. If we
> >    included some of those tuples into our collection and insert them into 
> > the
> >    index first, the other transactions could end up with ERROR when 
> > inserting
> >    those tuples again.
> >
> > 6. Set the 'indisvalid' flag so that the index can be used by queries.
> >
> > Note on pruning: As we only deal with the root tuples of HOT chains (4.3),
> > page pruning triggered by queries (heap_page_prune_opt) should not be
> > disruptive. Actually C/RIC can do the pruning itself it it appears to be
> > useful. For example, if whole HOT chain should be considered DEAD by the 
> > next
> > VACUUM, pruning is likely (depending on the OldestXid) to remove it so that 
> > we
> > do not insert TID of the root tuple into the index unnecessarily.
> [...]
> > Of course, I could have missed some important point, so please explain why
> > this concept is broken :-) Or let me know if something needs to be explained
> > more in detail. Thanks.
> 
> 1. When do you select and insert tuples that aren't part of a hot
> chain into the index, i.e. tuples that were never updated after they
> got inserted into the table? Or is every tuple "part of a hot chain"
> even if the tuple wasn't ever updated?

Right, I considered "standalone tuple" a HOT chain of length 1. So it'll be
picked too.

> 2. HOT chains can be created while the index wasn't yet present, and
> thus the indexed attributes of the root tuples can be different from
> the most current tuple of a chain. If you only gather root tuples, we
> could index incorrect data for that HOT chain. The correct approach
> here is to index only the visible tuples, as those won't have been
> updated in a non-HOT manner without all indexed attributes being
> unchanged.

Good point.

> 3. Having the index marked indisready before it contains any data is
> going to slow down the indexing process significantly:
> a. The main index build now must go through shared memory and buffer
> locking, instead of being able to use backend-local memory
> b. The tuple-wise insertion path (IndexAmRoutine->aminsert) can have a
> significantly higher overhead than the bulk insertion logic in
> ambuild(); in metrics of WAL, pages accessed (IO), and CPU cycles
> spent.
> 
> So, I don't think moving away from ambuild() as basis for initially
> building the index this is such a great idea.
> 
> (However, I do think that having an _option_ to build the index using
> ambuildempty()+aminsert() instead of ambuild() might be useful, if
> only to more easily compare "natural grown" indexes vs freshly built
> ones, but that's completely orthogonal to CIC snapshotting
> improvements.)

The retail insertions are not something this proposal depends on. I think it'd
be possible to build a separate index locally and then "merge" it with the
regular one. I just tried to propose a solution that does not need snapshots.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com


Reply via email to