On 10/24/21 03:40, Noah Misch wrote:
Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.

CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start.  That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index.  Queries that use the
resulting index can silently fail to find rows.  Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation.  It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).

Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.

Discussion: https://postgr.es/m/[email protected]


Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since this commit, initdb never completes because it keeps retrying endlessly (on the first RelationBuildDesc call).

We have a CLOBBER_CACHE_ALWAYS buildfarm machine "avocet", and its top output currently looks like this:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM      TIME+ COMMAND
   2626 buildfa+  20   0  202888  21416  20084 R 98.34 0.531  151507:16 /home/buildfarm/avocet/buildroot/REL9_6_STABLE/pgsql.build/tmp_install/home/buildfarm/avocet/buildroot/REL9_6_STABLE/inst/bin/postgres --boot -x1 -F

Yep, that's 151507 minutes, i.e. about 105 days (151507 / 1440 ≈ 105.2) in initdb :-/


I haven't looked at this very closely yet, but it seems the whole problem is that we do this at the very beginning:

  in_progress_list[in_progress_offset].invalidated = false;

  /*
   * find the tuple in pg_class corresponding to the given relation id
   */
  pg_class_tuple = ScanPgRelation(targetRelId, true, false);

which seems entirely self-defeating: ScanPgRelation acquires a lock on pg_class, which accepts invalidation messages, which (with CLOBBER_CACHE_ALWAYS) invalidates the system caches, which promptly sets

  in_progress_list[in_progress_offset].invalidated = true;

guaranteeing an infinite loop.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

