Tom Lane wrote:
> It's still not entirely clear what's happening on okapi, but in the
> meantime I've thought of an easily-reproducible way to cause similar
> failures in any branch. That is to run CREATE INDEX CONCURRENTLY
> with default_transaction_isolation = serializable. Then, snapmgr.c
> will set up a transaction snapshot (actually identical to the
> "reference snapshot" used by DefineIndex), and that will not get
> released, so the process's xmin doesn't get cleared, and we have
> a deadlock hazard.
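To make the hazard concrete, here is a toy Python model. It is not PostgreSQL code, and the class and function names (CicSession, wait_edges, has_deadlock) are hypothetical. It assumes a simplified waiting rule: each CREATE INDEX CONCURRENTLY session waits for every other backend whose advertised xmin precedes or equals the xmin of its own reference snapshot. If serializable isolation keeps the transaction snapshot registered, each session keeps advertising an xmin, so two sessions end up waiting on each other. If a fresh transaction with no snapshot is started before waiting, which is the approach described later in this thread, neither session advertises an xmin and no cycle forms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CicSession:
    """Toy model of one CREATE INDEX CONCURRENTLY backend (hypothetical)."""
    name: str
    limit_xmin: int           # xmin of this session's reference snapshot
    held_xmin: Optional[int]  # xmin the backend still advertises; None = cleared


def wait_edges(sessions):
    """Build wait-for edges (waiter, holder) under the simplified rule:
    a waiter waits for every other backend whose advertised xmin
    precedes or equals the waiter's reference-snapshot xmin."""
    return {
        (w.name, o.name)
        for w in sessions
        for o in sessions
        if o is not w and o.held_xmin is not None and o.held_xmin <= w.limit_xmin
    }


def has_deadlock(edges):
    """Report whether the wait-for graph contains a cycle (depth-first search)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in graph.get(node, []):
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in list(graph) if n not in done)


# Serializable: the transaction snapshot stays registered, so xmin is never cleared.
stuck = [CicSession("cic1", 100, 100), CicSession("cic2", 100, 100)]
print(has_deadlock(wait_edges(stuck)))  # True

# Fresh transaction before waiting: no snapshot held, so no xmin is advertised.
fixed = [CicSession("cic1", 100, None), CicSession("cic2", 100, None)]
print(has_deadlock(wait_edges(fixed)))  # False
```

The real wait logic in PostgreSQL is more involved than this single comparison; the model only shows why a leftover xmin in the waiting process itself can close a cycle.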
Hah, ouch.

> I experimented with running the isolation tests under "alter system set
> default_transaction_isolation to serializable". Oddly, multiple-cic
> tends to not fail that way for me, though if I reduce the
> isolation_schedule file to contain just that one test, it fails nine
> times out of ten. Leftover activity from the previous tests must be
> messing up the timing somehow. Anyway, the problem is definitely real.
> (A couple of the other isolation tests do fail reliably under this
> scenario; is it worth hardening them?)

Yes, I think it's worth making them pass somehow -- see commits
f18795e7b74c, a0eae1a2eeb6.

> I thought for a bit about trying to force C.I.C.'s transactions to
> be run with a lower transaction isolation level, but that seems messy
> and I'm not very sure it wouldn't have bad side-effects. A much simpler
> fix is to just start YA transaction before waiting, as in the attached
> proposed patch. (With the transaction restart, I feel sufficiently
> confident that there should be no open snapshots that it seems okay
> to put in the Assert I was previously afraid to add.)

Seems like an acceptable fix to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services