On Thu, May 26, 2022 at 2:35 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.mu...@gmail.com> writes:
> > On a more practical note, I don't have access to the BF database right
> > now.  Would you mind checking if "latch already owned" has occurred on
> > any other animals?
>
> Looking back 6 months, these are the only occurrences of that string
> in failed tests:
>
>  sysname | branch |      snapshot       |     stage      |                                 l
> ---------+--------+---------------------+----------------+-------------------------------------------------------------------
>  gharial | HEAD   | 2022-04-28 23:37:51 | Check          | 2022-04-28 18:36:26.981 MDT [22642:1] ERROR:  latch already owned
>  gharial | HEAD   | 2022-05-06 11:33:11 | IsolationCheck | 2022-05-06 10:10:52.727 MDT [7366:1] ERROR:  latch already owned
>  gharial | HEAD   | 2022-05-24 06:31:31 | IsolationCheck | 2022-05-24 02:44:51.850 MDT [13089:1] ERROR:  latch already owned
> (3 rows)

Thanks.  Hmm.  So far it's always a parallel worker.  The best idea I
have is to include the PID of the mystery owner in the error message
and see if that provides a clue next time.

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 78c6a89271..07b8273a7d 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -402,6 +402,8 @@ InitSharedLatch(Latch *latch)
 void
 OwnLatch(Latch *latch)
 {
+    pid_t       previous_owner;
+
     /* Sanity checks */
     Assert(latch->is_shared);
 
@@ -410,8 +412,11 @@ OwnLatch(Latch *latch)
     Assert(selfpipe_readfd >= 0 && selfpipe_owner_pid == MyProcPid);
 #endif
 
-    if (latch->owner_pid != 0)
-        elog(ERROR, "latch already owned");
+    previous_owner = latch->owner_pid;
+    if (previous_owner != 0)
+        elog(ERROR,
+             "latch already owned by PID %lu",
+             (unsigned long) previous_owner);
 
     latch->owner_pid = MyProcPid;
 }
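
For anyone who wants to poke at the check outside the tree, here is a rough standalone sketch of the same pattern. It is not PostgreSQL code: DemoLatch and demo_own_latch() are made-up stand-ins, and the error goes into a caller-supplied buffer instead of through elog(). The point it illustrates is only that owner_pid is read once into a local, so the value that gets tested is the same value that gets reported.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for Latch; only the field this sketch needs. */
typedef struct DemoLatch
{
    pid_t       owner_pid;      /* 0 means "not owned" */
} DemoLatch;

/*
 * Try to claim the latch for 'self'.  Returns 0 on success.  If the latch
 * already has an owner, returns -1 and writes a message naming that owner
 * into errbuf.  (The real patch raises an error via elog() instead.)
 */
static int
demo_own_latch(DemoLatch *latch, pid_t self, char *errbuf, size_t errlen)
{
    pid_t       previous_owner;

    assert(latch != NULL);

    /* Read the field once so the test and the message use the same value. */
    previous_owner = latch->owner_pid;
    if (previous_owner != 0)
    {
        /* pid_t has no portable printf format, hence the cast. */
        snprintf(errbuf, errlen,
                 "latch already owned by PID %lu",
                 (unsigned long) previous_owner);
        return -1;
    }

    latch->owner_pid = self;
    return 0;
}
```

Calling demo_own_latch() twice on the same DemoLatch with different PIDs should succeed the first time and fail the second time with a message naming the first PID, which is the extra clue the patch is after.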