On Thu, May 26, 2022 at 2:35 PM Tom Lane <[email protected]> wrote:
> Thomas Munro <[email protected]> writes:
> > On a more practical note, I don't have access to the BF database right
> > now. Would you mind checking if "latch already owned" has occurred on
> > any other animals?
>
> Looking back 6 months, these are the only occurrences of that string
> in failed tests:
>
>  sysname | branch |      snapshot       |     stage      | l
> ---------+--------+---------------------+----------------+-------------------------------------------------------------------
>  gharial | HEAD   | 2022-04-28 23:37:51 | Check          | 2022-04-28 18:36:26.981 MDT [22642:1] ERROR: latch already owned
>  gharial | HEAD   | 2022-05-06 11:33:11 | IsolationCheck | 2022-05-06 10:10:52.727 MDT [7366:1] ERROR: latch already owned
>  gharial | HEAD   | 2022-05-24 06:31:31 | IsolationCheck | 2022-05-24 02:44:51.850 MDT [13089:1] ERROR: latch already owned
> (3 rows)

Thanks.  Hmm.  So far it's always a parallel worker.  The best idea I
have is to include the PID of the mystery owner in the error message
and see if that provides a clue next time.

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 78c6a89271..07b8273a7d 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -402,6 +402,8 @@ InitSharedLatch(Latch *latch)
 void
 OwnLatch(Latch *latch)
 {
+	pid_t		previous_owner;
+
 	/* Sanity checks */
 	Assert(latch->is_shared);
 
@@ -410,8 +412,11 @@ OwnLatch(Latch *latch)
 	Assert(selfpipe_readfd >= 0 && selfpipe_owner_pid == MyProcPid);
 #endif
 
-	if (latch->owner_pid != 0)
-		elog(ERROR, "latch already owned");
+	previous_owner = latch->owner_pid;
+	if (previous_owner != 0)
+		elog(ERROR,
+			 "latch already owned by PID %lu",
+			 (unsigned long) previous_owner);
 
 	latch->owner_pid = MyProcPid;
 }