Hi. We're getting unit-test failures that look like data-races, which somehow are getting very frequent recently, tripping our CI.
Basically we have a `services` table, for service discovery, which records the backend PID of its main DB Connection. Such that we can account for "stale" / "zombie" services, which are still registered (e.g. crashed), but who's connection is gone, when querying that `services` table, like so: ```sql select s.name, s.backend_pid, ... from services s join pg_stat_activity a on a.pid = s.backend_pid where a.datname = current_database() ``` The unit-test code is "linear", i.e. single-threaded, with pseudo-code: 1) Open Connection outer 2) Open Connection inner 3) register service using inner (i.e. add row in `services` with backend PID of inner) 4) Close Connection inner (calls PQfinish) 5) Run query above on outer. Sometimes still sees that "stale" row, despite closing inner. There's is no question about the ordering above. i.e. PQfinish(inner) returned before the query is run. there's of course the possibility of another (3rd) connection reusing the same backend PID, but that's remote a chance I believe. And I don't know of any other reliable ID for a connection, than its backend PID. So when and how fast does pg_stat_activity update, in the face of a PQfinish? What other scheme to detect "table rows" associated to "stale" connections? Thanks for any insights, --DD