On 24/07/2024 19:12, Heikki Linnakangas wrote:
On 04/07/2024 15:20, Jelte Fennema-Nio wrote:
On Thu, 4 Jul 2024 at 12:32, Heikki Linnakangas <hlinn...@iki.fi> wrote:
We currently don't do any locking on the ProcSignal array. For query
cancellations, that's good because a query cancel packet is processed
without having a PGPROC entry, so we cannot take LWLocks. We could use
spinlocks though. In this patch, I used memory barriers to ensure that
we load/store the pid and the cancellation key in a sensible order, so
that you cannot e.g. send a cancellation signal to a backend that's just
starting up and hasn't advertised its cancellation key in ProcSignal
yet. But I think this might be simpler and less error-prone by just
adding a spinlock to each ProcSignal slot. That would also fix the
existing race condition where we might set the pss_signalFlags flag for
a slot, when the process concurrently terminates and the slot is reused
for a different process. Because of that, we currently have this caveat:
"... all the signals are such that no harm is done if they're mistakenly
fired". With a spinlock, we could eliminate that race.
I think a spinlock would make this thing a whole concurrency stuff a
lot easier to reason about.
+ slot->pss_cancel_key_valid = false;
+ slot->pss_cancel_key = 0;
If no spinlock is added, I think these accesses should still be made
atomic writes. Otherwise data-races on those fields are still
possible, resulting in undefined behaviour. The memory barriers you
added don't prevent that afaict. With atomic operations there are
still race conditions, but no data-races.
Actually it seems like that same argument applies to the already
existing reading/writing of pss_pid: it's written/read using
non-atomic operations so data-races can occur and thus undefined
behaviour too.
Ok, here's a version with spinlocks.
I went back and forth on what exactly is protected by the spinlock. I
kept the "volatile sig_atomic_t" type for pss_signalFlags, so that it
can still be safely read without holding the spinlock in
CheckProcSignal, but all the functions that set the flags now hold the
spinlock. That removes the race condition that you might set the flag
for wrong slot, if the target backend exits and the slot is recycled.
The race condition was harmless and there were comments to note it, but
it doesn't occur anymore with the spinlock.
(Note: Thomas's "Interrupts vs signals" patch will remove ps_signalFlags
altogether. I'm looking forward to that.)
- volatile pid_t pss_pid;
+ pid_t pss_pid;
Why remove the volatile modifier?
Because I introduced a memory barrier to ensure the reads/writes of
pss_pid become visible to other processes in right order. That makes the
'volatile' unnecessary IIUC. With the spinlock, the point is moot however.
I:
- fixed a few comments,
- fixed a straightforward failure with EXEC_BACKEND (ProcSignal needs to
be passed down in BackendParameters now),
- put back the snippet to signal the whole process group if supported,
which you pointed out earlier
and committed this refactoring patch.
Thanks for the review!
--
Heikki Linnakangas
Neon (https://neon.tech)