Hi Thomas,

08.09.2023 22:39, Thomas Munro wrote:
With debug logging added, I see (on 7389aad63~1) that one process
really does send SIGURG to another, and the latter reaches poll(), but it
never gets the signal: its signal handler is not called and poll() just waits...
Thanks for working so hard on this Alexander.  That is a surprising
discovery!  So changes to the signal handler arrangements in the
*postmaster* before the child was forked affected this?

Yes, I think we are dealing with something like that. I can try to find the
minimal change that affects reproduction of the issue, but maybe it's not that
important. Perhaps we should now think about escalating the problem to the
FreeBSD developers? I wonder what kind of reproducer they would find
acceptable: only a standalone C program, or would a script that
compiles/installs postgres and runs our test do too?

So it looks like the ARM weak memory model is not the root cause of the
issue. But as far as I can see, it's still specific to FreeBSD (though not
specific to a compiler: I used gcc and clang with the same result).
Idea:  FreeBSD 13 introduced a new mechanism called sigfastblock[1],
which lets system libraries control signal blocking with atomic memory
tricks in a word of user space memory.  I have no particular theory
for why it would be going wrong here (I don't expect us to be using
any of the stuff that would use it, though I don't understand it in
detail so that doesn't say much), but it occurred to me that all
reports so far have been on 13.x or 14.  I wonder...  If you have a
good fast recipe for reproducing this, could you also try it on
FreeBSD 12.4?

That was a lucky guess! I ran the reproduction on
FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212
and got the same results as on FreeBSD 14:
REL_12_STABLE - failed on iteration 3
REL_15_STABLE - failed on iteration 1
REL_16_STABLE - 10 iterations with no failure

But on FreeBSD 12.4-RELEASE r372781:
REL_12_STABLE - 20 iterations with no failure
REL_15_STABLE - 20 iterations with no failure

BTW, I also retested 7389aad63 on FreeBSD 14 and got no failure for 100
iterations.

Best regards,
Alexander

