Hi,

On 2025-11-20 15:45:22 -0500, Greg Burd wrote:
> Dave and I have been working together to get ARM64 with MSVC functional.
> The attached patches accomplish that. Dave is the author of the first
> which addresses some build issues and fixes the spin_delay() semantics,
> I did the second which fixes some atomics in this combination.

Thanks for working on this!

> This pointed a finger at the atomics, so I started there. We used a few
> tools, but worth noting is https://godbolt.org/ where we were able to
> quickly see that the MSVC assembly was missing the "dmb" barriers on
> this platform. I'm not sure how long this link will be valid, but in
> the short term here's our investigation: https://godbolt.org/z/PPqfxe1bn
>
>
> PROBLEM DESCRIPTION
>
> PostgreSQL test failures occur intermittently on MSVC ARM64 builds,
> manifesting as timing-dependent failures in critical sections
> protected by spinlocks and atomic variables. The failures are
> reproducible when the test suite is compiled with optimization flags
> (/O2), particularly in the recovery/027_stream_regress test which
> involves WAL replication and standby recovery.
>
> The root cause has two components:
>
> 1. Atomic operations lack memory barriers on ARM64
> 2. MSVC spinlock implementation lacks memory barriers on ARM64
>
> TECHNICAL ANALYSIS
>
> PART 1: ATOMIC OPERATIONS MEMORY BARRIERS
>
> GCC's __atomic_compare_exchange_n() with __ATOMIC_SEQ_CST semantics
> generates a call to __aarch64_cas4_acq_rel(), which is a library
> function that provides explicit acquire-release memory ordering
> semantics through either:
>
> * LSE path (modern ARM64): Using CASAL instruction with built-in
>   memory ordering [1][2]
>
> * Legacy path (older ARM64): Using LDAXR/STLXR instructions with
>   explicit dmb sy instruction [3]
>
> MSVC's _InterlockedCompareExchange() intrinsic on ARM64 performs the
> atomic operation but does NOT emit the necessary Data Memory Barrier
> (DMB) instructions [4][5].

I couldn't reproduce this result when playing around on godbolt. By specifying
/arch:armv9.4 msvc can be convinced to emit the code for the intrinsics inline
(at least for most of them). And that makes it visible that
_InterlockedCompareExchange() results in a "casal" instruction.
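
For anyone wanting to repeat that kind of comparison, a minimal snippet along these lines could be pasted into godbolt under both compilers. This is only a sketch: cas_u32() is a made-up name for illustration, not PostgreSQL's actual atomics implementation and not the code behind the godbolt link quoted above.

```c
#include <stdint.h>

#ifdef _MSC_VER
#include <intrin.h>
#endif

/*
 * Hypothetical helper for comparing generated code, not taken from the
 * PostgreSQL tree.
 *
 * Returns 1 and stores newval if *ptr == *expected. Otherwise returns 0
 * and writes the value actually found into *expected.
 */
int
cas_u32(volatile uint32_t *ptr, uint32_t *expected, uint32_t newval)
{
#ifdef _MSC_VER
    /* The intrinsic returns the value that was in *ptr before the call. */
    long        found;

    found = _InterlockedCompareExchange((volatile long *) ptr,
                                        (long) newval,
                                        (long) *expected);
    if ((uint32_t) found == *expected)
        return 1;
    *expected = (uint32_t) found;
    return 0;
#else
    /* Strong CAS, sequentially consistent on both success and failure. */
    return __atomic_compare_exchange_n(ptr, expected, newval, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```

The function is deliberately not static so that the compiler has to emit a body for it. Which instructions show up depends on the compiler version and on the /arch or -march setting, so the output needs to be checked per configuration rather than assumed.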

Looking that up shows:
https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions/CASA--CASAL--CAS--CASL--CASAL--CAS--CASL--A64-
which includes these two statements:
"CASA and CASAL load from memory with acquire semantics."
"CASL and CASAL store to memory with release semantics."

> Issue 2: S_UNLOCK() uses only a compiler barrier
>
> _ReadWriteBarrier() is a compiler barrier, NOT a hardware memory
> barrier [6]. It prevents the compiler from reordering operations, but
> the CPU can still reorder memory operations. This is fundamentally
> insufficient for ARM64's weaker memory model.

Yea, that seems broken on a non-TSO architecture. Is the problem fixed if you
change just this to include a proper barrier?

Greetings,

Andres Freund
