Good day, hackers. 13.05.2025 19:22, Yura Sokolov wrote: > Good day. > > During conversation about other patch, Andres pointed out SpinLockAcquire > have semantic of full memory barrier [1] . > > spin.h also states: > >> Load and stores operation in calling code are guaranteed not to be > reordered with respect to these operations, because they include a compiler > barrier. > > But on ARM/ARM64 platform tas in s_lock.h is implemented as simple call to > __sync_lock_test_and_set, and . And GCC's documentation states it has only > Acquire semantic, and __sync_lock_release (used as implementation of > SpinLockRelease) has only Release semantic [2]. > > Compiling and disassembling postgresql with -O2 optimization level on arm64 > we can see that SpinLockAquire is compiled down to call to > __aarch64_swp4_sync (in optimistic case) which has disassemble: > > 0000000000662cc0 <__aarch64_swp4_sync>: > 662cc0: d503245f bti c > 662cc4: 90001bf0 adrp x16, 9de000 <hist_start+0x3d18> > 662cc8: 39582210 ldrb w16, [x16, #1544] > 662ccc: 34000070 cbz w16, 662cd8 > 662cd0: b8a08020 swpa w0, w0, [x1] > 662cd4: d65f03c0 ret > 662cd8: 2a0003f0 mov w16, w0 > 662cdc: 885f7c20 ldxr w0, [x1] > 662ce0: 88117c30 stxr w17, w16, [x1] > 662ce4: 35ffffd1 cbnz w17, 662cdc > 662ce8: d5033bbf dmb ish > 662cec: d65f03c0 ret > > Here we see 'swpa' instruction which has only acquire semantic [3]. > > And SpinLockRelease generates 'stlr' instruction, which has release > semantic, iirc. > > Godbolt shows noticable difference between __sync_lock_test_and_set vs > __atomic_exchange_n(ATOMIC_SEQ_CST) (and __sync_lock_release vs > __atomic_store_n(ATOMIC_SEQ_CST)) using GCC even with march=armv7 [4]: > - __atomic_exchange_n and __atomic_store_n has 'dmb ish' instruction both > before and after load/store operations, > - but __sync_lock_test_and_set has 'dmb ish' only after load/store > operations and __sync_lock_release only before store operation. (You can > see same pattern in __aarch64_swp4_sync above in case 'swpa' instruction is > not present). > > Therefore I think neither SpinLockAcquire nor SpinLockRelease have no full > memory barrier semantic on ARM/ARM64 at the moment when compiled with GCC. > > Surprisingly, Clang puts 'dmb ish' both before and after load/store at > __sync_lock_test_and_set, but does not the same for __sync_lock_release. > > As a simple fix I suggest to put __sync_synchronize before > __sync_lock_test_and_set and after __sync_lock_release. > > Initially I wanted to use __atomic_exchange_n and __atomic_store_n. But in > absence of support for atomic instructions, their inner loop doesn't look > to be safe for reordering since it uses ldar+stlr loop: > - looks like previous operations could be reordered after 'ldar'. > - similarly following operations could be reordered before 'stlr'. > There's very relevant discussion at GCC's bugzilla [5]. > Probably this hypothesis is not valid. I believe we have to ask someone > closely familiar with ARM internals to be sure. > > # Alternative. > > As an alternative, we may fix comments about > SpinLockAcquire/SpinLockRelease to state they provide acquire and release > barrier semantics only. And then check all their uses and set appropriate > memory barriers where their usage as full memory barriers is detected (as > in [1]). > > [1] > https://postgr.es/m/5ccgykypol3azijw2chqnpg3rhuwjtwsmbfs3pgcqm7fu6laus%40wppo6zcfszay > [2] > https://gcc.gnu.org/onlinedocs/gcc-15.1.0/gcc/_005f_005fsync-Builtins.html#index-_005f_005fsync_005flock_005ftest_005fand_005fset > [3] > https://developer.arm.com/documentation/100069/0606/Data-Transfer-Instructions/SWPA--SWPAL--SWP--SWPL--SWPAL--SWP--SWPL > [4] https://godbolt.org/z/h5P9fjzWd > [5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
I've recognized my mistake about SpinLockAcquire/SpinLockRelease requirements. In [1] I've describe my thoughts and suggested to introduce read-write spin lock for the cases when there are a lot of readers and few (or one) writers. [1] https://postgr.es/m/450dafae-b620-4726-abc2-53347c419394%40postgrespro.ru -- regards Yura Sokolov aka funny-falcon