Good day, hackers.

13.05.2025 19:22, Yura Sokolov wrote:
> Good day.
> 
> During conversation about other patch, Andres pointed out SpinLockAcquire
> have semantic of full memory barrier [1] .
> 
> spin.h also states:
> 
>> Load and stores operation in calling code are guaranteed not to be
> reordered with respect to these operations, because they include a compiler
> barrier.
> 
> But on ARM/ARM64 platform tas in s_lock.h is implemented as simple call to
> __sync_lock_test_and_set, and . And GCC's documentation states it has only
> Acquire semantic, and __sync_lock_release (used as implementation of
> SpinLockRelease) has only Release semantic [2].
> 
> Compiling and disassembling postgresql with -O2 optimization level on arm64
> we can see that SpinLockAquire is compiled down to call to
> __aarch64_swp4_sync (in optimistic case) which has disassemble:
> 
> 0000000000662cc0 <__aarch64_swp4_sync>:
>   662cc0:       d503245f        bti     c
>   662cc4:       90001bf0        adrp    x16, 9de000 <hist_start+0x3d18>
>   662cc8:       39582210        ldrb    w16, [x16, #1544]
>   662ccc:       34000070        cbz     w16, 662cd8
>   662cd0:       b8a08020        swpa    w0, w0, [x1]
>   662cd4:       d65f03c0        ret
>   662cd8:       2a0003f0        mov     w16, w0
>   662cdc:       885f7c20        ldxr    w0, [x1]
>   662ce0:       88117c30        stxr    w17, w16, [x1]
>   662ce4:       35ffffd1        cbnz    w17, 662cdc
>   662ce8:       d5033bbf        dmb     ish
>   662cec:       d65f03c0        ret
> 
> Here we see 'swpa' instruction which has only acquire semantic [3].
> 
> And SpinLockRelease generates 'stlr' instruction, which has release
> semantic, iirc.
> 
> Godbolt shows noticable difference between __sync_lock_test_and_set vs
> __atomic_exchange_n(ATOMIC_SEQ_CST) (and __sync_lock_release vs
> __atomic_store_n(ATOMIC_SEQ_CST)) using GCC even with march=armv7 [4]:
> - __atomic_exchange_n and __atomic_store_n has 'dmb ish' instruction both
> before and after load/store operations,
> - but __sync_lock_test_and_set has 'dmb ish' only after load/store
> operations and __sync_lock_release only before store operation. (You can
> see same pattern in __aarch64_swp4_sync above in case 'swpa' instruction is
> not present).
> 
> Therefore I think neither SpinLockAcquire nor SpinLockRelease have no full
> memory barrier semantic on ARM/ARM64 at the moment when compiled with GCC.
> 
> Surprisingly, Clang puts 'dmb ish' both before and after load/store at
> __sync_lock_test_and_set, but does not the same for __sync_lock_release.
> 
> As a simple fix I suggest to put __sync_synchronize before
> __sync_lock_test_and_set and after __sync_lock_release.
> 
> Initially I wanted to use __atomic_exchange_n and __atomic_store_n. But in
> absence of support for atomic instructions, their inner loop doesn't look
> to be safe for reordering since it uses ldar+stlr loop:
> - looks like previous operations could be reordered after 'ldar'.
> - similarly following operations could be reordered before 'stlr'.
> There's very relevant discussion at GCC's bugzilla [5].
> Probably this hypothesis is not valid. I believe we have to ask someone
> closely familiar with ARM internals to be sure.
> 
> # Alternative.
> 
> As an alternative, we may fix comments about
> SpinLockAcquire/SpinLockRelease to state they provide acquire and release
> barrier semantics only. And then check all their uses and set appropriate
> memory barriers where their usage as full memory barriers is detected (as
> in [1]).
> 
> [1]
> https://postgr.es/m/5ccgykypol3azijw2chqnpg3rhuwjtwsmbfs3pgcqm7fu6laus%40wppo6zcfszay
> [2]
> https://gcc.gnu.org/onlinedocs/gcc-15.1.0/gcc/_005f_005fsync-Builtins.html#index-_005f_005fsync_005flock_005ftest_005fand_005fset
> [3]
> https://developer.arm.com/documentation/100069/0606/Data-Transfer-Instructions/SWPA--SWPAL--SWP--SWPL--SWPAL--SWP--SWPL
> [4] https://godbolt.org/z/h5P9fjzWd
> [5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

I've recognized my mistake about SpinLockAcquire/SpinLockRelease
requirements. In [1] I've describe my thoughts and suggested to introduce
read-write spin lock for the cases when there are a lot of readers and few
(or one) writers.

[1] https://postgr.es/m/450dafae-b620-4726-abc2-53347c419394%40postgrespro.ru

-- 
regards
Yura Sokolov aka funny-falcon


Reply via email to