Hi David,
On Mon, Mar 13, 2023 at 3:07 AM David Holmes <david.hol...@oracle.com> wrote:

> Hi Thomas,
>
> I'm far too rusty on the details to answer most of your questions but:
>
> > - Do we not need an explicit CLREX after the operation? Or does the
> > STREX also clear the hardware monitor? Or does it just not matter?
>
> STREX clears the reservation so CLREX is not needed. From Arm ARM:
>
> "A Load-Exclusive instruction marks a small block of memory for
> exclusive access. The size of the marked block is IMPLEMENTATION
> DEFINED, see Marking and the size of the marked memory block on page
> B2-105. A Store-Exclusive instruction to any address in the marked block
> clears the marking."
>
> > - We have VM_Version::supports_ldrex(). Code seems to me sometimes
> > guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code
> > just executes ldrex/strex (e.g. the one-shot path of
> > MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
> > generally available? Does ARMv6 mean STREX and LDREX are available?
>
> I think you will find that where the ldrex guard is missing, the code is
> for C2 only and a C2 build is only possible if ldrex is available. (C2
> was only supported on ARMv7+.)
>
> Also the ARM conditional instructions as used in atomic_cas_bool can
> cause confusion when trying to understand the logic. :)

Thank you, that already helps! Not that rusty, it seems :-)

Cheers, Thomas

> Cheers,
> David
> -----
>
> On 11/03/2023 8:18 pm, Thomas Stüfe wrote:
> > Hi ARM experts,
> >
> > I am trying to understand how CAS is implemented on arm; in particular,
> > "MacroAssembler::atomic_cas_bool":
> >
> > MacroAssembler::atomic_cas_bool
> >
> > ```
> >     assert_different_registers(tmp_reg, oldval, newval, base);
> >     Label loop;
> >     bind(loop);
> > A   ldrex(tmp_reg, Address(base, offset));
> > B   subs(tmp_reg, tmp_reg, oldval);
> > C   strex(tmp_reg, newval, Address(base, offset), eq);
> > D   cmp(tmp_reg, 1, eq);
> > E   b(loop, eq);
> > F   cmp(tmp_reg, 0);
> >     if (tmpreg == noreg) {
> >       pop(tmp_reg);
> >     }
> > ```
> >
> > It uses LDREX and STREX to perform a cas of *(base+offset) from oldval
> > to newval. It does so in a loop. The code distinguishes two failures:
> > STREX failing, and a "semantically failed" CAS.
> >
> > Here is what I think this code does:
> >
> > A) LDREX: tmp=*(base+offset)
> > B) tmp -= oldvalue
> >    If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1
> > C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise,
> >    omit.
> >    After this, if the store succeeded, tmp_reg is 0, if the store
> >    failed it's 1.
> > D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if
> >    *(base+offset) had been modified before LDREX.
> >    We now compare with 1 and ...
> > E) ...repeat the loop if tmp_reg was 1
> >
> > So we loop until either *(base+offset) had been changed to some other
> > value concurrently before our LDREX. Or until our store succeeded.
> >
> > I wondered what the loop guards against. And why it would be okay
> > sometimes to omit it.
> >
> > IIUC, STREX fails if the core did lose its exclusive access to the
> > memory location since the LDREX. This can be one of three things, right?
> >
> > - another core slipped in an LDREX+STREX to the same location between
> > our LDREX and STREX
> > - Or we context switched to another thread or process. I assume it does
> > a CLREX then, right? Because how could you prevent a sequence like
> > "LDREX(p1) -> switch -> LDREX(p2) -> switch back STREX(p1)" - if I
> > understand the ARM manual [1] correctly, a STREX to a different location
> > than the preceding LDREX is undefined.
> > - Or we had a signal after LDREX and did a second LDREX in the signal
> > handler. Does the kernel do a CLREX when invoking a signal handler?
> >
> > More questions:
> >
> > - If I got it right, at (D), tmp_reg value "1" has two meanings: either
> > STREX failed or some thread increased the value concurrently by 1. We
> > repeat the loop either way. Is this just accepted behavior? Increasing
> > by 1 is maybe not that rare.
> >
> > - If I understood this correctly, the loop guards us mainly against
> > context switches. Without the loop a context switch would count as a
> > "semantically failed" CAS. Why would that be okay? Should we not do this
> > loop always?
> >
> > - Do we not need an explicit CLREX after the operation? Or does the
> > STREX also clear the hardware monitor? Or does it just not matter?
> >
> > - We have VM_Version::supports_ldrex(). Code seems to me sometimes
> > guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code
> > just executes ldrex/strex (e.g. the one-shot path of
> > MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
> > generally available? Does ARMv6 mean STREX and LDREX are available?
> >
> > Thanks a lot!
> >
> > Cheers, Thomas
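
As a reading aid for steps A) to F) in the quoted code, here is a minimal C++ model of the sequence. It is only a sketch, not the HotSpot implementation: `Outcome`, `cas_pass`, `cas_bool`, `strex_ok` and `strex_failures` are made-up names, the exclusive monitor is reduced to a boolean, and it assumes that the `eq` condition on C, D and E means those instructions are skipped whenever the Z flag is clear.

```cpp
#include <cstdint>

// Hypothetical result of one pass through A..F.
enum class Outcome { Success, Retry, Failed };

// Models one pass. `mem` stands in for *(base+offset); `strex_ok` says
// whether the store-exclusive would succeed (assumption: the monitor is
// modelled as a plain boolean supplied by the caller).
Outcome cas_pass(uint32_t& mem, uint32_t oldval, uint32_t newval, bool strex_ok) {
  uint32_t tmp = mem;              // A: ldrex tmp, [base+offset]
  tmp -= oldval;                   // B: subs tmp, tmp, oldval
  bool z = (tmp == 0);             //    subs sets Z if the result is zero
  if (z) {                         // C: strex (eq), skipped when Z is clear
    if (strex_ok) {
      mem = newval;
      tmp = 0;                     //    status 0: store succeeded
    } else {
      tmp = 1;                     //    status 1: store failed
    }
  }
  if (z) {                         // D: cmp tmp, #1 (eq), skipped when Z is clear
    z = (tmp == 1);
  }
  if (z) {                         // E: b loop (eq)
    return Outcome::Retry;
  }
  z = (tmp == 0);                  // F: cmp tmp, #0 sets the final flags
  return z ? Outcome::Success : Outcome::Failed;
}

// Models the loop: the first `strex_failures` store-exclusives fail,
// later ones succeed.
bool cas_bool(uint32_t& mem, uint32_t oldval, uint32_t newval, int strex_failures) {
  for (;;) {
    Outcome o = cas_pass(mem, oldval, newval, strex_failures <= 0);
    if (o != Outcome::Retry) {
      return o == Outcome::Success;
    }
    --strex_failures;
  }
}
```

Under that assumption, a memory value of exactly oldval + 1 does not cause a retry in this model: B leaves Z clear, so C and D are skipped, E falls through, and F reports a failed CAS. Only a failed store-exclusive sends the loop back to A.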