Hi Thomas,
I'm far too rusty on the details to answer most of your questions, but:
> - Do we not need an explicit CLREX after the operation? Or does the
> STREX also clear the hardware monitor? Or does it just not matter?
STREX clears the reservation so CLREX is not needed. From Arm ARM:
"A Load-Exclusive instruction marks a small block of memory for
exclusive access. The size of the marked block is IMPLEMENTATION
DEFINED, see Marking and the size of the marked memory block on page
B2-105. A Store-Exclusive instruction to any address in the marked block
clears the marking."
> - We have VM_Version::supports_ldrex(). Code seems to me sometimes
> guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code
> just executes ldrex/strex (e.g. the one-shot path of
> MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
> generally available? Does ARMv6 mean STREX and LDREX are available?
I think you will find that where the ldrex guard is missing, the code is
for C2 only and a C2 build is only possible if ldrex is available. (C2
was only supported on ARMv7+.)
Also the ARM conditional instructions as used in atomic_cas_bool can
cause confusion when trying to understand the logic. :)
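To illustrate, here is a rough C++ model of how the condition codes flow through one pass of instructions A to F in atomic_cas_bool. It is only a sketch: cas_pass, PassResult and the strex_fails flag are hypothetical names for illustration, not HotSpot code. It assumes that the trailing "eq" argument makes an instruction execute only when Z is set, and it relies on STREX not modifying the flags.

```cpp
#include <cstdint>

// Hypothetical model of one pass through A..F; not HotSpot code.
struct PassResult {
  bool retry;    // E: branch back to the loop label taken
  bool success;  // F: Z set after the final cmp, i.e. tmp_reg == 0
};

PassResult cas_pass(uint32_t mem, uint32_t oldval, bool strex_fails) {
  uint32_t tmp = mem;                 // A: ldrex tmp_reg, [base, offset]
  tmp -= oldval;                      // B: subs, sets Z iff the values matched
  bool z = (tmp == 0);
  if (z) tmp = strex_fails ? 1 : 0;   // C: strex (eq) writes its status, flags untouched
  if (z) z = (tmp == 1);              // D: cmp tmp_reg, 1 (eq) is skipped when Z was clear
  bool retry = z;                     // E: b loop (eq)
  bool success = (tmp == 0);          // F: cmp tmp_reg, 0
  return {retry, success};
}
```

In this model, a memory value that differs from oldval by exactly 1 leaves tmp_reg at 1, but D and E are both skipped because Z is clear after B, so that case falls through to F as a failed CAS instead of looping.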
Cheers,
David
-----
On 11/03/2023 8:18 pm, Thomas Stüfe wrote:
Hi ARM experts,
I am trying to understand how CAS is implemented on arm; in particular,
"MacroAssembler::atomic_cas_bool":
MacroAssembler::atomic_cas_bool
```
assert_different_registers(tmp_reg, oldval, newval, base);
Label loop;
bind(loop);
A ldrex(tmp_reg, Address(base, offset));
B subs(tmp_reg, tmp_reg, oldval);
C strex(tmp_reg, newval, Address(base, offset), eq);
D cmp(tmp_reg, 1, eq);
E b(loop, eq);
F cmp(tmp_reg, 0);
if (tmpreg == noreg) {
pop(tmp_reg);
}
```
It uses LDREX and STREX to perform a cas of *(base+offset) from oldval
to newval. It does so in a loop. The code distinguishes two failures:
STREX failing, and a "semantically failed" CAS.
Here is what I think this code does:
A) LDREX: tmp=*(base+offset)
B) tmp -= oldvalue
If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1
C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.
After this, if the store succeeded, tmp_reg is 0; if the store
failed, it's 1.
D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if
*(base+offset) had been modified before LDREX.
We now compare with 1 and ...
E) ...repeat the loop if tmp_reg was 1
So we loop until either *(base+offset) has been changed to some other
value concurrently before our LDREX, or our store succeeded.
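For comparison, the same two-failure structure can be sketched in portable C++. This assumes std::atomic's compare_exchange_weak as a stand-in for the LDREX/STREX pair, since it is allowed to fail spuriously, much like a STREX that lost its reservation. cas_bool below is a hypothetical helper for illustration, not the HotSpot routine:

```cpp
#include <atomic>

// Hypothetical helper: returns true if loc was changed from oldval to newval.
bool cas_bool(std::atomic<int>& loc, int oldval, int newval) {
  int expected = oldval;
  // On failure, compare_exchange_weak writes the value it observed into expected.
  while (!loc.compare_exchange_weak(expected, newval)) {
    if (expected != oldval) {
      return false;  // "semantically failed" CAS: the location held another value
    }
    // Spurious failure: the observed value still equals oldval, so retry.
  }
  return true;       // the store succeeded
}
```

Dropping the loop would turn a spurious failure into a reported CAS failure, which is the case I am asking about below.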
I wondered what the loop guards against. And why it would be okay
sometimes to omit it.
IIUC, STREX fails if the core lost its exclusive access to the
memory location since the LDREX. This can be one of three things, right?
- another core slipped in an LDREX+STREX to the same location between
our LDREX and STREX
- Or we context switched to another thread or process. I assume it does
a CLREX then, right? Because how could you prevent a sequence like
"LDREX(p1) -> switch -> LDREX(p2) -> switch back -> STREX(p1)" - if I
understand the ARM manual [1] correctly, a STREX to a different location
than the preceding LDREX is undefined.
- Or we had a signal after LDREX and did a second LDREX in the signal
handler. Does the kernel do a CLREX when invoking a signal handler?
More questions:
- If I got it right, at (D), tmp_reg value "1" has two meanings: either
STREX failed or some thread increased the value concurrently by 1. We
repeat the loop either way. Is this just accepted behavior? Increasing
by 1 is maybe not that rare.
- If I understood this correctly, the loop guards us mainly against
context switches. Without the loop a context switch would count as a
"semantically failed" CAS. Why would that be okay? Should we not do this
loop always?
- Do we not need an explicit CLREX after the operation? Or does the
STREX also clear the hardware monitor? Or does it just not matter?
- We have VM_Version::supports_ldrex(). Code seems to me sometimes
guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code
just executes ldrex/strex (e.g. the one-shot path of
MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now
generally available? Does ARMv6 mean STREX and LDREX are available?
Thanks a lot!
Cheers, Thomas