This whole code sequence lives in MacroAssembler::atomic_cas_bool, but that function is not always called. There are plenty of direct uses of ldrex+strex. There is even a condition argument for MacroAssembler::cas_for_lock_acquire that selects whether to do the strex directly or to go through atomic_cas_bool.
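
To make the difference between the two paths concrete, here is a minimal software model in C++. It is a sketch, not HotSpot code: `Cell`, `pending_losses`, `ldrex`, `strex`, `cas_looped` and `cas_one_shot` are hypothetical names, and the exclusive monitor is reduced to a single flag. The only assumption about the hardware is the one already stated in this thread: STREX reports 0 on success and 1 when the reservation was lost.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one memory word plus its exclusive monitor.
struct Cell {
  uint32_t value;
  bool reserved;       // set by ldrex, cleared by strex or by a lost reservation
  int pending_losses;  // how many upcoming strex calls find the reservation gone
  explicit Cell(uint32_t v) : value(v), reserved(false), pending_losses(0) {}
};

// Models LDREX: load the value and take the reservation.
uint32_t ldrex(Cell& c) {
  c.reserved = true;
  return c.value;
}

// Models STREX: returns 0 if the store happened, 1 if the reservation was lost.
uint32_t strex(Cell& c, uint32_t newval) {
  assert(c.pending_losses >= 0);
  if (c.pending_losses > 0) {  // simulate e.g. a context switch since ldrex
    --c.pending_losses;
    c.reserved = false;
  }
  if (!c.reserved) return 1;
  c.value = newval;
  c.reserved = false;
  return 0;
}

// Models steps A-F: retry only when strex reports a lost reservation.
bool cas_looped(Cell& c, uint32_t oldval, uint32_t newval) {
  for (;;) {
    uint32_t tmp = ldrex(c) - oldval;        // A, B
    if (tmp != 0) return false;              // "eq" not set: C, D, E are skipped
    if (strex(c, newval) == 0) return true;  // C stored, status is 0
    // status is 1: D and E send us back to A
  }
}

// Models a "raw" one-shot LDREX/STREX pair without the retry loop.
bool cas_one_shot(Cell& c, uint32_t oldval, uint32_t newval) {
  if (ldrex(c) != oldval) return false;
  return strex(c, newval) == 0;
}
```

In this model, if the cell holds the expected value and one reservation loss is pending, `cas_one_shot` reports failure and leaves the value untouched, while `cas_looped` retries and succeeds. Whether a caller can tolerate the first behaviour is the question I am asking.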

On Mon, Mar 13, 2023 at 11:03 AM Reingruber, Richard <richard.reingru...@sap.com> wrote:

> > Yes, I read it the same way. So we repeat the CAS for lost reservation. I'm interested in when this could happen and why it would be okay to sometimes omit this loop and do the "raw" LDREX-STREX sequence. See my original mail.
>
> Hm, I don't understand. The loop in the sequence A-F is always there. How is it omitted?
>
> *From: *Thomas Stüfe <thomas.stu...@gmail.com>
> *Date: *Monday, 13. March 2023 at 10:54
> *To: *Reingruber, Richard <richard.reingru...@sap.com>
> *Cc: *porters-dev@openjdk.org <porters-dev@openjdk.org>, aarch32-port-...@openjdk.org <aarch32-port-...@openjdk.org>
> *Subject: *Re: Question about CAS via LDREX/STREX on 32-bit arm
>
> Hi Richard :)
>
> On Mon, Mar 13, 2023 at 10:02 AM Reingruber, Richard <richard.reingru...@sap.com> wrote:
>
> > > Hi ARM experts,
> >
> > Hi Thomas, not at all an ARM expert... :)
> > but I think I understand the code.
> >
> > > I am trying to understand how CAS is implemented on arm; in particular, "MacroAssembler::atomic_cas_bool":
> > >
> > > MacroAssembler::atomic_cas_bool
> > >
> > > ```
> > > assert_different_registers(tmp_reg, oldval, newval, base);
> > > Label loop;
> > > bind(loop);
> > > A ldrex(tmp_reg, Address(base, offset));
> > > B subs(tmp_reg, tmp_reg, oldval);
> > > C strex(tmp_reg, newval, Address(base, offset), eq);
> > > D cmp(tmp_reg, 1, eq);
> > > E b(loop, eq);
> > > F cmp(tmp_reg, 0);
> > > if (tmpreg == noreg) {
> > >   pop(tmp_reg);
> > > }
> > > ```
> > >
> > > It uses LDREX and STREX to perform a cas of *(base+offset) from oldval to newval. It does so in a loop. The code distinguishes two failures: STREX failing, and a "semantically failed" CAS.
> > >
> > > Here is what I think this code does:
> > >
> > > A) LDREX: tmp=*(base+offset)
> > > B) tmp -= oldvalue
> > >    If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1
> > > C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.
> > >    After this, if the store succeeded, tmp_reg is 0, if the store failed it's 1.
> > > D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if *(base+offset) had been modified before LDREX.
> > >    We now compare with 1 and ...
> > > E) ...repeat the loop if tmp_reg was 1
> > >
> > > So we loop until either *(base+offset) had been changed to some other value concurrently before our LDREX. Or until our store succeeded.
> > >
> > > I wondered what the loop guards against. And why it would be okay sometimes to omit it.
> >
> > The loop is needed to try again if the reservation was lost until the STREX succeeds or *(base+offset) != oldvalue.
> >
> > So there are two cases. The loop is left iff
> >
> > (1) *(base+offset) != oldvalue
> > (2) the STREX succeeded
> >
> > First it is important to understand that C, D, E are only executed if at B the eq-condition is set to true.
> > This is based on the "Conditional Execution" feature of ARM: execution of most instructions can be made dependent on a condition (see
> > https://developer.arm.com/documentation/den0013/d/ARM-Thumb-Unified-Assembly-Language-Instructions/Instruction-set-basics/Conditional-execution?lang=en)
> >
> > So in case (1) C, D, E are not executed because B indicates that *(base+offset) and oldvalue are not-eq and the loop is left.
>
> Ah, thanks, that was my thinking error. I did not realize that CMP was also conditional. I assumed the "eq" in the CMP (D) was a condition for the CMP. Which makes no sense, as I know now, since CMP just does a sub and only needs two arguments. So that meant the full branch CDE was controlled from the subtraction result at B.
>
> That also resolves the "1 has a double meaning" question. It hasn't.
>
> > In case (2) C, D, E are executed. At C, if the reservation from A still exists, tmp_reg will be set to 0 otherwise to 1. At E the branch is taken if D indicated tmp_reg == 1 (reservation was lost) otherwise the loop is left.
>
> Yes, I read it the same way. So we repeat the CAS for lost reservation. I'm interested in when this could happen and why it would be okay to sometimes omit this loop and do the "raw" LDREX-STREX sequence. See my original mail.
>
> I suspect it has something to do with context switches.
> That the kernel does a CLREX when we switch, so if we switch between LDREX and STREX, the reservation could be lost. But why would it then be okay to ignore this sometimes?
>
> Thanks!
>
> Thomas
>
> > Cheers, Richard.
>
> *From: *porters-dev <porters-dev-r...@openjdk.org> on behalf of Thomas Stüfe <thomas.stu...@gmail.com>
> *Date: *Saturday, 11. March 2023 at 11:19
> *To: *porters-dev@openjdk.org <porters-dev@openjdk.org>, aarch32-port-...@openjdk.org <aarch32-port-...@openjdk.org>
> *Subject: *Question about CAS via LDREX/STREX on 32-bit arm
>
> Hi ARM experts,
>
> I am trying to understand how CAS is implemented on arm; in particular, "MacroAssembler::atomic_cas_bool":
>
> MacroAssembler::atomic_cas_bool
>
> ```
> assert_different_registers(tmp_reg, oldval, newval, base);
> Label loop;
> bind(loop);
> A ldrex(tmp_reg, Address(base, offset));
> B subs(tmp_reg, tmp_reg, oldval);
> C strex(tmp_reg, newval, Address(base, offset), eq);
> D cmp(tmp_reg, 1, eq);
> E b(loop, eq);
> F cmp(tmp_reg, 0);
> if (tmpreg == noreg) {
>   pop(tmp_reg);
> }
> ```
>
> It uses LDREX and STREX to perform a cas of *(base+offset) from oldval to newval. It does so in a loop. The code distinguishes two failures: STREX failing, and a "semantically failed" CAS.
>
> Here is what I think this code does:
>
> A) LDREX: tmp=*(base+offset)
> B) tmp -= oldvalue
>    If *(base+offset) was unchanged, tmp_reg is now 0 and Z is 1
> C) If Z is 1: STREX the new value: *(base+offset)=newval. Otherwise, omit.
>    After this, if the store succeeded, tmp_reg is 0, if the store failed it's 1.
> D) Here, tmp_reg is: 0 if the store succeeded, 1 if it failed, 1...n if *(base+offset) had been modified before LDREX.
>    We now compare with 1 and ...
> E) ...repeat the loop if tmp_reg was 1
>
> So we loop until either *(base+offset) had been changed to some other value concurrently before our LDREX. Or until our store succeeded.
>
> I wondered what the loop guards against.
> And why it would be okay sometimes to omit it.
>
> IIUC, STREX fails if the core did lose its exclusive access to the memory location since the LDREX. This can be one of three things, right? :
> - another core slipped in an LDREX+STREX to the same location between our LDREX and STREX
> - Or we context switched to another thread or process. I assume it does a CLREX then, right? Because how could you prevent a sequence like "LDREX(p1) -> switch -> LDREX(p2) -> switch back STREX(p1)" - if I understand the ARM manual [1] correctly, a STREX to a different location than the preceding LDREX is undefined.
> - Or we had a signal after LDREX and did a second LDREX in the signal handler. Does the kernel do a CLREX when invoking a signal handler?
>
> More questions:
>
> - If I got it right, at (D), tmp_reg value "1" has two meanings: either STREX failed or some thread increased the value concurrently by 1. We repeat the loop either way. Is this just accepted behavior? Increasing by 1 is maybe not that rare.
>
> - If I understood this correctly, the loop guards us mainly against context switches. Without the loop a context switch would count as a "semantically failed" CAS. Why would that be okay? Should we not do this loop always?
>
> - Do we not need an explicit CLREX after the operation? Or does the STREX also clear the hardware monitor? Or does it just not matter?
>
> - We have VM_Version::supports_ldrex(). Code seems to me sometimes guarded by this (e.g MacroAssembler::atomic_cas_bool), sometimes code just executes ldrex/strex (e.g. the one-shot path of MacroAssembler::cas_for_lock_acquire). Am I mistaken? Or is LDREX now generally available? Does ARMv6 mean STREX and LDREX are available?
>
> Thanks a lot!
>
> Cheers, Thomas