Joel is on vacation so here it is again. > Begin forwarded message: > > From: Alistair Francis <alistair.fran...@wdc.com> > Subject: Re: [j...@sing.id.au: atomic failures on qemu-system-riscv64] > Date: June 5, 2019 at 7:19:53 PM GMT+1 > To: "j...@sing.id.au" <j...@sing.id.au>, "pal...@sifive.com" > <pal...@sifive.com> > Cc: "ma...@decred.org" <ma...@decred.org>, "m...@carlosedp.com" > <m...@carlosedp.com> > > On Fri, 2019-05-31 at 03:21 +1000, Joel Sing wrote: >> I've just sent this to qemu-ri...@nongnu.org - forwarding for >> visibility... > > Hello Joel, > > Can you please send this to the QEMU mailing list? > https://wiki.qemu.org/Contribute/MailingLists > >> >> ----- Forwarded message from Joel Sing <j...@sing.id.au> ----- >> >> Date: Fri, 31 May 2019 03:20:03 +1000 >> From: Joel Sing <j...@sing.id.au> >> To: qemu-ri...@nongnu.org >> Subject: atomic failures on qemu-system-riscv64 >> User-Agent: Mutt/1.10.1 (2018-07-13) >> >> While working on a Go (www.golang.org) port for riscv, I've run >> into issues with atomics (namely LR/SC) on qemu-system-riscv64. >> There are several reproducers for this problem including (using >> gcc builtin atomics): >> >> https://gist.github.com/4a6f656c/8433032a3f70893a278259f8108aad90 >> >> And a version using inline assembly: >> >> https://gist.github.com/4a6f656c/d883091f5ca811822720213be343a75a >> >> Depending on the qemu configuration the number of threads may >> need increasing (to force context switching) and/or run in a >> loop. Go's sync/atomic tests also fail regularly. >> >> Having dug into the qemu code, what I believe is happening is >> along the lines of the following: >> >> 1. Thread 1 runs and executes an LR - this assigns an address >> to load_res and a value to load_val (say 1). >> >> 2. A context switch occurs and thread 2 is now run - it runs >> an LR and SC on the same address modifying the stored value. >> Another LR is executed loading load_val with the current >> value (say 3). >> >> 3. A context switch occurs and thread 1 is now run again, it >> continues on its LR/SC sequence and now runs the SC instruction. >> This is based on the assumption that it had a reservation >> and the SC will fail if the memory has changed. The underlying >> implementation of SC is a cmpxchg with the value in load_val >> - this no longer has the original value and hence successfully >> compares (as does the tcg_gen_setcond_tl() between the returned >> value and load_val) thus the SC succeeds when it should not. >> >> The diff below clears load_res when the mode changes - with this >> applied the reproducers work correctly, as do Go's atomic tests. >> I'm not sure this "fix" is 100% correct, but it certainly verifies >> where the problem lies. It does also fall inline with the RISCV >> spec: >> >> "The SC must fail if there is an observable memory access from >> another hart to the address, or if there is an intervening context >> switch on this hart, or if in the meantime the hart executed a >> privileged exception-return instruction." >> >> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c >> index b17f169681..9875b8e5d3 100644 >> --- a/target/riscv/cpu_helper.c >> +++ b/target/riscv/cpu_helper.c >> @@ -113,6 +113,8 @@ void riscv_cpu_set_mode(CPURISCVState *env, >> target_ulong newpriv) >> } >> /* tlb_flush is unnecessary as mode is contained in mmu_idx */ >> env->priv = newpriv; >> + >> + env->load_res = 0; > > This looks ok to me, I'll read the spec to double check though. Do you > mind adding a comment in the code as to why this is required? > > Alistair > >> } >> >> /* get_physical_address - get the physical address for this virtual >> address >> >> ----- End forwarded message -----
signature.asc
Description: Message signed with OpenPGP