Joel is on vacation so here it is again.

> Begin forwarded message:
> 
> From: Alistair Francis <alistair.fran...@wdc.com>
> Subject: Re: [j...@sing.id.au: atomic failures on qemu-system-riscv64]
> Date: June 5, 2019 at 7:19:53 PM GMT+1
> To: "j...@sing.id.au" <j...@sing.id.au>, "pal...@sifive.com" 
> <pal...@sifive.com>
> Cc: "ma...@decred.org" <ma...@decred.org>, "m...@carlosedp.com" 
> <m...@carlosedp.com>
> 
> On Fri, 2019-05-31 at 03:21 +1000, Joel Sing wrote:
>> I've just sent this to qemu-ri...@nongnu.org - forwarding for
>> visibility...
> 
> Hello Joel,
> 
> Can you please send this to the QEMU mailing list?
> https://wiki.qemu.org/Contribute/MailingLists
> 
>> 
>> ----- Forwarded message from Joel Sing <j...@sing.id.au> -----
>> 
>> Date: Fri, 31 May 2019 03:20:03 +1000
>> From: Joel Sing <j...@sing.id.au>
>> To: qemu-ri...@nongnu.org
>> Subject: atomic failures on qemu-system-riscv64
>> User-Agent: Mutt/1.10.1 (2018-07-13)
>> 
>> While working on a Go (www.golang.org) port for riscv, I've run
>> into issues with atomics (namely LR/SC) on qemu-system-riscv64.
>> There are several reproducers for this problem including (using
>> gcc builtin atomics):
>> 
>>  https://gist.github.com/4a6f656c/8433032a3f70893a278259f8108aad90
>> 
>> And a version using inline assembly:
>> 
>>  https://gist.github.com/4a6f656c/d883091f5ca811822720213be343a75a
>> 
>> Depending on the qemu configuration the number of threads may
>> need increasing (to force context switching) and/or run in a
>> loop. Go's sync/atomic tests also fail regularly.
>> 
>> Having dug into the qemu code, what I believe is happening is
>> along the lines of the following:
>> 
>> 1. Thread 1 runs and executes an LR - this assigns an address
>>   to load_res and a value to load_val (say 1).
>> 
>> 2. A context switch occurs and thread 2 is now run - it runs
>>   an LR and SC on the same address modifying the stored value.
>>   Another LR is executed loading load_val with the current
>>   value (say 3).
>> 
>> 3. A context switch occurs and thread 1 is now run again, it
>>   continues on its LR/SC sequence and now runs the SC instruction.
>>   This is based on the assumption that it had a reservation
>>   and the SC will fail if the memory has changed. The underlying
>>   implementation of SC is a cmpxchg with the value in load_val
>>   - this no longer has the original value and hence successfully
>>   compares (as does the tcg_gen_setcond_tl() between the returned
>>   value and load_val) thus the SC succeeds when it should not.
>> 
>> The diff below clears load_res when the mode changes - with this
>> applied the reproducers work correctly, as do Go's atomic tests.
>> I'm not sure this "fix" is 100% correct, but it certainly verifies
>> where the problem lies. It does also fall inline with the RISCV
>> spec:
>> 
>> "The SC must fail if there is an observable memory access from
>> another hart to the address, or if there is an intervening context
>> switch on this hart, or if in the meantime the hart executed a
>> privileged exception-return instruction."
>> 
>> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
>> index b17f169681..9875b8e5d3 100644
>> --- a/target/riscv/cpu_helper.c
>> +++ b/target/riscv/cpu_helper.c
>> @@ -113,6 +113,8 @@ void riscv_cpu_set_mode(CPURISCVState *env,
>> target_ulong newpriv)
>>     }
>>     /* tlb_flush is unnecessary as mode is contained in mmu_idx */
>>     env->priv = newpriv;
>> +
>> +    env->load_res = 0;
> 
> This looks ok to me, I'll read the spec to double check though. Do you
> mind adding a comment in the code as to why this is required?
> 
> Alistair
> 
>> }
>> 
>> /* get_physical_address - get the physical address for this virtual
>> address
>> 
>> ----- End forwarded message -----

Attachment: signature.asc
Description: Message signed with OpenPGP

Reply via email to