Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-14 Thread Alex Bennée

alvise rigo  writes:

> 1. LL(x)   // x requires a flush
> 2. query flush to all the n VCPUs
> 3. exit from the CPU loop and wait until all the flushes are done
> 4. enter the loop to re-execute LL(x). This time no flush is required
>
> Now, points 2. and 3. can be done either with n calls of
> async_safe_run_on_cpu() or with n calls of async_wait_run_on_cpu(). In my
> opinion the former is not really done for this use case since it would call
> n^2 times cpu_exit() and it would not really ensure that the VCPU has
> exited from the guest code to make an iteration of the CPU loop.

async_safe_run_on_cpu maybe should be renamed async_safe_run_on_system()
as all vCPUs are held. You only need one helper to do all the flushing
and bit-setting.

>
> Regards,
> alvise
>
> On Tue, Jun 14, 2016 at 2:00 PM, Alex Bennée  wrote:
>>
>> alvise rigo  writes:
>>
>>> On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov  
>>> wrote:
 On 26/05/16 19:35, Alvise Rigo wrote:
> Using tcg_exclusive_{lock,unlock}(), make the emulation of
> LoadLink/StoreConditional thread safe.
>
> During an LL access, this lock protects the load access itself, the
> update of the exclusive history and the update of the VCPU's protected
> range.  In a SC access, the lock protects the store access itself, the
> possible reset of other VCPUs' protected range and the reset of the
> exclusive context of calling VCPU.
>
> The lock is also taken when a normal store happens to access an
> exclusive page to reset other VCPUs' protected range in case of
> collision.

 I think the key problem here is that the load in LL helper can race with
 a concurrent regular fast-path store. It's probably easier to annotate
 the source here:

  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
  2  TCGMemOpIdx oi, uintptr_t retaddr)
  3  {
  4  WORD_TYPE ret;
  5  int index;
  6  CPUState *this_cpu = ENV_GET_CPU(env);
  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
  8  hwaddr hw_addr;
  9  unsigned mmu_idx = get_mmuidx(oi);

 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);

 11  tcg_exclusive_lock();

 12  /* Use the proper load helper from cpu_ldst.h */
 13  ret = helper_ld(env, addr, oi, retaddr);

 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
 + xlat)
 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
 TARGET_PAGE_MASK) + addr;
 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
 TLB_MMIO))) {
 18  /* If all the vCPUs have the EXCL bit set for this page
 there is no need
 19   * to request any flush. */
 20  if (cpu_physical_memory_set_excl(hw_addr)) {
 21  CPUState *cpu;

 22  excl_history_put_addr(hw_addr);
 23  CPU_FOREACH(cpu) {
 24  if (this_cpu != cpu) {
 25  tlb_flush_other(this_cpu, cpu, 1);
 26  }
 27  }
 28  }
 29  /* For this vCPU, just update the TLB entry, no need to
 flush. */
 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
 31  } else {
 32  /* Set a pending exclusive access in the MemoryRegion */
 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
 34
 env->iotlb[mmu_idx][index].addr,
 35
 env->iotlb[mmu_idx][index].attrs);
 36  mr->pending_excl_access = true;
 37  }

 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);

 39  tcg_exclusive_unlock();

 40  /* From now on we are in LL/SC context */
 41  this_cpu->ll_sc_context = true;

 42  return ret;
 43  }


 The exclusive lock at line 11 doesn't help if concurrent fast-patch
 store at this address occurs after we finished load at line 13 but
 before TLB is flushed as a result of line 25. If we reorder the load to
 happen after the TLB flush request we still must be sure that the flush
 is complete before we can do the load safely.
>>>
>>> You are right, the risk actually exists. One solution to the problem
>>> could be to ignore the data acquired by the load and redo the LL after
>>> the flushes have been completed (basically the disas_ctx->pc points to
>>> the LL instruction). This time the LL will happen without flush
>>> requests and the access will be 

Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-14 Thread alvise rigo
1. LL(x)   // x requires a flush
2. query flush to all the n VCPUs
3. exit from the CPU loop and wait until all the flushes are done
4. enter the loop to re-execute LL(x). This time no flush is required

Now, points 2. and 3. can be done either with n calls of
async_safe_run_on_cpu() or with n calls of async_wait_run_on_cpu(). In my
opinion the former is not really done for this use case since it would call
n^2 times cpu_exit() and it would not really ensure that the VCPU has
exited from the guest code to make an iteration of the CPU loop.

Regards,
alvise

On Tue, Jun 14, 2016 at 2:00 PM, Alex Bennée  wrote:
>
> alvise rigo  writes:
>
>> On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov  wrote:
>>> On 26/05/16 19:35, Alvise Rigo wrote:
 Using tcg_exclusive_{lock,unlock}(), make the emulation of
 LoadLink/StoreConditional thread safe.

 During an LL access, this lock protects the load access itself, the
 update of the exclusive history and the update of the VCPU's protected
 range.  In a SC access, the lock protects the store access itself, the
 possible reset of other VCPUs' protected range and the reset of the
 exclusive context of calling VCPU.

 The lock is also taken when a normal store happens to access an
 exclusive page to reset other VCPUs' protected range in case of
 collision.
>>>
>>> I think the key problem here is that the load in LL helper can race with
>>> a concurrent regular fast-path store. It's probably easier to annotate
>>> the source here:
>>>
>>>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>  2  TCGMemOpIdx oi, uintptr_t retaddr)
>>>  3  {
>>>  4  WORD_TYPE ret;
>>>  5  int index;
>>>  6  CPUState *this_cpu = ENV_GET_CPU(env);
>>>  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>>  8  hwaddr hw_addr;
>>>  9  unsigned mmu_idx = get_mmuidx(oi);
>>>
>>> 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>
>>> 11  tcg_exclusive_lock();
>>>
>>> 12  /* Use the proper load helper from cpu_ldst.h */
>>> 13  ret = helper_ld(env, addr, oi, retaddr);
>>>
>>> 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
>>> + xlat)
>>> 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>> 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
>>> TARGET_PAGE_MASK) + addr;
>>> 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
>>> TLB_MMIO))) {
>>> 18  /* If all the vCPUs have the EXCL bit set for this page
>>> there is no need
>>> 19   * to request any flush. */
>>> 20  if (cpu_physical_memory_set_excl(hw_addr)) {
>>> 21  CPUState *cpu;
>>>
>>> 22  excl_history_put_addr(hw_addr);
>>> 23  CPU_FOREACH(cpu) {
>>> 24  if (this_cpu != cpu) {
>>> 25  tlb_flush_other(this_cpu, cpu, 1);
>>> 26  }
>>> 27  }
>>> 28  }
>>> 29  /* For this vCPU, just update the TLB entry, no need to
>>> flush. */
>>> 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>> 31  } else {
>>> 32  /* Set a pending exclusive access in the MemoryRegion */
>>> 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
>>> 34
>>> env->iotlb[mmu_idx][index].addr,
>>> 35
>>> env->iotlb[mmu_idx][index].attrs);
>>> 36  mr->pending_excl_access = true;
>>> 37  }
>>>
>>> 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>
>>> 39  tcg_exclusive_unlock();
>>>
>>> 40  /* From now on we are in LL/SC context */
>>> 41  this_cpu->ll_sc_context = true;
>>>
>>> 42  return ret;
>>> 43  }
>>>
>>>
>>> The exclusive lock at line 11 doesn't help if concurrent fast-patch
>>> store at this address occurs after we finished load at line 13 but
>>> before TLB is flushed as a result of line 25. If we reorder the load to
>>> happen after the TLB flush request we still must be sure that the flush
>>> is complete before we can do the load safely.
>>
>> You are right, the risk actually exists. One solution to the problem
>> could be to ignore the data acquired by the load and redo the LL after
>> the flushes have been completed (basically the disas_ctx->pc points to
>> the LL instruction). This time the LL will happen without flush
>> requests and the access will be actually protected by the lock.
>
> So is just punting the work of to an async safe task and restarting the
> vCPU thread going to be enough?
>
>>
>> Regards,
>> alvise
>>
>>>

 Moreover, adapt target-arm to also cope with the new multi-threaded
 execution.

 Signed-off-by: Alvise Rigo 
 ---

Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-14 Thread Alex Bennée

alvise rigo  writes:

> On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov  wrote:
>> On 26/05/16 19:35, Alvise Rigo wrote:
>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>> LoadLink/StoreConditional thread safe.
>>>
>>> During an LL access, this lock protects the load access itself, the
>>> update of the exclusive history and the update of the VCPU's protected
>>> range.  In a SC access, the lock protects the store access itself, the
>>> possible reset of other VCPUs' protected range and the reset of the
>>> exclusive context of calling VCPU.
>>>
>>> The lock is also taken when a normal store happens to access an
>>> exclusive page to reset other VCPUs' protected range in case of
>>> collision.
>>
>> I think the key problem here is that the load in LL helper can race with
>> a concurrent regular fast-path store. It's probably easier to annotate
>> the source here:
>>
>>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>  2  TCGMemOpIdx oi, uintptr_t retaddr)
>>  3  {
>>  4  WORD_TYPE ret;
>>  5  int index;
>>  6  CPUState *this_cpu = ENV_GET_CPU(env);
>>  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>  8  hwaddr hw_addr;
>>  9  unsigned mmu_idx = get_mmuidx(oi);
>>
>> 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>
>> 11  tcg_exclusive_lock();
>>
>> 12  /* Use the proper load helper from cpu_ldst.h */
>> 13  ret = helper_ld(env, addr, oi, retaddr);
>>
>> 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
>> + xlat)
>> 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>> 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
>> TARGET_PAGE_MASK) + addr;
>> 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
>> TLB_MMIO))) {
>> 18  /* If all the vCPUs have the EXCL bit set for this page
>> there is no need
>> 19   * to request any flush. */
>> 20  if (cpu_physical_memory_set_excl(hw_addr)) {
>> 21  CPUState *cpu;
>>
>> 22  excl_history_put_addr(hw_addr);
>> 23  CPU_FOREACH(cpu) {
>> 24  if (this_cpu != cpu) {
>> 25  tlb_flush_other(this_cpu, cpu, 1);
>> 26  }
>> 27  }
>> 28  }
>> 29  /* For this vCPU, just update the TLB entry, no need to
>> flush. */
>> 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>> 31  } else {
>> 32  /* Set a pending exclusive access in the MemoryRegion */
>> 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
>> 34
>> env->iotlb[mmu_idx][index].addr,
>> 35
>> env->iotlb[mmu_idx][index].attrs);
>> 36  mr->pending_excl_access = true;
>> 37  }
>>
>> 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>
>> 39  tcg_exclusive_unlock();
>>
>> 40  /* From now on we are in LL/SC context */
>> 41  this_cpu->ll_sc_context = true;
>>
>> 42  return ret;
>> 43  }
>>
>>
>> The exclusive lock at line 11 doesn't help if concurrent fast-patch
>> store at this address occurs after we finished load at line 13 but
>> before TLB is flushed as a result of line 25. If we reorder the load to
>> happen after the TLB flush request we still must be sure that the flush
>> is complete before we can do the load safely.
>
> You are right, the risk actually exists. One solution to the problem
> could be to ignore the data acquired by the load and redo the LL after
> the flushes have been completed (basically the disas_ctx->pc points to
> the LL instruction). This time the LL will happen without flush
> requests and the access will be actually protected by the lock.

So is just punting the work of to an async safe task and restarting the
vCPU thread going to be enough?

>
> Regards,
> alvise
>
>>
>>>
>>> Moreover, adapt target-arm to also cope with the new multi-threaded
>>> execution.
>>>
>>> Signed-off-by: Alvise Rigo 
>>> ---
>>>  softmmu_llsc_template.h | 11 +--
>>>  softmmu_template.h  |  6 ++
>>>  target-arm/op_helper.c  |  6 ++
>>>  3 files changed, 21 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>>> index 2c4a494..d3810c0 100644
>>> --- a/softmmu_llsc_template.h
>>> +++ b/softmmu_llsc_template.h
>>> @@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, 
>>> target_ulong addr,
>>>  hwaddr hw_addr;
>>>  unsigned mmu_idx = get_mmuidx(oi);
>>>
>>> +index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>> +
>>> +tcg_exclusive_lock();
>>> +
>>>  /* Use the proper load helper from cpu_ldst.h */
>>>  ret = helper_ld(env, addr, oi, retaddr);
>>>
>>> -index = (addr 

Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-14 Thread Sergey Fedorov
On 14/06/16 11:37, Alex Bennée wrote:
> Alex Bennée  writes:
>
>> Sergey Fedorov  writes:
>>
>>> On 26/05/16 19:35, Alvise Rigo wrote:
 Using tcg_exclusive_{lock,unlock}(), make the emulation of
 LoadLink/StoreConditional thread safe.

 During an LL access, this lock protects the load access itself, the
 update of the exclusive history and the update of the VCPU's protected
 range.  In a SC access, the lock protects the store access itself, the
 possible reset of other VCPUs' protected range and the reset of the
 exclusive context of calling VCPU.

 The lock is also taken when a normal store happens to access an
 exclusive page to reset other VCPUs' protected range in case of
 collision.
>>> I think the key problem here is that the load in LL helper can race with
>>> a concurrent regular fast-path store. It's probably easier to annotate
>>> the source here:
>>>
>>>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>  2  TCGMemOpIdx oi, uintptr_t retaddr)
>>>  3  {
>>>  4  WORD_TYPE ret;
>>>  5  int index;
>>>  6  CPUState *this_cpu = ENV_GET_CPU(env);
>>>  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>>  8  hwaddr hw_addr;
>>>  9  unsigned mmu_idx = get_mmuidx(oi);
>>>
>>> 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>
>>> 11  tcg_exclusive_lock();
>>>
>>> 12  /* Use the proper load helper from cpu_ldst.h */
>>> 13  ret = helper_ld(env, addr, oi, retaddr);
>>>
>>> 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
>>> + xlat)
>>> 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>> 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
>>> TARGET_PAGE_MASK) + addr;
>>> 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
>>> TLB_MMIO))) {
>>> 18  /* If all the vCPUs have the EXCL bit set for this page
>>> there is no need
>>> 19   * to request any flush. */
>>> 20  if (cpu_physical_memory_set_excl(hw_addr)) {
>>> 21  CPUState *cpu;
>>>
>>> 22  excl_history_put_addr(hw_addr);
>>> 23  CPU_FOREACH(cpu) {
>>> 24  if (this_cpu != cpu) {
>>> 25  tlb_flush_other(this_cpu, cpu, 1);
>>> 26  }
>>> 27  }
>>> 28  }
>>> 29  /* For this vCPU, just update the TLB entry, no need to
>>> flush. */
>>> 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>> 31  } else {
>>> 32  /* Set a pending exclusive access in the MemoryRegion */
>>> 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
>>> 34
>>> env->iotlb[mmu_idx][index].addr,
>>> 35
>>> env->iotlb[mmu_idx][index].attrs);
>>> 36  mr->pending_excl_access = true;
>>> 37  }
>>>
>>> 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>
>>> 39  tcg_exclusive_unlock();
>>>
>>> 40  /* From now on we are in LL/SC context */
>>> 41  this_cpu->ll_sc_context = true;
>>>
>>> 42  return ret;
>>> 43  }
>>>
>>>
>>> The exclusive lock at line 11 doesn't help if concurrent fast-patch
>>> store at this address occurs after we finished load at line 13 but
>>> before TLB is flushed as a result of line 25. If we reorder the load to
>>> happen after the TLB flush request we still must be sure that the flush
>>> is complete before we can do the load safely.
>> I think this can be fixed using async_safe_run_on_cpu and tweaking the
>> ldlink helper.
>>
>>   * Change the helper_ldlink call
>> - pass it offset-of(cpu->reg[n]) so it can store result of load
>> - maybe pass it next-pc (unless there is some other way to know)
>>
>>   vCPU runs until the ldlink instruction occurs and jumps to the helper
>>
>>   * Once in the helper_ldlink
>> - queue up an async helper function with info of offset
>> - cpu_loop_exit_restore(with next PC)
>>
>>   vCPU the issued the ldlink exits immediately, waits until all vCPUs are
>>   out of generated code.
>>
>>   * Once in helper_ldlink async helper
>> - Everything at this point is quiescent, no vCPU activity
>> - Flush all TLBs/set flags
>> - Do the load from memory, store directly into cpu->reg[n]
>>
>> The key thing is once we are committed to load in the async helper
>> nothing else can get in the way. Any stores before we are in the helper
>> happen as normal, once we exit the async helper all potential
>> conflicting stores will slow path.
>>
>> There is a little messing about in knowing the next PC which is simple
>> in the ARM case but gets a bit more complicated for architectures that
>> have deferred jump slots. I haven't looked into this nit yet.
> Thinking on it further the messing about with 

Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-14 Thread Alex Bennée

Alex Bennée  writes:

> Sergey Fedorov  writes:
>
>> On 26/05/16 19:35, Alvise Rigo wrote:
>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>> LoadLink/StoreConditional thread safe.
>>>
>>> During an LL access, this lock protects the load access itself, the
>>> update of the exclusive history and the update of the VCPU's protected
>>> range.  In a SC access, the lock protects the store access itself, the
>>> possible reset of other VCPUs' protected range and the reset of the
>>> exclusive context of calling VCPU.
>>>
>>> The lock is also taken when a normal store happens to access an
>>> exclusive page to reset other VCPUs' protected range in case of
>>> collision.
>>
>> I think the key problem here is that the load in LL helper can race with
>> a concurrent regular fast-path store. It's probably easier to annotate
>> the source here:
>>
>>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>  2  TCGMemOpIdx oi, uintptr_t retaddr)
>>  3  {
>>  4  WORD_TYPE ret;
>>  5  int index;
>>  6  CPUState *this_cpu = ENV_GET_CPU(env);
>>  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>  8  hwaddr hw_addr;
>>  9  unsigned mmu_idx = get_mmuidx(oi);
>>
>> 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>
>> 11  tcg_exclusive_lock();
>>
>> 12  /* Use the proper load helper from cpu_ldst.h */
>> 13  ret = helper_ld(env, addr, oi, retaddr);
>>
>> 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
>> + xlat)
>> 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>> 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
>> TARGET_PAGE_MASK) + addr;
>> 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
>> TLB_MMIO))) {
>> 18  /* If all the vCPUs have the EXCL bit set for this page
>> there is no need
>> 19   * to request any flush. */
>> 20  if (cpu_physical_memory_set_excl(hw_addr)) {
>> 21  CPUState *cpu;
>>
>> 22  excl_history_put_addr(hw_addr);
>> 23  CPU_FOREACH(cpu) {
>> 24  if (this_cpu != cpu) {
>> 25  tlb_flush_other(this_cpu, cpu, 1);
>> 26  }
>> 27  }
>> 28  }
>> 29  /* For this vCPU, just update the TLB entry, no need to
>> flush. */
>> 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>> 31  } else {
>> 32  /* Set a pending exclusive access in the MemoryRegion */
>> 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
>> 34
>> env->iotlb[mmu_idx][index].addr,
>> 35
>> env->iotlb[mmu_idx][index].attrs);
>> 36  mr->pending_excl_access = true;
>> 37  }
>>
>> 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>
>> 39  tcg_exclusive_unlock();
>>
>> 40  /* From now on we are in LL/SC context */
>> 41  this_cpu->ll_sc_context = true;
>>
>> 42  return ret;
>> 43  }
>>
>>
>> The exclusive lock at line 11 doesn't help if concurrent fast-patch
>> store at this address occurs after we finished load at line 13 but
>> before TLB is flushed as a result of line 25. If we reorder the load to
>> happen after the TLB flush request we still must be sure that the flush
>> is complete before we can do the load safely.
>
> I think this can be fixed using async_safe_run_on_cpu and tweaking the
> ldlink helper.
>
>   * Change the helper_ldlink call
> - pass it offset-of(cpu->reg[n]) so it can store result of load
> - maybe pass it next-pc (unless there is some other way to know)
>
>   vCPU runs until the ldlink instruction occurs and jumps to the helper
>
>   * Once in the helper_ldlink
> - queue up an async helper function with info of offset
> - cpu_loop_exit_restore(with next PC)
>
>   vCPU the issued the ldlink exits immediately, waits until all vCPUs are
>   out of generated code.
>
>   * Once in helper_ldlink async helper
> - Everything at this point is quiescent, no vCPU activity
> - Flush all TLBs/set flags
> - Do the load from memory, store directly into cpu->reg[n]
>
> The key thing is once we are committed to load in the async helper
> nothing else can get in the way. Any stores before we are in the helper
> happen as normal, once we exit the async helper all potential
> conflicting stores will slow path.
>
> There is a little messing about in knowing the next PC which is simple
> in the ARM case but gets a bit more complicated for architectures that
> have deferred jump slots. I haven't looked into this nit yet.

Thinking on it further the messing about with offset and PCs could be
solved just by having a simple flag:

 - First run, no flag set, queue safe work, restart loop at PC
 - Second run, flag set, do 

Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-11 Thread Sergey Fedorov
On 10/06/16 19:15, Alex Bennée wrote:
> Sergey Fedorov  writes:
>
>> On 26/05/16 19:35, Alvise Rigo wrote:
>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>> LoadLink/StoreConditional thread safe.
>>>
>>> During an LL access, this lock protects the load access itself, the
>>> update of the exclusive history and the update of the VCPU's protected
>>> range.  In a SC access, the lock protects the store access itself, the
>>> possible reset of other VCPUs' protected range and the reset of the
>>> exclusive context of calling VCPU.
>>>
>>> The lock is also taken when a normal store happens to access an
>>> exclusive page to reset other VCPUs' protected range in case of
>>> collision.
>> I think the key problem here is that the load in LL helper can race with
>> a concurrent regular fast-path store. 
(snip)
> I think this can be fixed using async_safe_run_on_cpu and tweaking the
> ldlink helper.
>
>   * Change the helper_ldlink call
> - pass it offset-of(cpu->reg[n]) so it can store result of load
> - maybe pass it next-pc (unless there is some other way to know)
>
>   vCPU runs until the ldlink instruction occurs and jumps to the helper
>
>   * Once in the helper_ldlink
> - queue up an async helper function with info of offset
> - cpu_loop_exit_restore(with next PC)
>
>   vCPU the issued the ldlink exits immediately, waits until all vCPUs are
>   out of generated code.
>
>   * Once in helper_ldlink async helper
> - Everything at this point is quiescent, no vCPU activity
> - Flush all TLBs/set flags
> - Do the load from memory, store directly into cpu->reg[n]
>
> The key thing is once we are committed to load in the async helper
> nothing else can get in the way. Any stores before we are in the helper
> happen as normal, once we exit the async helper all potential
> conflicting stores will slow path.
>
> There is a little messing about in knowing the next PC which is simple
> in the ARM case but gets a bit more complicated for architectures that
> have deferred jump slots. I haven't looked into this nit yet.


Hmm, this looks pretty similar to what linux-user code actually does,
e.g. in do_strex(). Just restarting the LL instruction as Alvise
suggests may well be an easier approach (or may not).

Kind regards,
Sergey



Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-10 Thread Alex Bennée

Sergey Fedorov  writes:

> On 26/05/16 19:35, Alvise Rigo wrote:
>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>> LoadLink/StoreConditional thread safe.
>>
>> During an LL access, this lock protects the load access itself, the
>> update of the exclusive history and the update of the VCPU's protected
>> range.  In a SC access, the lock protects the store access itself, the
>> possible reset of other VCPUs' protected range and the reset of the
>> exclusive context of calling VCPU.
>>
>> The lock is also taken when a normal store happens to access an
>> exclusive page to reset other VCPUs' protected range in case of
>> collision.
>
> I think the key problem here is that the load in LL helper can race with
> a concurrent regular fast-path store. It's probably easier to annotate
> the source here:
>
>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>  2  TCGMemOpIdx oi, uintptr_t retaddr)
>  3  {
>  4  WORD_TYPE ret;
>  5  int index;
>  6  CPUState *this_cpu = ENV_GET_CPU(env);
>  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
>  8  hwaddr hw_addr;
>  9  unsigned mmu_idx = get_mmuidx(oi);
>
> 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>
> 11  tcg_exclusive_lock();
>
> 12  /* Use the proper load helper from cpu_ldst.h */
> 13  ret = helper_ld(env, addr, oi, retaddr);
>
> 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
> + xlat)
> 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
> 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
> TARGET_PAGE_MASK) + addr;
> 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
> TLB_MMIO))) {
> 18  /* If all the vCPUs have the EXCL bit set for this page
> there is no need
> 19   * to request any flush. */
> 20  if (cpu_physical_memory_set_excl(hw_addr)) {
> 21  CPUState *cpu;
>
> 22  excl_history_put_addr(hw_addr);
> 23  CPU_FOREACH(cpu) {
> 24  if (this_cpu != cpu) {
> 25  tlb_flush_other(this_cpu, cpu, 1);
> 26  }
> 27  }
> 28  }
> 29  /* For this vCPU, just update the TLB entry, no need to
> flush. */
> 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
> 31  } else {
> 32  /* Set a pending exclusive access in the MemoryRegion */
> 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
> 34
> env->iotlb[mmu_idx][index].addr,
> 35
> env->iotlb[mmu_idx][index].attrs);
> 36  mr->pending_excl_access = true;
> 37  }
>
> 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>
> 39  tcg_exclusive_unlock();
>
> 40  /* From now on we are in LL/SC context */
> 41  this_cpu->ll_sc_context = true;
>
> 42  return ret;
> 43  }
>
>
> The exclusive lock at line 11 doesn't help if concurrent fast-patch
> store at this address occurs after we finished load at line 13 but
> before TLB is flushed as a result of line 25. If we reorder the load to
> happen after the TLB flush request we still must be sure that the flush
> is complete before we can do the load safely.

I think this can be fixed using async_safe_run_on_cpu and tweaking the
ldlink helper.

  * Change the helper_ldlink call
- pass it offset-of(cpu->reg[n]) so it can store result of load
- maybe pass it next-pc (unless there is some other way to know)

  vCPU runs until the ldlink instruction occurs and jumps to the helper

  * Once in the helper_ldlink
- queue up an async helper function with info of offset
- cpu_loop_exit_restore(with next PC)

  vCPU the issued the ldlink exits immediately, waits until all vCPUs are
  out of generated code.

  * Once in helper_ldlink async helper
- Everything at this point is quiescent, no vCPU activity
- Flush all TLBs/set flags
- Do the load from memory, store directly into cpu->reg[n]

The key thing is once we are committed to load in the async helper
nothing else can get in the way. Any stores before we are in the helper
happen as normal, once we exit the async helper all potential
conflicting stores will slow path.

There is a little messing about in knowing the next PC which is simple
in the ARM case but gets a bit more complicated for architectures that
have deferred jump slots. I haven't looked into this nit yet.

>
>>
>> Moreover, adapt target-arm to also cope with the new multi-threaded
>> execution.
>>
>> Signed-off-by: Alvise Rigo 
>> ---
>>  softmmu_llsc_template.h | 11 +--
>>  softmmu_template.h  |  6 ++
>>  target-arm/op_helper.c  |  6 ++
>>  3 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/softmmu_llsc_template.h 

Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-10 Thread alvise rigo
This would require to fill again the whole history which I find very
unlikely. In any case, this has to be documented.

Thank you,
alvise

On Fri, Jun 10, 2016 at 6:00 PM, Sergey Fedorov  wrote:
> On 10/06/16 18:53, alvise rigo wrote:
>> On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov  wrote:
>>> On 26/05/16 19:35, Alvise Rigo wrote:
 Using tcg_exclusive_{lock,unlock}(), make the emulation of
 LoadLink/StoreConditional thread safe.

 During an LL access, this lock protects the load access itself, the
 update of the exclusive history and the update of the VCPU's protected
 range.  In a SC access, the lock protects the store access itself, the
 possible reset of other VCPUs' protected range and the reset of the
 exclusive context of calling VCPU.

 The lock is also taken when a normal store happens to access an
 exclusive page to reset other VCPUs' protected range in case of
 collision.
>>> I think the key problem here is that the load in LL helper can race with
>>> a concurrent regular fast-path store. It's probably easier to annotate
>>> the source here:
>>>
>>>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>  2  TCGMemOpIdx oi, uintptr_t retaddr)
>>>  3  {
>>>  4  WORD_TYPE ret;
>>>  5  int index;
>>>  6  CPUState *this_cpu = ENV_GET_CPU(env);
>>>  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>>  8  hwaddr hw_addr;
>>>  9  unsigned mmu_idx = get_mmuidx(oi);
>>>
>>> 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>
>>> 11  tcg_exclusive_lock();
>>>
>>> 12  /* Use the proper load helper from cpu_ldst.h */
>>> 13  ret = helper_ld(env, addr, oi, retaddr);
>>>
>>> 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
>>> + xlat)
>>> 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>> 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
>>> TARGET_PAGE_MASK) + addr;
>>> 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
>>> TLB_MMIO))) {
>>> 18  /* If all the vCPUs have the EXCL bit set for this page
>>> there is no need
>>> 19   * to request any flush. */
>>> 20  if (cpu_physical_memory_set_excl(hw_addr)) {
>>> 21  CPUState *cpu;
>>>
>>> 22  excl_history_put_addr(hw_addr);
>>> 23  CPU_FOREACH(cpu) {
>>> 24  if (this_cpu != cpu) {
>>> 25  tlb_flush_other(this_cpu, cpu, 1);
>>> 26  }
>>> 27  }
>>> 28  }
>>> 29  /* For this vCPU, just update the TLB entry, no need to
>>> flush. */
>>> 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>> 31  } else {
>>> 32  /* Set a pending exclusive access in the MemoryRegion */
>>> 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
>>> 34
>>> env->iotlb[mmu_idx][index].addr,
>>> 35
>>> env->iotlb[mmu_idx][index].attrs);
>>> 36  mr->pending_excl_access = true;
>>> 37  }
>>>
>>> 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>
>>> 39  tcg_exclusive_unlock();
>>>
>>> 40  /* From now on we are in LL/SC context */
>>> 41  this_cpu->ll_sc_context = true;
>>>
>>> 42  return ret;
>>> 43  }
>>>
>>>
>>> The exclusive lock at line 11 doesn't help if concurrent fast-patch
>>> store at this address occurs after we finished load at line 13 but
>>> before TLB is flushed as a result of line 25. If we reorder the load to
>>> happen after the TLB flush request we still must be sure that the flush
>>> is complete before we can do the load safely.
>> You are right, the risk actually exists. One solution to the problem
>> could be to ignore the data acquired by the load and redo the LL after
>> the flushes have been completed (basically the disas_ctx->pc points to
>> the LL instruction). This time the LL will happen without flush
>> requests and the access will be actually protected by the lock.
>
> Yes, if some other CPU wouldn't evict an entry with the same address
> from the exclusive history...
>
> Kind regards,
> Sergey



Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-10 Thread Sergey Fedorov
On 10/06/16 18:53, alvise rigo wrote:
> On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov  wrote:
>> On 26/05/16 19:35, Alvise Rigo wrote:
>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>> LoadLink/StoreConditional thread safe.
>>>
>>> During an LL access, this lock protects the load access itself, the
>>> update of the exclusive history and the update of the VCPU's protected
>>> range.  In a SC access, the lock protects the store access itself, the
>>> possible reset of other VCPUs' protected range and the reset of the
>>> exclusive context of calling VCPU.
>>>
>>> The lock is also taken when a normal store happens to access an
>>> exclusive page to reset other VCPUs' protected range in case of
>>> collision.
>> I think the key problem here is that the load in LL helper can race with
>> a concurrent regular fast-path store. It's probably easier to annotate
>> the source here:
>>
>>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>  2  TCGMemOpIdx oi, uintptr_t retaddr)
>>  3  {
>>  4  WORD_TYPE ret;
>>  5  int index;
>>  6  CPUState *this_cpu = ENV_GET_CPU(env);
>>  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>  8  hwaddr hw_addr;
>>  9  unsigned mmu_idx = get_mmuidx(oi);
>>
>> 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>
>> 11  tcg_exclusive_lock();
>>
>> 12  /* Use the proper load helper from cpu_ldst.h */
>> 13  ret = helper_ld(env, addr, oi, retaddr);
>>
>> 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
>> + xlat)
>> 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>> 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
>> TARGET_PAGE_MASK) + addr;
>> 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
>> TLB_MMIO))) {
>> 18  /* If all the vCPUs have the EXCL bit set for this page
>> there is no need
>> 19   * to request any flush. */
>> 20  if (cpu_physical_memory_set_excl(hw_addr)) {
>> 21  CPUState *cpu;
>>
>> 22  excl_history_put_addr(hw_addr);
>> 23  CPU_FOREACH(cpu) {
>> 24  if (this_cpu != cpu) {
>> 25  tlb_flush_other(this_cpu, cpu, 1);
>> 26  }
>> 27  }
>> 28  }
>> 29  /* For this vCPU, just update the TLB entry, no need to
>> flush. */
>> 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>> 31  } else {
>> 32  /* Set a pending exclusive access in the MemoryRegion */
>> 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
>> 34
>> env->iotlb[mmu_idx][index].addr,
>> 35
>> env->iotlb[mmu_idx][index].attrs);
>> 36  mr->pending_excl_access = true;
>> 37  }
>>
>> 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>
>> 39  tcg_exclusive_unlock();
>>
>> 40  /* From now on we are in LL/SC context */
>> 41  this_cpu->ll_sc_context = true;
>>
>> 42  return ret;
>> 43  }
>>
>>
>> The exclusive lock at line 11 doesn't help if concurrent fast-patch
>> store at this address occurs after we finished load at line 13 but
>> before TLB is flushed as a result of line 25. If we reorder the load to
>> happen after the TLB flush request we still must be sure that the flush
>> is complete before we can do the load safely.
> You are right, the risk actually exists. One solution to the problem
> could be to ignore the data acquired by the load and redo the LL after
> the flushes have been completed (basically the disas_ctx->pc points to
> the LL instruction). This time the LL will happen without flush
> requests and the access will be actually protected by the lock.

Yes, if some other CPU wouldn't evict an entry with the same address
from the exclusive history...

Kind regards,
Sergey



Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-10 Thread alvise rigo
On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov  wrote:
> On 26/05/16 19:35, Alvise Rigo wrote:
>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>> LoadLink/StoreConditional thread safe.
>>
>> During an LL access, this lock protects the load access itself, the
>> update of the exclusive history and the update of the VCPU's protected
>> range.  In a SC access, the lock protects the store access itself, the
>> possible reset of other VCPUs' protected range and the reset of the
>> exclusive context of calling VCPU.
>>
>> The lock is also taken when a normal store happens to access an
>> exclusive page to reset other VCPUs' protected range in case of
>> collision.
>
> I think the key problem here is that the load in LL helper can race with
> a concurrent regular fast-path store. It's probably easier to annotate
> the source here:
>
>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>  2  TCGMemOpIdx oi, uintptr_t retaddr)
>  3  {
>  4  WORD_TYPE ret;
>  5  int index;
>  6  CPUState *this_cpu = ENV_GET_CPU(env);
>  7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
>  8  hwaddr hw_addr;
>  9  unsigned mmu_idx = get_mmuidx(oi);
>
> 10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>
> 11  tcg_exclusive_lock();
>
> 12  /* Use the proper load helper from cpu_ldst.h */
> 13  ret = helper_ld(env, addr, oi, retaddr);
>
> 14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
> + xlat)
> 15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
> 16  hw_addr = (env->iotlb[mmu_idx][index].addr &
> TARGET_PAGE_MASK) + addr;
> 17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
> TLB_MMIO))) {
> 18  /* If all the vCPUs have the EXCL bit set for this page
> there is no need
> 19   * to request any flush. */
> 20  if (cpu_physical_memory_set_excl(hw_addr)) {
> 21  CPUState *cpu;
>
> 22  excl_history_put_addr(hw_addr);
> 23  CPU_FOREACH(cpu) {
> 24  if (this_cpu != cpu) {
> 25  tlb_flush_other(this_cpu, cpu, 1);
> 26  }
> 27  }
> 28  }
> 29  /* For this vCPU, just update the TLB entry, no need to
> flush. */
> 30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
> 31  } else {
> 32  /* Set a pending exclusive access in the MemoryRegion */
> 33  MemoryRegion *mr = iotlb_to_region(this_cpu,
> 34
> env->iotlb[mmu_idx][index].addr,
> 35
> env->iotlb[mmu_idx][index].attrs);
> 36  mr->pending_excl_access = true;
> 37  }
>
> 38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>
> 39  tcg_exclusive_unlock();
>
> 40  /* From now on we are in LL/SC context */
> 41  this_cpu->ll_sc_context = true;
>
> 42  return ret;
> 43  }
>
>
> The exclusive lock at line 11 doesn't help if concurrent fast-patch
> store at this address occurs after we finished load at line 13 but
> before TLB is flushed as a result of line 25. If we reorder the load to
> happen after the TLB flush request we still must be sure that the flush
> is complete before we can do the load safely.

You are right, the risk actually exists. One solution to the problem
could be to ignore the data acquired by the load and redo the LL after
the flushes have been completed (basically the disas_ctx->pc points to
the LL instruction). This time the LL will happen without flush
requests and the access will be actually protected by the lock.

Regards,
alvise

>
>>
>> Moreover, adapt target-arm to also cope with the new multi-threaded
>> execution.
>>
>> Signed-off-by: Alvise Rigo 
>> ---
>>  softmmu_llsc_template.h | 11 +--
>>  softmmu_template.h  |  6 ++
>>  target-arm/op_helper.c  |  6 ++
>>  3 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>> index 2c4a494..d3810c0 100644
>> --- a/softmmu_llsc_template.h
>> +++ b/softmmu_llsc_template.h
>> @@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, 
>> target_ulong addr,
>>  hwaddr hw_addr;
>>  unsigned mmu_idx = get_mmuidx(oi);
>>
>> +index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>> +
>> +tcg_exclusive_lock();
>> +
>>  /* Use the proper load helper from cpu_ldst.h */
>>  ret = helper_ld(env, addr, oi, retaddr);
>>
>> -index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>> -
>>  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>  hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>> @@ -95,6 +97,8 @@ 

Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-06-10 Thread Sergey Fedorov
On 26/05/16 19:35, Alvise Rigo wrote:
> Using tcg_exclusive_{lock,unlock}(), make the emulation of
> LoadLink/StoreConditional thread safe.
>
> During an LL access, this lock protects the load access itself, the
> update of the exclusive history and the update of the VCPU's protected
> range.  In a SC access, the lock protects the store access itself, the
> possible reset of other VCPUs' protected range and the reset of the
> exclusive context of calling VCPU.
>
> The lock is also taken when a normal store happens to access an
> exclusive page to reset other VCPUs' protected range in case of
> collision.

I think the key problem here is that the load in LL helper can race with
a concurrent regular fast-path store. It's probably easier to annotate
the source here:

 1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
 2  TCGMemOpIdx oi, uintptr_t retaddr)
 3  {
 4  WORD_TYPE ret;
 5  int index;
 6  CPUState *this_cpu = ENV_GET_CPU(env);
 7  CPUClass *cc = CPU_GET_CLASS(this_cpu);
 8  hwaddr hw_addr;
 9  unsigned mmu_idx = get_mmuidx(oi);
  
10  index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
  
11  tcg_exclusive_lock();
  
12  /* Use the proper load helper from cpu_ldst.h */
13  ret = helper_ld(env, addr, oi, retaddr);
  
14  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr
+ xlat)
15   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
16  hw_addr = (env->iotlb[mmu_idx][index].addr &
TARGET_PAGE_MASK) + addr;
17  if (likely(!(env->tlb_table[mmu_idx][index].addr_read &
TLB_MMIO))) {
18  /* If all the vCPUs have the EXCL bit set for this page
there is no need
19   * to request any flush. */
20  if (cpu_physical_memory_set_excl(hw_addr)) {
21  CPUState *cpu;
  
22  excl_history_put_addr(hw_addr);
23  CPU_FOREACH(cpu) {
24  if (this_cpu != cpu) {
25  tlb_flush_other(this_cpu, cpu, 1);
26  }
27  }
28  }
29  /* For this vCPU, just update the TLB entry, no need to
flush. */
30  env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
31  } else {
32  /* Set a pending exclusive access in the MemoryRegion */
33  MemoryRegion *mr = iotlb_to_region(this_cpu,
34
env->iotlb[mmu_idx][index].addr,
35
env->iotlb[mmu_idx][index].attrs);
36  mr->pending_excl_access = true;
37  }
  
38  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
  
39  tcg_exclusive_unlock();
  
40  /* From now on we are in LL/SC context */
41  this_cpu->ll_sc_context = true;
  
42  return ret;
43  }


The exclusive lock at line 11 doesn't help if concurrent fast-patch
store at this address occurs after we finished load at line 13 but
before TLB is flushed as a result of line 25. If we reorder the load to
happen after the TLB flush request we still must be sure that the flush
is complete before we can do the load safely.

>
> Moreover, adapt target-arm to also cope with the new multi-threaded
> execution.
>
> Signed-off-by: Alvise Rigo 
> ---
>  softmmu_llsc_template.h | 11 +--
>  softmmu_template.h  |  6 ++
>  target-arm/op_helper.c  |  6 ++
>  3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
> index 2c4a494..d3810c0 100644
> --- a/softmmu_llsc_template.h
> +++ b/softmmu_llsc_template.h
> @@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, 
> target_ulong addr,
>  hwaddr hw_addr;
>  unsigned mmu_idx = get_mmuidx(oi);
>  
> +index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +
> +tcg_exclusive_lock();
> +
>  /* Use the proper load helper from cpu_ldst.h */
>  ret = helper_ld(env, addr, oi, retaddr);
>  
> -index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> -
>  /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>   * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>  hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
> @@ -95,6 +97,8 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, 
> target_ulong addr,
>  
>  cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>  
> +tcg_exclusive_unlock();
> +
>  /* From now on we are in LL/SC context */
>  this_cpu->ll_sc_context = true;
>  
> @@ -114,6 +118,8 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, 
> target_ulong addr,
>   * access as one made by the store conditional wrapper. If the store
>   * 

[Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading

2016-05-26 Thread Alvise Rigo
Using tcg_exclusive_{lock,unlock}(), make the emulation of
LoadLink/StoreConditional thread safe.

During an LL access, this lock protects the load access itself, the
update of the exclusive history and the update of the VCPU's protected
range.  In a SC access, the lock protects the store access itself, the
possible reset of other VCPUs' protected range and the reset of the
exclusive context of calling VCPU.

The lock is also taken when a normal store happens to access an
exclusive page to reset other VCPUs' protected range in case of
collision.

Moreover, adapt target-arm to also cope with the new multi-threaded
execution.

Signed-off-by: Alvise Rigo 
---
 softmmu_llsc_template.h | 11 +--
 softmmu_template.h  |  6 ++
 target-arm/op_helper.c  |  6 ++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
index 2c4a494..d3810c0 100644
--- a/softmmu_llsc_template.h
+++ b/softmmu_llsc_template.h
@@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, 
target_ulong addr,
 hwaddr hw_addr;
 unsigned mmu_idx = get_mmuidx(oi);
 
+index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+
+tcg_exclusive_lock();
+
 /* Use the proper load helper from cpu_ldst.h */
 ret = helper_ld(env, addr, oi, retaddr);
 
-index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-
 /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
  * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
 hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
@@ -95,6 +97,8 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong 
addr,
 
 cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
 
+tcg_exclusive_unlock();
+
 /* From now on we are in LL/SC context */
 this_cpu->ll_sc_context = true;
 
@@ -114,6 +118,8 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, 
target_ulong addr,
  * access as one made by the store conditional wrapper. If the store
  * conditional does not succeed, the value will be set to 0.*/
 cpu->excl_succeeded = true;
+
+tcg_exclusive_lock();
 helper_st(env, addr, val, oi, retaddr);
 
 if (cpu->excl_succeeded) {
@@ -123,6 +129,7 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, 
target_ulong addr,
 
 /* Unset LL/SC context */
 cc->cpu_reset_excl_context(cpu);
+tcg_exclusive_unlock();
 
 return ret;
 }
diff --git a/softmmu_template.h b/softmmu_template.h
index 76fe37e..9363a7b 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -537,11 +537,16 @@ static inline void 
smmu_helper(do_excl_store)(CPUArchState *env,
 }
 }
 
+/* Take the lock in case we are not coming from a SC */
+tcg_exclusive_lock();
+
 smmu_helper(do_ram_store)(env, little_endian, val, addr, oi,
   get_mmuidx(oi), index, retaddr);
 
 reset_other_cpus_colliding_ll_addr(hw_addr, DATA_SIZE);
 
+tcg_exclusive_unlock();
+
 return;
 }
 
@@ -572,6 +577,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong 
addr, DATA_TYPE val,
 /* Handle an IO access or exclusive access.  */
 if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
 if (tlb_addr & TLB_EXCL) {
+
 smmu_helper(do_excl_store)(env, true, val, addr, oi, index,
retaddr);
 return;
diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
index e22afc5..19ea52d 100644
--- a/target-arm/op_helper.c
+++ b/target-arm/op_helper.c
@@ -35,7 +35,9 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
 cs->exception_index = excp;
 env->exception.syndrome = syndrome;
 env->exception.target_el = target_el;
+tcg_exclusive_lock();
 cc->cpu_reset_excl_context(cs);
+tcg_exclusive_unlock();
 cpu_loop_exit(cs);
 }
 
@@ -58,7 +60,9 @@ void HELPER(atomic_clear)(CPUARMState *env)
 CPUState *cs = ENV_GET_CPU(env);
 CPUClass *cc = CPU_GET_CLASS(cs);
 
+tcg_exclusive_lock();
 cc->cpu_reset_excl_context(cs);
+tcg_exclusive_unlock();
 }
 
 uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
@@ -874,7 +878,9 @@ void HELPER(exception_return)(CPUARMState *env)
 
 aarch64_save_sp(env, cur_el);
 
+tcg_exclusive_lock();
 cc->cpu_reset_excl_context(cs);
+tcg_exclusive_unlock();
 
 /* We must squash the PSTATE.SS bit to zero unless both of the
  * following hold:
-- 
2.8.3