alvise rigo <a.r...@virtualopensystems.com> writes:

> 1. LL(x) // x requires a flush
> 2. request a flush from all the n VCPUs
> 3. exit from the CPU loop and wait until all the flushes are done
> 4. re-enter the loop to re-execute LL(x). This time no flush is required
>
> Now, points 2. and 3. can be done either with n calls of
> async_safe_run_on_cpu() or with n calls of async_wait_run_on_cpu(). In my
> opinion the former is not really suited to this use case since it would
> call cpu_exit() n^2 times and it would not really ensure that the VCPU
> has exited from the guest code to make an iteration of the CPU loop.
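For concreteness, here is a minimal sketch of the flow being discussed: one
piece of "safe" work does the flushing and bit-setting, after which the vCPU
simply retries the LL. The helper names (flush_excl_page_work,
page_needs_excl_flush) and the deferred-work signature below are illustrative
only, not the actual API of this series; QEMU internals are assumed to be in
scope as in softmmu_llsc_template.h.

    /* Illustrative sketch only: runs as queued "safe" work while every
     * vCPU is out of its execution loop, so none of these flushes can
     * race with a fast-path store. */
    static void flush_excl_page_work(CPUState *cpu, void *opaque)
    {
        hwaddr hw_addr = (hwaddr)(uintptr_t)opaque;
        CPUState *other;

        CPU_FOREACH(other) {
            tlb_flush(other, 1);               /* drop stale TLB entries */
        }
        cpu_physical_memory_set_excl(hw_addr); /* mark the page exclusive */
    }

    /* In helper_ldlink_name(), instead of loading first and flushing later: */
    if (page_needs_excl_flush(hw_addr)) {      /* hypothetical check */
        async_safe_run_on_cpu(this_cpu, flush_excl_page_work,
                              (void *)(uintptr_t)hw_addr);
        /* Leave the loop; the LL is re-executed once the work has run. */
        cpu_loop_exit_restore(this_cpu, retaddr);
    }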
Maybe async_safe_run_on_cpu should be renamed async_safe_run_on_system(), as
all vCPUs are held. You only need one helper to do all the flushing and
bit-setting.

>
> Regards,
> alvise
>
> On Tue, Jun 14, 2016 at 2:00 PM, Alex Bennée <alex.ben...@linaro.org> wrote:
>>
>> alvise rigo <a.r...@virtualopensystems.com> writes:
>>
>>> On Fri, Jun 10, 2016 at 5:21 PM, Sergey Fedorov <serge.f...@gmail.com>
>>> wrote:
>>>> On 26/05/16 19:35, Alvise Rigo wrote:
>>>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>>>> LoadLink/StoreConditional thread safe.
>>>>>
>>>>> During an LL access, this lock protects the load access itself, the
>>>>> update of the exclusive history and the update of the VCPU's protected
>>>>> range. In an SC access, the lock protects the store access itself, the
>>>>> possible reset of other VCPUs' protected range and the reset of the
>>>>> exclusive context of the calling VCPU.
>>>>>
>>>>> The lock is also taken when a normal store happens to access an
>>>>> exclusive page, to reset other VCPUs' protected range in case of
>>>>> collision.
>>>>
>>>> I think the key problem here is that the load in the LL helper can race
>>>> with a concurrent regular fast-path store. It's probably easier to
>>>> annotate the source here:
>>>>
>>>>  1  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>  2                               TCGMemOpIdx oi, uintptr_t retaddr)
>>>>  3  {
>>>>  4      WORD_TYPE ret;
>>>>  5      int index;
>>>>  6      CPUState *this_cpu = ENV_GET_CPU(env);
>>>>  7      CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>>>  8      hwaddr hw_addr;
>>>>  9      unsigned mmu_idx = get_mmuidx(oi);
>>>>
>>>> 10      index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>
>>>> 11      tcg_exclusive_lock();
>>>>
>>>> 12      /* Use the proper load helper from cpu_ldst.h */
>>>> 13      ret = helper_ld(env, addr, oi, retaddr);
>>>>
>>>> 14      /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>>> 15       * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>>> 16      hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>>>> 17      if (likely(!(env->tlb_table[mmu_idx][index].addr_read & TLB_MMIO))) {
>>>> 18          /* If all the vCPUs have the EXCL bit set for this page there
>>>> 19           * is no need to request any flush. */
>>>> 20          if (cpu_physical_memory_set_excl(hw_addr)) {
>>>> 21              CPUState *cpu;
>>>>
>>>> 22              excl_history_put_addr(hw_addr);
>>>> 23              CPU_FOREACH(cpu) {
>>>> 24                  if (this_cpu != cpu) {
>>>> 25                      tlb_flush_other(this_cpu, cpu, 1);
>>>> 26                  }
>>>> 27              }
>>>> 28          }
>>>> 29          /* For this vCPU, just update the TLB entry, no need to flush. */
>>>> 30          env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>>>> 31      } else {
>>>> 32          /* Set a pending exclusive access in the MemoryRegion */
>>>> 33          MemoryRegion *mr = iotlb_to_region(this_cpu,
>>>> 34                                             env->iotlb[mmu_idx][index].addr,
>>>> 35                                             env->iotlb[mmu_idx][index].attrs);
>>>> 36          mr->pending_excl_access = true;
>>>> 37      }
>>>>
>>>> 38      cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>>
>>>> 39      tcg_exclusive_unlock();
>>>>
>>>> 40      /* From now on we are in LL/SC context */
>>>> 41      this_cpu->ll_sc_context = true;
>>>>
>>>> 42      return ret;
>>>> 43  }
>>>>
>>>>
>>>> The exclusive lock at line 11 doesn't help if a concurrent fast-path
>>>> store at this address occurs after we have finished the load at line 13
>>>> but before the TLB is flushed as a result of line 25. If we reorder the
>>>> load to happen after the TLB flush request, we still must be sure that
>>>> the flush is complete before we can do the load safely.
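To spell out the window (my reading of the above; the interleaving is only
illustrative):

    vCPU A (LL helper)                     vCPU B (fast-path store)
    ------------------                     ------------------------
    line 11: tcg_exclusive_lock()
    line 13: ret = helper_ld(...)          store hits the same address via
                                           its old TLB entry: TLB_EXCL not
                                           set yet, lock not taken
    lines 20-27: set the excl bit,
    request flushes of the other vCPUs
                                           ...the flush only lands later

So the value loaded at line 13 can already be stale by the time the protected
range is set up at line 38.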
>>>
>>> You are right, the risk actually exists. One solution to the problem
>>> could be to ignore the data acquired by the load and redo the LL after
>>> the flushes have been completed (basically the disas_ctx->pc points to
>>> the LL instruction). This time the LL will happen without flush
>>> requests and the access will actually be protected by the lock.
>>
>> So is just punting the work off to an async safe task and restarting the
>> vCPU thread going to be enough?
>>
>>>
>>> Regards,
>>> alvise
>>>
>>>>
>>>>>
>>>>> Moreover, adapt target-arm to also cope with the new multi-threaded
>>>>> execution.
>>>>>
>>>>> Signed-off-by: Alvise Rigo <a.r...@virtualopensystems.com>
>>>>> ---
>>>>>  softmmu_llsc_template.h | 11 +++++++++--
>>>>>  softmmu_template.h      |  6 ++++++
>>>>>  target-arm/op_helper.c  |  6 ++++++
>>>>>  3 files changed, 21 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>>>>> index 2c4a494..d3810c0 100644
>>>>> --- a/softmmu_llsc_template.h
>>>>> +++ b/softmmu_llsc_template.h
>>>>> @@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>>      hwaddr hw_addr;
>>>>>      unsigned mmu_idx = get_mmuidx(oi);
>>>>>
>>>>> +    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>> +
>>>>> +    tcg_exclusive_lock();
>>>>> +
>>>>>      /* Use the proper load helper from cpu_ldst.h */
>>>>>      ret = helper_ld(env, addr, oi, retaddr);
>>>>>
>>>>> -    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>>>> -
>>>>>      /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>>>>       * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>>>>      hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>>>>> @@ -95,6 +97,8 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>>>
>>>>>      cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>>>
>>>>> +    tcg_exclusive_unlock();
>>>>> +
>>>>>      /* From now on we are in LL/SC context */
>>>>>      this_cpu->ll_sc_context = true;
>>>>>
>>>>> @@ -114,6 +118,8 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>>>       * access as one made by the store conditional wrapper. If the store
>>>>>       * conditional does not succeed, the value will be set to 0.*/
>>>>>      cpu->excl_succeeded = true;
>>>>> +
>>>>> +    tcg_exclusive_lock();
>>>>>      helper_st(env, addr, val, oi, retaddr);
>>>>>
>>>>>      if (cpu->excl_succeeded) {
>>>>> @@ -123,6 +129,7 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>>>
>>>>>      /* Unset LL/SC context */
>>>>>      cc->cpu_reset_excl_context(cpu);
>>>>> +    tcg_exclusive_unlock();
>>>>>
>>>>>      return ret;
>>>>>  }
>>>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>>>> index 76fe37e..9363a7b 100644
>>>>> --- a/softmmu_template.h
>>>>> +++ b/softmmu_template.h
>>>>> @@ -537,11 +537,16 @@ static inline void smmu_helper(do_excl_store)(CPUArchState *env,
>>>>>          }
>>>>>      }
>>>>>
>>>>> +    /* Take the lock in case we are not coming from a SC */
>>>>> +    tcg_exclusive_lock();
>>>>> +
>>>>>      smmu_helper(do_ram_store)(env, little_endian, val, addr, oi,
>>>>>                                get_mmuidx(oi), index, retaddr);
>>>>>
>>>>>      reset_other_cpus_colliding_ll_addr(hw_addr, DATA_SIZE);
>>>>>
>>>>> +    tcg_exclusive_unlock();
>>>>> +
>>>>>      return;
>>>>>  }
>>>>>
>>>>> @@ -572,6 +577,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>>>      /* Handle an IO access or exclusive access.  */
>>>>>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>>>          if (tlb_addr & TLB_EXCL) {
>>>>> +
>>>>>              smmu_helper(do_excl_store)(env, true, val, addr, oi, index,
>>>>>                                         retaddr);
>>>>>              return;
>>>>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>>>>> index e22afc5..19ea52d 100644
>>>>> --- a/target-arm/op_helper.c
>>>>> +++ b/target-arm/op_helper.c
>>>>> @@ -35,7 +35,9 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
>>>>>      cs->exception_index = excp;
>>>>>      env->exception.syndrome = syndrome;
>>>>>      env->exception.target_el = target_el;
>>>>> +    tcg_exclusive_lock();
>>>>>      cc->cpu_reset_excl_context(cs);
>>>>> +    tcg_exclusive_unlock();
>>>>>      cpu_loop_exit(cs);
>>>>>  }
>>>>>
>>>>> @@ -58,7 +60,9 @@ void HELPER(atomic_clear)(CPUARMState *env)
>>>>>      CPUState *cs = ENV_GET_CPU(env);
>>>>>      CPUClass *cc = CPU_GET_CLASS(cs);
>>>>>
>>>>> +    tcg_exclusive_lock();
>>>>>      cc->cpu_reset_excl_context(cs);
>>>>> +    tcg_exclusive_unlock();
>>>>>  }
>>>>>
>>>>>  uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
>>>>> @@ -874,7 +878,9 @@ void HELPER(exception_return)(CPUARMState *env)
>>>>>
>>>>>      aarch64_save_sp(env, cur_el);
>>>>>
>>>>> +    tcg_exclusive_lock();
>>>>>      cc->cpu_reset_excl_context(cs);
>>>>> +    tcg_exclusive_unlock();
>>>>>
>>>>>      /* We must squash the PSTATE.SS bit to zero unless both of the
>>>>>       * following hold:
>>>>
>>
>>
>> --
>> Alex Bennée

--
Alex Bennée