Re: [Qemu-devel] [PATCH 26/26] tcg: Use tlb_fill probe from tlb_vaddr_to_host

2019-05-09 Thread Richard Henderson
On 5/9/19 1:56 AM, Peter Maydell wrote:
> On Thu, 9 May 2019 at 06:24, Richard Henderson
>  wrote:
>>
>> On 4/29/19 10:41 AM, Peter Maydell wrote:
>>> On Wed, 3 Apr 2019 at 05:05, Richard Henderson
>>>  wrote:

>>>> Most of the existing users would continue around a loop which
>>>> would fault the tlb entry in via a normal load/store.  But for
>>>> SVE we have a true non-faulting case which requires the new
>>>> probing form of tlb_fill.
>>>
>>> So am I right in thinking that this fixes a bug where we
>>> previously would mark a load as faulted if the memory happened
>>> not to be in the TLB, whereas now we will correctly pull in the
>>> TLB entry and do the load ?
>>
>> Yes.
>>
>>> (Since guest code ought to be handling the "non-first-load
>>> faulted" case by looping round or otherwise arranging to
>>> retry, nothing in practice would have noticed this bug, right?)
>>
>> Yes.
>>
>> The only case with changed behaviour is (expected to be) SVE no-fault, where
>> the loop you mention would have produced different incorrect results.
> 
> OK. If we're fixing a guest-visible bug it would be nice to
> describe that in the commit message.

The commit message now reads, in part,

But for AArch64 SVE we have an existing emulation bug wherein we
would mark the first element of a no-fault vector load as faulted
(within the FFR, not via exception) just because we did not have
its address in the TLB.  Now we can properly only mark it as faulted
if there really is no valid, readable translation, while still not
raising an exception.  (Note that beyond the first element of the
vector, the hardware may report a fault for any reason whatsoever;
with at least one element loaded, forward progress is guaranteed.)
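
To illustrate the behaviour described above, here is a minimal sketch of how
the first element of a no-fault load can use the probing lookup.  The helper
name sve_nofault_load_first() is hypothetical, and the MMIO case (where
tlb_vaddr_to_host() also returns NULL) is glossed over; this is not the
actual target/arm/sve_helper.c code.

#include "qemu/osdep.h"
#include "cpu.h"
#include "exec/cpu_ldst.h"

/* Hypothetical helper, for illustration only: attempt the first element
   of a no-fault vector load.  Returns false if the element should be
   marked as faulted in the FFR; no exception is ever raised.  */
static bool sve_nofault_load_first(CPUARMState *env, uint64_t *dst,
                                   target_ulong addr, int mmu_idx)
{
    void *host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);

    if (host == NULL) {
        /* With the probing tlb_fill, NULL means there really is no
           valid, readable translation (or the page is I/O), not merely
           that the address was absent from the TLB.  */
        return false;
    }
    /* Readable RAM: load directly through the host pointer.  */
    *dst = ldq_p(host);
    return true;
}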


r~



Re: [Qemu-devel] [PATCH 26/26] tcg: Use tlb_fill probe from tlb_vaddr_to_host

2019-05-09 Thread Peter Maydell
On Thu, 9 May 2019 at 06:24, Richard Henderson
 wrote:
>
> On 4/29/19 10:41 AM, Peter Maydell wrote:
> > On Wed, 3 Apr 2019 at 05:05, Richard Henderson
> >  wrote:
> >>
> >> Most of the existing users would continue around a loop which
> >> would fault the tlb entry in via a normal load/store.  But for
> >> SVE we have a true non-faulting case which requires the new
> >> probing form of tlb_fill.
> >
> > So am I right in thinking that this fixes a bug where we
> > previously would mark a load as faulted if the memory happened
> > not to be in the TLB, whereas now we will correctly pull in the
> > TLB entry and do the load ?
>
> Yes.
>
> > (Since guest code ought to be handling the "non-first-load
> > faulted" case by looping round or otherwise arranging to
> > retry, nothing in practice would have noticed this bug, right?)
>
> Yes.
>
> The only case with changed behaviour is (expected to be) SVE no-fault, where
> the loop you mention would have produced different incorrect results.

OK. If we're fixing a guest-visible bug it would be nice to
describe that in the commit message.

thanks
-- PMM



Re: [Qemu-devel] [PATCH 26/26] tcg: Use tlb_fill probe from tlb_vaddr_to_host

2019-05-08 Thread Richard Henderson
On 4/29/19 10:41 AM, Peter Maydell wrote:
> On Wed, 3 Apr 2019 at 05:05, Richard Henderson
>  wrote:
>>
>> Most of the existing users would continue around a loop which
>> would fault the tlb entry in via a normal load/store.  But for
>> SVE we have a true non-faulting case which requires the new
>> probing form of tlb_fill.
> 
> So am I right in thinking that this fixes a bug where we
> previously would mark a load as faulted if the memory happened
> not to be in the TLB, whereas now we will correctly pull in the
> TLB entry and do the load ?

Yes.

> (Since guest code ought to be handling the "non-first-load
> faulted" case by looping round or otherwise arranging to
> retry, nothing in practice would have noticed this bug, right?)

Yes.

The only case with changed behaviour is (expected to be) SVE no-fault, where
the loop you mention would have produced different incorrect results.


r~



Re: [Qemu-devel] [PATCH 26/26] tcg: Use tlb_fill probe from tlb_vaddr_to_host

2019-04-29 Thread Peter Maydell
On Wed, 3 Apr 2019 at 05:05, Richard Henderson
 wrote:
>
> Most of the existing users would continue around a loop which
> would fault the tlb entry in via a normal load/store.  But for
> SVE we have a true non-faulting case which requires the new
> probing form of tlb_fill.

So am I right in thinking that this fixes a bug where we
previously would mark a load as faulted if the memory happened
not to be in the TLB, whereas now we will correctly pull in the
TLB entry and do the load ?

(Since guest code ought to be handling the "non-first-load
faulted" case by looping round or otherwise arranging to
retry, nothing in practice would have noticed this bug, right?)

> Signed-off-by: Richard Henderson 
> ---
>  include/exec/cpu_ldst.h | 40 
>  accel/tcg/cputlb.c  | 69 -
>  target/arm/sve_helper.c |  6 +---
>  3 files changed, 68 insertions(+), 47 deletions(-)
>
> diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
> index d78041d7a0..be8c3f4da2 100644
> --- a/include/exec/cpu_ldst.h
> +++ b/include/exec/cpu_ldst.h
> @@ -440,43 +440,15 @@ static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
>   * This is the equivalent of the initial fast-path code used by
>   * TCG backends for guest load and store accesses.
>   */

The doc comment which this is the last two lines of needs
updating, I think -- with the changed implementation it's
no longer just the equivalent of the fast-path bit of code,
and it doesn't return NULL on a TLB miss any more.
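
For reference, one possible replacement for that comment, sketched here for
illustration (this is not necessarily the wording that was actually
committed):

/**
 * tlb_vaddr_to_host:
 * @env: CPUArchState
 * @addr: guest virtual address to look up
 * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
 * @mmu_idx: MMU index to use for lookup
 *
 * Look up the specified guest virtual address in the TCG softmmu TLB,
 * filling the TLB via a non-faulting tlb_fill probe if necessary.
 * If a host virtual address suitable for direct RAM access exists,
 * return it.  Otherwise (because the access would fault, or because
 * the page is backed by I/O) return NULL.
 */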

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM



[Qemu-devel] [PATCH 26/26] tcg: Use tlb_fill probe from tlb_vaddr_to_host

2019-04-02 Thread Richard Henderson
Most of the existing users would continue around a loop which
would fault the tlb entry in via a normal load/store.  But for
SVE we have a true non-faulting case which requires the new
probing form of tlb_fill.
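
For illustration, the "continue around a loop" pattern mentioned above looks
roughly like the sketch below.  The zero_bytes() helper is hypothetical and
not part of this patch; it only shows how a NULL result falls back to a
normal store, which handles the access via the full slow path (raising the
guest fault, doing the I/O, or filling the TLB) so the loop can simply
continue and later iterations take the fast path again.

#include "qemu/osdep.h"
#include "cpu.h"
#include "exec/cpu_ldst.h"

static void zero_bytes(CPUArchState *env, target_ulong addr, int len,
                       uintptr_t retaddr)
{
    int mmu_idx = cpu_mmu_index(env, false);
    int i;

    for (i = 0; i < len; i++) {
        void *host = tlb_vaddr_to_host(env, addr + i, MMU_DATA_STORE, mmu_idx);

        if (host) {
            /* Fast path: RAM page already translated, store directly.  */
            stb_p(host, 0);
        } else {
            /* Slow path: performs the store via the full MMU machinery,
               faulting the TLB entry in or raising the guest exception.  */
            cpu_stb_data_ra(env, addr + i, 0, retaddr);
        }
    }
}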

Signed-off-by: Richard Henderson 
---
 include/exec/cpu_ldst.h | 40 
 accel/tcg/cputlb.c  | 69 -
 target/arm/sve_helper.c |  6 +---
 3 files changed, 68 insertions(+), 47 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index d78041d7a0..be8c3f4da2 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -440,43 +440,15 @@ static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx,
  * This is the equivalent of the initial fast-path code used by
  * TCG backends for guest load and store accesses.
  */
+#ifdef CONFIG_USER_ONLY
 static inline void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
-  int access_type, int mmu_idx)
+  MMUAccessType access_type, int mmu_idx)
 {
-#if defined(CONFIG_USER_ONLY)
 return g2h(addr);
-#else
-CPUTLBEntry *tlbentry = tlb_entry(env, mmu_idx, addr);
-abi_ptr tlb_addr;
-uintptr_t haddr;
-
-switch (access_type) {
-case 0:
-tlb_addr = tlbentry->addr_read;
-break;
-case 1:
-tlb_addr = tlb_addr_write(tlbentry);
-break;
-case 2:
-tlb_addr = tlbentry->addr_code;
-break;
-default:
-g_assert_not_reached();
-}
-
-if (!tlb_hit(tlb_addr, addr)) {
-/* TLB entry is for a different page */
-return NULL;
-}
-
-if (tlb_addr & ~TARGET_PAGE_MASK) {
-/* IO access */
-return NULL;
-}
-
-haddr = addr + tlbentry->addend;
-return (void *)haddr;
-#endif /* defined(CONFIG_USER_ONLY) */
 }
+#else
+void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
+MMUAccessType access_type, int mmu_idx);
+#endif
 
 #endif /* CPU_LDST_H */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 7f59d815db..959b6d4ded 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1006,6 +1006,16 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
 }
 }
 
+static inline target_ulong tlb_read_ofs(CPUTLBEntry *entry, size_t ofs)
+{
+#if TCG_OVERSIZED_GUEST
+return *(target_ulong *)((uintptr_t)entry + ofs);
+#else
+/* ofs might correspond to .addr_write, so use atomic_read */
+return atomic_read((target_ulong *)((uintptr_t)entry + ofs));
+#endif
+}
+
 /* Return true if ADDR is present in the victim tlb, and has been copied
back to the main tlb.  */
 static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
@@ -1016,14 +1026,7 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
 assert_cpu_is_self(ENV_GET_CPU(env));
 for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) {
CPUTLBEntry *vtlb = &env->tlb_v_table[mmu_idx][vidx];
-target_ulong cmp;
-
-/* elt_ofs might correspond to .addr_write, so use atomic_read */
-#if TCG_OVERSIZED_GUEST
-cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);
-#else
-cmp = atomic_read((target_ulong *)((uintptr_t)vtlb + elt_ofs));
-#endif
+target_ulong cmp = tlb_read_ofs(vtlb, elt_ofs);
 
 if (cmp == page) {
 /* Found entry in victim tlb, swap tlb and iotlb.  */
@@ -1107,6 +1110,56 @@ void probe_write(CPUArchState *env, target_ulong addr, int size, int mmu_idx,
 }
 }
 
+void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
+MMUAccessType access_type, int mmu_idx)
+{
+CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
+uintptr_t tlb_addr, page;
+size_t elt_ofs;
+
+switch (access_type) {
+case MMU_DATA_LOAD:
+elt_ofs = offsetof(CPUTLBEntry, addr_read);
+break;
+case MMU_DATA_STORE:
+elt_ofs = offsetof(CPUTLBEntry, addr_write);
+break;
+case MMU_INST_FETCH:
+elt_ofs = offsetof(CPUTLBEntry, addr_code);
+break;
+default:
+g_assert_not_reached();
+}
+
+page = addr & TARGET_PAGE_MASK;
+tlb_addr = tlb_read_ofs(entry, elt_ofs);
+
+if (!tlb_hit_page(tlb_addr, page)) {
+uintptr_t index = tlb_index(env, mmu_idx, addr);
+
+if (!victim_tlb_hit(env, mmu_idx, index, elt_ofs, page)) {
+CPUState *cs = ENV_GET_CPU(env);
+CPUClass *cc = CPU_GET_CLASS(cs);
+
+if (!cc->tlb_fill(cs, addr, 0, access_type, mmu_idx, true, 0)) {
+/* Non-faulting page table read failed.  */
+return NULL;
+}
+
+/* TLB resize via tlb_fill may have moved the entry.  */
+entry = tlb_entry(env, mmu_idx, addr);
+}
+tlb_addr = tlb_read_ofs(entry, elt_ofs);
+}
+
+if (tlb_addr & ~TARGET_PAGE_MASK) {
+/* IO access */
+return NULL;