Re: [PATCH v3 13/18] target/arm: Update contiguous first-fault and no-fault loads

2020-04-28 Thread Richard Henderson
On 4/27/20 11:38 AM, Peter Maydell wrote:
> I would suggest something like:
> 
> + * From this point on, all memory operations are MemSingleNF.
> + *
> + * Per the MemSingleNF pseudocode, a no-fault load from Device memory
> + * must not actually hit the bus -- it returns (UNKNOWN, FAULT) instead.
> + *
> + * Unfortuately we do not have access to the memory attributes from the 
> PTE
> + * to tell Device memory from Normal memory. So we make a mostly
> + * correct check, and indicate (UNKNOWN, FAULT) for any MMIO.
> + * This gives the right answer for the common cases of "Normal memory,
> + * backed by host RAM" and "Device memory, backed by MMIO".
> + * The architecture allows us to suppress an NF load and return
> + * (UNKNOWN, FAULT) for any reason), so our behaviour (indicate
> + * fault) for the corner case of "Normal memory, backed by MMIO" is
> + * permitted. The case we get wrong is "Device memory, backed by
> + * host RAM", which we should return (UNKNOWN, FAULT) for but do not.
> + *
> + * Similarly, CPU_BP breakpoints would raise exceptions, and so
> + * return (UNKNOWN, FAULT).  For simplicity, we consider gdb and
> + * architectural breakpoints the same.
> 
> assuming my understanding is correct...

Yep, thanks.  I'll merge this text.


r~



Re: [PATCH v3 13/18] target/arm: Update contiguous first-fault and no-fault loads

2020-04-27 Thread Peter Maydell
On Mon, 27 Apr 2020 at 17:45, Richard Henderson
 wrote:
> We *can* indicate fault from MemSingleNF for any reason whatsoever, or no
> reason whatsoever.
>
> > // Implementation may suppress NF load for any reason
> > if ConstrainUnpredictableBool(Unpredictable_NONFAULT) then
> >   return (bits(8*size) UNKNOWN, TRUE);
>
> What I'm trying to talk about above, is the third statement in MemSingleNF,
>
> > // Non-fault load from Device memory must not be performed externally
> > if memaddrdesc.memattrs.memtype == MemType_Device then
> >   return (bits(8*size) UNKNOWN, TRUE);
>
> and the reason we can't actually test MemType_Device here.
>
> If you have better wording for that, I'm all ears.  But I don't think there's
> an actual bug here.

Oh, the comment didn't say you were relying on the operation being
allowed to return 'fail' for any reason; in particular the line about
"Normal memory [...] should access the bus" implies the opposite.
(I also missed the distinction here between "indicate fault" and "fault".)

But in that case you have the opposite problem, I think -- just because
something's backed by host memory doesn't mean the guest didn't
map it as Device: that case the architecture says must be indicated
as 'fault' but this code won't do it.

I would suggest something like:

+ * From this point on, all memory operations are MemSingleNF.
+ *
+ * Per the MemSingleNF pseudocode, a no-fault load from Device memory
+ * must not actually hit the bus -- it returns (UNKNOWN, FAULT) instead.
+ *
+ * Unfortuately we do not have access to the memory attributes from the PTE
+ * to tell Device memory from Normal memory. So we make a mostly
+ * correct check, and indicate (UNKNOWN, FAULT) for any MMIO.
+ * This gives the right answer for the common cases of "Normal memory,
+ * backed by host RAM" and "Device memory, backed by MMIO".
+ * The architecture allows us to suppress an NF load and return
+ * (UNKNOWN, FAULT) for any reason), so our behaviour (indicate
+ * fault) for the corner case of "Normal memory, backed by MMIO" is
+ * permitted. The case we get wrong is "Device memory, backed by
+ * host RAM", which we should return (UNKNOWN, FAULT) for but do not.
+ *
+ * Similarly, CPU_BP breakpoints would raise exceptions, and so
+ * return (UNKNOWN, FAULT).  For simplicity, we consider gdb and
+ * architectural breakpoints the same.

assuming my understanding is correct...

thanks
-- PMM



Re: [PATCH v3 13/18] target/arm: Update contiguous first-fault and no-fault loads

2020-04-27 Thread Richard Henderson
On 4/27/20 9:32 AM, Peter Maydell wrote:
 + * From this point on, all memory operations are MemSingleNF.
 + *
 + * Per the MemSingleNF pseudocode, a no-fault load from Device memory
 + * must not actually hit the bus -- it returns (UNKNOWN, FAULT) 
 instead.
 + * If you map non-RAM with Normal memory attributes and do a NF
 + * load then it should access the bus -- but doing so is illegal.
 + *
 + * While we do not have access to the memory attributes from the PTE
 + * to tell Device memory from Normal memory, we can validly assume 
 that
 + * non-RAM has been mapped as Device memory.  Thus we indicate fault
 + * on all MMIO.
>>>
>>> I still don't understand why this is right. All non-RAM is MMIO
>>> but not all MMIO is non-RAM; so you might have something that's
>>> MMIO (at least for the moment) and has been mapped Normal. That
>>> shouldn't fault.
>>
>> Everything that must go through the slow path has TLB_MMIO set.
> 
> Yes. But not everything that goes through the slow path is Device memory.
> We can (should) fault on all accesses to Device memory, but we can't
> fault on all accesses that are slow-pathed, because some of them could
> be entirely valid Normal memory.

We *can* indicate fault from MemSingleNF for any reason whatsoever, or no
reason whatsoever.

> // Implementation may suppress NF load for any reason
> if ConstrainUnpredictableBool(Unpredictable_NONFAULT) then
>   return (bits(8*size) UNKNOWN, TRUE);

What I'm trying to talk about above, is the third statement in MemSingleNF,

> // Non-fault load from Device memory must not be performed externally
> if memaddrdesc.memattrs.memtype == MemType_Device then
>   return (bits(8*size) UNKNOWN, TRUE);

and the reason we can't actually test MemType_Device here.

If you have better wording for that, I'm all ears.  But I don't think there's
an actual bug here.


r~



Re: [PATCH v3 13/18] target/arm: Update contiguous first-fault and no-fault loads

2020-04-27 Thread Peter Maydell
On Mon, 27 Apr 2020 at 17:16, Richard Henderson
 wrote:
>
> On 4/27/20 4:03 AM, Peter Maydell wrote:
> > On Wed, 22 Apr 2020 at 05:33, Richard Henderson
> >  wrote:
> >>
> >> With sve_cont_ldst_pages, the differences between first-fault and no-fault
> >> are minimal, so unify the routines.  With cpu_probe_watchpoint, we are able
> >> to make progress through pages with TLB_WATCHPOINT set when the watchpoint
> >> does not actually fire.
> >>
> >> Signed-off-by: Richard Henderson 
> >
> >
> >>  /*
> >> - * Perform one normal read, which will fault or not.
> >> - * But it is likely to bring the page into the tlb.
> >> + * From this point on, all memory operations are MemSingleNF.
> >> + *
> >> + * Per the MemSingleNF pseudocode, a no-fault load from Device memory
> >> + * must not actually hit the bus -- it returns (UNKNOWN, FAULT) 
> >> instead.
> >> + * If you map non-RAM with Normal memory attributes and do a NF
> >> + * load then it should access the bus -- but doing so is illegal.
> >> + *
> >> + * While we do not have access to the memory attributes from the PTE
> >> + * to tell Device memory from Normal memory, we can validly assume 
> >> that
> >> + * non-RAM has been mapped as Device memory.  Thus we indicate fault
> >> + * on all MMIO.
> >
> > I still don't understand why this is right. All non-RAM is MMIO
> > but not all MMIO is non-RAM; so you might have something that's
> > MMIO (at least for the moment) and has been mapped Normal. That
> > shouldn't fault.
>
> Everything that must go through the slow path has TLB_MMIO set.

Yes. But not everything that goes through the slow path is Device memory.
We can (should) fault on all accesses to Device memory, but we can't
fault on all accesses that are slow-pathed, because some of them could
be entirely valid Normal memory.

> What you're thinking of, romd, has TLB_MMIO set on the write comparator but 
> not
> the read comparator.

True when the romd device is in 'romd mode', ie mr->romd_mode is
true. Otherwise memory_region_is_romd() returns false and
tlb_set_page_with_attrs() treats it like normal MMIO, because
both read and write must take the slow path. (For flash this
happens when it is put into programming mode and reads from
the memory region are no longer simple reads from the backing
host RAM.)

thanks
-- PMM



Re: [PATCH v3 13/18] target/arm: Update contiguous first-fault and no-fault loads

2020-04-27 Thread Richard Henderson
On 4/27/20 4:03 AM, Peter Maydell wrote:
> On Wed, 22 Apr 2020 at 05:33, Richard Henderson
>  wrote:
>>
>> With sve_cont_ldst_pages, the differences between first-fault and no-fault
>> are minimal, so unify the routines.  With cpu_probe_watchpoint, we are able
>> to make progress through pages with TLB_WATCHPOINT set when the watchpoint
>> does not actually fire.
>>
>> Signed-off-by: Richard Henderson 
> 
> 
>>  /*
>> - * Perform one normal read, which will fault or not.
>> - * But it is likely to bring the page into the tlb.
>> + * From this point on, all memory operations are MemSingleNF.
>> + *
>> + * Per the MemSingleNF pseudocode, a no-fault load from Device memory
>> + * must not actually hit the bus -- it returns (UNKNOWN, FAULT) instead.
>> + * If you map non-RAM with Normal memory attributes and do a NF
>> + * load then it should access the bus -- but doing so is illegal.
>> + *
>> + * While we do not have access to the memory attributes from the PTE
>> + * to tell Device memory from Normal memory, we can validly assume that
>> + * non-RAM has been mapped as Device memory.  Thus we indicate fault
>> + * on all MMIO.
> 
> I still don't understand why this is right. All non-RAM is MMIO
> but not all MMIO is non-RAM; so you might have something that's
> MMIO (at least for the moment) and has been mapped Normal. That
> shouldn't fault.

Everything that must go through the slow path has TLB_MMIO set.

What you're thinking of, romd, has TLB_MMIO set on the write comparator but not
the read comparator.

Have another browse through tlb_set_page_with_attrs, lines 859-893, and the
uses of is_romd.



r~



Re: [PATCH v3 13/18] target/arm: Update contiguous first-fault and no-fault loads

2020-04-27 Thread Peter Maydell
On Wed, 22 Apr 2020 at 05:33, Richard Henderson
 wrote:
>
> With sve_cont_ldst_pages, the differences between first-fault and no-fault
> are minimal, so unify the routines.  With cpu_probe_watchpoint, we are able
> to make progress through pages with TLB_WATCHPOINT set when the watchpoint
> does not actually fire.
>
> Signed-off-by: Richard Henderson 


>  /*
> - * Perform one normal read, which will fault or not.
> - * But it is likely to bring the page into the tlb.
> + * From this point on, all memory operations are MemSingleNF.
> + *
> + * Per the MemSingleNF pseudocode, a no-fault load from Device memory
> + * must not actually hit the bus -- it returns (UNKNOWN, FAULT) instead.
> + * If you map non-RAM with Normal memory attributes and do a NF
> + * load then it should access the bus -- but doing so is illegal.
> + *
> + * While we do not have access to the memory attributes from the PTE
> + * to tell Device memory from Normal memory, we can validly assume that
> + * non-RAM has been mapped as Device memory.  Thus we indicate fault
> + * on all MMIO.

I still don't understand why this is right. All non-RAM is MMIO
but not all MMIO is non-RAM; so you might have something that's
MMIO (at least for the moment) and has been mapped Normal. That
shouldn't fault.

> + *
> + * Similarly, CPU_BP breakpoints would raise exceptions, and so
> + * return (UNKNOWN, FAULT).  For simplicity, we consider gdb and
> + * architectural breakpoints the same.
>   */
> -tlb_fn(env, vd, reg_off, addr + mem_off, retaddr);
> +if (unlikely(flags & TLB_MMIO)) {
> +goto do_fault;
> +}

thanks
-- PMM



[PATCH v3 13/18] target/arm: Update contiguous first-fault and no-fault loads

2020-04-21 Thread Richard Henderson
With sve_cont_ldst_pages, the differences between first-fault and no-fault
are minimal, so unify the routines.  With cpu_probe_watchpoint, we are able
to make progress through pages with TLB_WATCHPOINT set when the watchpoint
does not actually fire.

Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 342 +++-
 1 file changed, 158 insertions(+), 184 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 9365e32646..3473b53928 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4101,18 +4101,6 @@ static intptr_t find_next_active(uint64_t *vg, intptr_t 
reg_off,
 return reg_off;
 }
 
-/*
- * Return the maximum offset <= @mem_max which is still within the page
- * referenced by @base + @mem_off.
- */
-static intptr_t max_for_page(target_ulong base, intptr_t mem_off,
- intptr_t mem_max)
-{
-target_ulong addr = base + mem_off;
-intptr_t split = -(intptr_t)(addr | TARGET_PAGE_MASK);
-return MIN(split, mem_max - mem_off) + mem_off;
-}
-
 /*
  * Resolve the guest virtual address to info->host and info->flags.
  * If @nofault, return false if the page is invalid, otherwise
@@ -4435,19 +4423,6 @@ static void sve_cont_ldst_watchpoints(SVEContLdSt *info, 
CPUARMState *env,
 #endif
 }
 
-/*
- * The result of tlb_vaddr_to_host for user-only is just g2h(x),
- * which is always non-null.  Elide the useless test.
- */
-static inline bool test_host_page(void *host)
-{
-#ifdef CONFIG_USER_ONLY
-return true;
-#else
-return likely(host != NULL);
-#endif
-}
-
 /*
  * Common helper for all contiguous 1,2,3,4-register predicated stores.
  */
@@ -4705,167 +4680,163 @@ static void record_fault(CPUARMState *env, uintptr_t 
i, uintptr_t oprsz)
 }
 
 /*
- * Common helper for all contiguous first-fault loads.
+ * Common helper for all contiguous no-fault and first-fault loads.
  */
-static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
-uint32_t desc, const uintptr_t retaddr,
-const int esz, const int msz,
-sve_ldst1_host_fn *host_fn,
-sve_ldst1_tlb_fn *tlb_fn)
+static inline QEMU_ALWAYS_INLINE
+void sve_ldnfff1_r(CPUARMState *env, void *vg, const target_ulong addr,
+   uint32_t desc, const uintptr_t retaddr,
+   const int esz, const int msz, const SVEContFault fault,
+   sve_ldst1_host_fn *host_fn,
+   sve_ldst1_tlb_fn *tlb_fn)
 {
-const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
-const int mmu_idx = get_mmuidx(oi);
 const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
 void *vd = >vfp.zregs[rd];
-const int diffsz = esz - msz;
 const intptr_t reg_max = simd_oprsz(desc);
-const intptr_t mem_max = reg_max >> diffsz;
-intptr_t split, reg_off, mem_off, i;
+intptr_t reg_off, mem_off, reg_last;
+SVEContLdSt info;
+int flags;
 void *host;
 
-/* Skip to the first active element.  */
-reg_off = find_next_active(vg, 0, reg_max, esz);
-if (unlikely(reg_off == reg_max)) {
+/* Find the active elements.  */
+if (!sve_cont_ldst_elements(, addr, vg, reg_max, esz, 1 << msz)) {
 /* The entire predicate was false; no load occurs.  */
 memset(vd, 0, reg_max);
 return;
 }
-mem_off = reg_off >> diffsz;
+reg_off = info.reg_off_first[0];
 
-/*
- * If the (remaining) load is entirely within a single page, then:
- * For softmmu, and the tlb hits, then no faults will occur;
- * For user-only, either the first load will fault or none will.
- * We can thus perform the load directly to the destination and
- * Vd will be unmodified on any exception path.
- */
-split = max_for_page(addr, mem_off, mem_max);
-if (likely(split == mem_max)) {
-host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
-if (test_host_page(host)) {
-i = reg_off;
-host -= mem_off;
-do {
-host_fn(vd, i, host + (i >> diffsz));
-i = find_next_active(vg, i + (1 << esz), reg_max, esz);
-} while (i < reg_max);
-/* After any fault, zero any leading inactive elements.  */
+/* Probe the page(s). */
+if (!sve_cont_ldst_pages(, fault, env, addr, MMU_DATA_LOAD, retaddr)) 
{
+/* Fault on first element. */
+tcg_debug_assert(fault == FAULT_NO);
+memset(vd, 0, reg_max);
+goto do_fault;
+}
+
+mem_off = info.mem_off_first[0];
+flags = info.page[0].flags;
+
+if (fault == FAULT_FIRST) {
+/*
+ * Special handling of the first active element,
+ * if it crosses a page boundary or is MMIO.
+ */
+bool is_split = mem_off == info.mem_off_split;
+/* TODO: MTE check. */
+if