arm: Trap non-streaming usage when Streaming SVE is active

Peter Maydell Fri, 01 Jul 2022 04:12:07 -0700

On Tue, 28 Jun 2022 at 05:26, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> This new behaviour is in the ARM pseudocode function
> AArch64.CheckFPAdvSIMDEnabled, which applies to AArch32
> via AArch32.CheckAdvSIMDOrFPEnabled when the EL to which
> the trap would be delivered is in AArch64 mode.
>
> Given that ARMv9 drops support for AArch32 outside EL0, the trap EL
> detection ought to be trivially true, but the pseudocode still contains
> a number of conditions, and QEMU has not yet committed to dropping A32
> support for EL[12] when v9 features are present.
>
> Since the computation of SME_TRAP_NONSTREAMING is necessarily different
> for the two modes, we might as well preserve bits within TBFLAG_ANY and
> allocate separate bits within TBFLAG_A32 and TBFLAG_A64 instead.
>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>


> +# These patterns are taken from Appendix E1.1 of DDI0616 A.a,
> +# Arm Architecture Reference Manual Supplement,
> +# The Scalable Matrix Extension (SME), for Armv9-A
> +
> +{
> +  [
> +    OK  0-00 1110 0000 0001 0010 11-- ---- ----   # SMOV W|Xd,Vn.B[0]
> +    OK  0-00 1110 0000 0010 0010 11-- ---- ----   # SMOV W|Xd,Vn.H[0]
> +    OK  0100 1110 0000 0100 0010 11-- ---- ----   # SMOV Xd,Vn.S[0]
> +    OK  0000 1110 0000 0001 0011 11-- ---- ----   # UMOV Wd,Vn.B[0]
> +    OK  0000 1110 0000 0010 0011 11-- ---- ----   # UMOV Wd,Vn.H[0]
> +    OK  0000 1110 0000 0100 0011 11-- ---- ----   # UMOV Wd,Vn.S[0]
> +    OK  0100 1110 0000 1000 0011 11-- ---- ----   # UMOV Xd,Vn.D[0]
> +  ]
> +  FAIL  0--0 111- ---- ---- ---- ---- ---- ----   # Advanced SIMD vector operations
> +}
> +
> +{
> +  [
> +    OK  0101 1110 --1- ---- 11-1 11-- ---- ----   # FMULX/FRECPS/FRSQRTS (scalar)
> +    OK  0101 1110 -10- ---- 00-1 11-- ---- ----   # FMULX/FRECPS/FRSQRTS (scalar, FP16)
> +    OK  01-1 1110 1-10 0001 11-1 10-- ---- ----   # FRECPE/FRSQRTE/FRECPX (scalar)
> +    OK  01-1 1110 1111 1001 11-1 10-- ---- ----   # FRECPE/FRSQRTE/FRECPX (scalar, FP16)
> +  ]
> +  FAIL  01-1 111- ---- ---- ---- ---- ---- ----   # Advanced SIMD single-element operations
> +}
> +
> +FAIL    0-00 110- ---- ---- ---- ---- ---- ----   # Advanced SIMD structure load/store
> +FAIL    1100 1110 ---- ---- ---- ---- ---- ----   # Advanced SIMD cryptography extensions
> +
> +# These are the "avoidance of doubt" final table of Illegal Advanced SIMD instructions
> +# We don't actually need to include these, as the default is OK.
> +#       -001 111- ---- ---- ---- ---- ---- ----   # Scalar floating-point operations
> +#       --10 110- ---- ---- ---- ---- ---- ----   # Load/store pair of FP registers
> +#       --01 1100 ---- ---- ---- ---- ---- ----   # Load FP register (PC-relative literal)
> +#       --11 1100 --0- ---- ---- ---- ---- ----   # Load/store FP register (unscaled imm)
> +#       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
> +#       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)

Don't we need a FAIL line for the "FJCVTZS should be illegal" case ?

> +FAIL    0000 0100 --1- ---- 1010 ---- ---- ----   # ADR
> +FAIL    0000 0100 --1- ---- 1011 -0-- ---- ----   # FTSSEL, FEXPA
> +FAIL    0000 0101 --10 0001 100- ---- ---- ----   # COMPACT
> +FAIL    0010 0101 --01 100- 1111 000- ---0 ----   # RDFFR, RDFFRS
> +FAIL    0010 0101 --10 1--- 1001 ---- ---- ----   # WRFFR, SETFFR
> +FAIL    0100 0101 --0- ---- 1011 ---- ---- ----   # BDEP, BEXT, BGRP
> +FAIL    0100 0101 000- ---- 0110 1--- ---- ----   # PMULLB, PMULLT (128b result)
> +FAIL    0110 0100 --1- ---- 1110 01-- ---- ----   # FMMLA, BFMMLA
> +FAIL    0110 0101 --0- ---- 0000 11-- ---- ----   # FTSMUL
> +FAIL    0110 0101 --01 0--- 100- ---- ---- ----   # FTMAD
> +FAIL    0110 0101 --01 1--- 001- ---- ---- ----   # FADDA
> +FAIL    0100 0101 --0- ---- 1001 10-- ---- ----   # SMMLA, UMMLA, USMMLA
> +FAIL    0100 0101 --1- ---- 1--- ---- ---- ----   # SVE2 string/histo/crypto instructions
> +FAIL    1000 010- -00- ---- 10-- ---- ---- ----   # SVE2 32-bit gather NT load (vector+scalar)
> +FAIL    1000 010- -00- ---- 111- ---- ---- ----   # SVE 32-bit gather prefetch (vector+imm)
> +FAIL    1000 0100 0-1- ---- 0--- ---- ---- ----   # SVE 32-bit gather prefetch (scalar+vector)
> +FAIL    1000 010- -01- ---- 1--- ---- ---- ----   # SVE 32-bit gather load (vector+imm)
> +FAIL    1000 0100 0-0- ---- 0--- ---- ---- ----   # SVE 32-bit gather load byte (scalar+vector)
> +FAIL    1000 0100 1--- ---- 0--- ---- ---- ----   # SVE 32-bit gather load half (scalar+vector)
> +FAIL    1000 0101 0--- ---- 0--- ---- ---- ----   # SVE 32-bit gather load word (scalar+vector)
> +FAIL    1010 010- ---- ---- 011- ---- ---- ----   # SVE contiguous FF load (scalar+scalar)
> +FAIL    1010 010- ---1 ---- 101- ---- ---- ----   # SVE contiguous NF load (scalar+imm)
> +FAIL    1010 010- -10- ---- 000- ---- ---- ----   # SVE load & replicate 32 bytes (scalar+scalar)
> +FAIL    1010 010- -100 ---- 001- ---- ---- ----   # SVE load & replicate 32 bytes (scalar+imm)
> +FAIL    1100 010- ---- ---- ---- ---- ---- ----   # SVE 64-bit gather load/prefetch
> +FAIL    1110 010- -00- ---- 001- ---- ---- ----   # SVE2 64-bit scatter NT store (vector+scalar)
> +FAIL    1110 010- -10- ---- 001- ---- ---- ----   # SVE2 32-bit scatter NT store (vector+scalar)
> +FAIL    1110 010- ---- ---- 1-0- ---- ---- ----   # SVE scatter store (scalar+32-bit vector)
> +FAIL    1110 010- ---- ---- 101- ---- ---- ----   # SVE scatter store (misc)

> @@ -11312,6 +11338,21 @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState 
> *env, int fp_el,
>          DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
>      }
>
> +    /*
> +     * The SME exception we are testing for is raised via
> +     * AArch64.CheckFPAdvSIMDEnabled(), and for AArch32 this is called
> +     * when EL1 is using A64 or EL2 using A64 and !TGE.
> +     * See AArch32.CheckAdvSIMDOrFPEnabled().
> +     */
> +    if (el == 0
> +        && FIELD_EX64(env->svcr, SVCR, SM)
> +        && (!arm_is_el2_enabled(env)
> +            || (arm_el_is_aa64(env, 2) && !(env->cp15.hcr_el2 & HCR_TGE)))
> +        && arm_el_is_aa64(env, 1)
> +        && !sme_fa64(env, el)) {

I can't get any of:
 * the logic in the comment
 * the logic in the C code
 * the logic in the pseudocode AArch32.CheckAdvSIMDOrFPEnabled()
   which causes it to call AArch64.CheckFPEnabled()
to line up with each other.

The comment has:
 * (EL1 A64) OR (EL2 A64 && !TGE)
The pseudocode has:
 * (!TGE && EL1 A64) OR (TGE && EL2 A64 && EL1 A64)
   [seems odd that it is checking the width of EL1 in the TGE case
    but I haven't followed the logic through to find out why]
The C code here has:
 * (!TGE && EL2 A64 && EL1 A64)

What am I missing ?

thanks
-- PMM

Re: [PATCH v4 03/45] target/arm: Trap non-streaming usage when Streaming SVE is active