On Tue, 28 Jun 2022 at 05:26, Richard Henderson <richard.hender...@linaro.org> wrote:
>
> This new behaviour is in the ARM pseudocode function
> AArch64.CheckFPAdvSIMDEnabled, which applies to AArch32
> via AArch32.CheckAdvSIMDOrFPEnabled when the EL to which
> the trap would be delivered is in AArch64 mode.
>
> Given that ARMv9 drops support for AArch32 outside EL0, the trap EL
> detection ought to be trivially true, but the pseudocode still contains
> a number of conditions, and QEMU has not yet committed to dropping A32
> support for EL[12] when v9 features are present.
>
> Since the computation of SME_TRAP_NONSTREAMING is necessarily different
> for the two modes, we might as well preserve bits within TBFLAG_ANY and
> allocate separate bits within TBFLAG_A32 and TBFLAG_A64 instead.
>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
> +# These patterns are taken from Appendix E1.1 of DDI0616 A.a,
> +# Arm Architecture Reference Manual Supplement,
> +# The Scalable Matrix Extension (SME), for Armv9-A
> +
> +{
> +  [
> +    OK 0-00 1110 0000 0001 0010 11-- ---- ---- # SMOV W|Xd,Vn.B[0]
> +    OK 0-00 1110 0000 0010 0010 11-- ---- ---- # SMOV W|Xd,Vn.H[0]
> +    OK 0100 1110 0000 0100 0010 11-- ---- ---- # SMOV Xd,Vn.S[0]
> +    OK 0000 1110 0000 0001 0011 11-- ---- ---- # UMOV Wd,Vn.B[0]
> +    OK 0000 1110 0000 0010 0011 11-- ---- ---- # UMOV Wd,Vn.H[0]
> +    OK 0000 1110 0000 0100 0011 11-- ---- ---- # UMOV Wd,Vn.S[0]
> +    OK 0100 1110 0000 1000 0011 11-- ---- ---- # UMOV Xd,Vn.D[0]
> +  ]
> +  FAIL 0--0 111- ---- ---- ---- ---- ---- ---- # Advanced SIMD vector operations
> +}
> +
> +{
> +  [
> +    OK 0101 1110 --1- ---- 11-1 11-- ---- ---- # FMULX/FRECPS/FRSQRTS (scalar)
> +    OK 0101 1110 -10- ---- 00-1 11-- ---- ---- # FMULX/FRECPS/FRSQRTS (scalar, FP16)
> +    OK 01-1 1110 1-10 0001 11-1 10-- ---- ---- # FRECPE/FRSQRTE/FRECPX (scalar)
> +    OK 01-1 1110 1111 1001 11-1 10-- ---- ---- # FRECPE/FRSQRTE/FRECPX (scalar, FP16)
> +  ]
> +  FAIL 01-1 111- ---- ---- ---- ---- ---- ---- # Advanced SIMD single-element operations
> +}
> +
> +FAIL 0-00 110- ---- ---- ---- ---- ---- ---- # Advanced SIMD structure load/store
> +FAIL 1100 1110 ---- ---- ---- ---- ---- ---- # Advanced SIMD cryptography extensions
> +
> +# These are the "avoidance of doubt" final table of Illegal Advanced SIMD instructions
> +# We don't actually need to include these, as the default is OK.
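
Each OK/FAIL entry above is a fixed-width 32-bit pattern, written most-significant bit first, where `0` and `1` are bits that must match and `-` is a don't-care. As a rough illustration (not QEMU's decodetree implementation), the C sketch below reduces one such line to a mask/value pair and tests an instruction word against it. `struct insn_pattern`, `parse_pattern()` and `pattern_matches()` are hypothetical helpers. The sketch ignores the `{ }` and `[ ]` grouping; presumably the specific OK lines sit ahead of the broader FAIL line in each group so that they take precedence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask/value form of one 32-bit OK/FAIL pattern. */
struct insn_pattern {
    uint32_t mask;  /* 1 where the pattern fixes the bit */
    uint32_t value; /* required value of the fixed bits */
};

/*
 * Parse e.g. "0-00 1110 0000 0001 0010 11-- ---- ----".
 * '0'/'1' are fixed bits, '-' is don't-care, spaces are ignored.
 * The first bit character is bit 31.
 */
static struct insn_pattern parse_pattern(const char *s)
{
    struct insn_pattern p = { 0, 0 };
    int nbits = 0;

    for (; *s; s++) {
        if (*s == ' ') {
            continue;
        }
        assert(*s == '0' || *s == '1' || *s == '-');
        p.mask <<= 1;
        p.value <<= 1;
        if (*s != '-') {
            p.mask |= 1;
            p.value |= (uint32_t)(*s == '1');
        }
        nbits++;
    }
    assert(nbits == 32);
    return p;
}

/* True if every fixed bit of the pattern matches the instruction word. */
static bool pattern_matches(struct insn_pattern p, uint32_t insn)
{
    return (insn & p.mask) == p.value;
}
```

For example, the SMOV W|Xd,Vn.B[0] line parses to mask 0xBFFFFC00 and value 0x0E012C00, so bit 30 (the W/X selector) and the Rn/Rd fields are left free, and both 0x0E012C20 (SMOV W0, V1.B[0]) and its X-register form 0x4E012C20 match.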
> +# -001 111- ---- ---- ---- ---- ---- ---- # Scalar floating-point operations
> +# --10 110- ---- ---- ---- ---- ---- ---- # Load/store pair of FP registers
> +# --01 1100 ---- ---- ---- ---- ---- ---- # Load FP register (PC-relative literal)
> +# --11 1100 --0- ---- ---- ---- ---- ---- # Load/store FP register (unscaled imm)
> +# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
> +# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)

Don't we need a FAIL line for the "FJCVTZS should be illegal" case?

> +FAIL 0000 0100 --1- ---- 1010 ---- ---- ---- # ADR
> +FAIL 0000 0100 --1- ---- 1011 -0-- ---- ---- # FTSSEL, FEXPA
> +FAIL 0000 0101 --10 0001 100- ---- ---- ---- # COMPACT
> +FAIL 0010 0101 --01 100- 1111 000- ---0 ---- # RDFFR, RDFFRS
> +FAIL 0010 0101 --10 1--- 1001 ---- ---- ---- # WRFFR, SETFFR
> +FAIL 0100 0101 --0- ---- 1011 ---- ---- ---- # BDEP, BEXT, BGRP
> +FAIL 0100 0101 000- ---- 0110 1--- ---- ---- # PMULLB, PMULLT (128b result)
> +FAIL 0110 0100 --1- ---- 1110 01-- ---- ---- # FMMLA, BFMMLA
> +FAIL 0110 0101 --0- ---- 0000 11-- ---- ---- # FTSMUL
> +FAIL 0110 0101 --01 0--- 100- ---- ---- ---- # FTMAD
> +FAIL 0110 0101 --01 1--- 001- ---- ---- ---- # FADDA
> +FAIL 0100 0101 --0- ---- 1001 10-- ---- ---- # SMMLA, UMMLA, USMMLA
> +FAIL 0100 0101 --1- ---- 1--- ---- ---- ---- # SVE2 string/histo/crypto instructions
> +FAIL 1000 010- -00- ---- 10-- ---- ---- ---- # SVE2 32-bit gather NT load (vector+scalar)
> +FAIL 1000 010- -00- ---- 111- ---- ---- ---- # SVE 32-bit gather prefetch (vector+imm)
> +FAIL 1000 0100 0-1- ---- 0--- ---- ---- ---- # SVE 32-bit gather prefetch (scalar+vector)
> +FAIL 1000 010- -01- ---- 1--- ---- ---- ---- # SVE 32-bit gather load (vector+imm)
> +FAIL 1000 0100 0-0- ---- 0--- ---- ---- ---- # SVE 32-bit gather load byte (scalar+vector)
> +FAIL 1000 0100 1--- ---- 0--- ---- ---- ---- # SVE 32-bit gather load half (scalar+vector)
> +FAIL 1000 0101 0--- ---- 0--- ---- ---- ---- # SVE 32-bit gather load word (scalar+vector)
> +FAIL 1010 010- ---- ---- 011- ---- ---- ---- # SVE contiguous FF load (scalar+scalar)
> +FAIL 1010 010- ---1 ---- 101- ---- ---- ---- # SVE contiguous NF load (scalar+imm)
> +FAIL 1010 010- -10- ---- 000- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+scalar)
> +FAIL 1010 010- -100 ---- 001- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+imm)
> +FAIL 1100 010- ---- ---- ---- ---- ---- ---- # SVE 64-bit gather load/prefetch
> +FAIL 1110 010- -00- ---- 001- ---- ---- ---- # SVE2 64-bit scatter NT store (vector+scalar)
> +FAIL 1110 010- -10- ---- 001- ---- ---- ---- # SVE2 32-bit scatter NT store (vector+scalar)
> +FAIL 1110 010- ---- ---- 1-0- ---- ---- ---- # SVE scatter store (scalar+32-bit vector)
> +FAIL 1110 010- ---- ---- 101- ---- ---- ---- # SVE scatter store (misc)

> @@ -11312,6 +11338,21 @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
>          DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
>      }
>
> +    /*
> +     * The SME exception we are testing for is raised via
> +     * AArch64.CheckFPAdvSIMDEnabled(), and for AArch32 this is called
> +     * when EL1 is using A64 or EL2 using A64 and !TGE.
> +     * See AArch32.CheckAdvSIMDOrFPEnabled().
> +     */
> +    if (el == 0
> +        && FIELD_EX64(env->svcr, SVCR, SM)
> +        && (!arm_is_el2_enabled(env)
> +            || (arm_el_is_aa64(env, 2) && !(env->cp15.hcr_el2 & HCR_TGE)))
> +        && arm_el_is_aa64(env, 1)
> +        && !sme_fa64(env, el)) {

I can't get any of:
 * the logic in the comment
 * the logic in the C code
 * the logic in the pseudocode AArch32.CheckAdvSIMDOrFPEnabled() which
   causes it to call AArch64.CheckFPEnabled()

to line up with each other.
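
One way to make this comparison mechanical is to encode each condition as a C predicate and enumerate all 16 combinations of its inputs, reporting where two of them disagree. The sketch assumes `el == 0`, `SVCR.SM == 1` and `!sme_fa64()` already hold, so only "EL2 enabled", the EL2 and EL1 register widths, and HCR_EL2.TGE vary. The comment and pseudocode predicates transcribe the one-line summaries that follow (which do not mention whether EL2 is enabled), not the full Arm pseudocode; the third predicate transcribes the C expression from the patch. `struct trap_inputs`, the `cond_*` functions and `count_disagreements()` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Inputs that vary; el == 0, SVCR.SM == 1 and !sme_fa64() are assumed. */
struct trap_inputs {
    bool el2_enabled; /* arm_is_el2_enabled(env) */
    bool el2_aa64;    /* arm_el_is_aa64(env, 2) */
    bool el1_aa64;    /* arm_el_is_aa64(env, 1) */
    bool tge;         /* env->cp15.hcr_el2 & HCR_TGE */
};

typedef bool (*trap_cond)(struct trap_inputs in);

/* Comment: (EL1 A64) OR (EL2 A64 && !TGE) */
static bool cond_comment(struct trap_inputs in)
{
    return in.el1_aa64 || (in.el2_aa64 && !in.tge);
}

/* Pseudocode, as summarized: (!TGE && EL1 A64) OR (TGE && EL2 A64 && EL1 A64) */
static bool cond_pseudocode(struct trap_inputs in)
{
    return (!in.tge && in.el1_aa64) || (in.tge && in.el2_aa64 && in.el1_aa64);
}

/* C expression from the patch, with the el/SM/fa64 terms dropped. */
static bool cond_patch_c(struct trap_inputs in)
{
    return (!in.el2_enabled || (in.el2_aa64 && !in.tge)) && in.el1_aa64;
}

/* Try all 16 input combinations; print each disagreement and count them. */
static int count_disagreements(const char *name_a, trap_cond a,
                               const char *name_b, trap_cond b)
{
    int count = 0;

    assert(a != NULL && b != NULL);
    for (unsigned bits = 0; bits < 16; bits++) {
        struct trap_inputs in = {
            .el2_enabled = bits & 1u,
            .el2_aa64 = (bits >> 1) & 1u,
            .el1_aa64 = (bits >> 2) & 1u,
            .tge = (bits >> 3) & 1u,
        };
        bool ra = a(in);
        bool rb = b(in);

        if (ra != rb) {
            printf("%s=%d %s=%d: el2_enabled=%d el2_aa64=%d el1_aa64=%d tge=%d\n",
                   name_a, ra, name_b, rb, in.el2_enabled, in.el2_aa64,
                   in.el1_aa64, in.tge);
            count++;
        }
    }
    return count;
}
```

With these definitions the comment and the patch's C expression disagree in 5 of the 16 cases; one of them is EL2 enabled and AArch32, EL1 AArch64, TGE clear, where the comment's condition holds but the C expression does not. The comment and pseudocode summaries disagree in 4 cases, and the pseudocode summary and the C expression in 3.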
The comment has:
 * (EL1 A64) OR (EL2 A64 && !TGE)

The pseudocode has:
 * (!TGE && EL1 A64) OR (TGE && EL2 A64 && EL1 A64)
   [seems odd that it is checking the width of EL1 in the TGE case,
   but I haven't followed the logic through to find out why]

The C code here has:
 * (!TGE && EL2 A64 && EL1 A64)

What am I missing?

thanks
-- PMM