Re: [PATCH for-5.1 00/31] target/arm: SVE2, part 1
Hi Richard,

I find BF16 is included in the ISA. Will you extend the softfpu in this patch set?

Zhiwei

On 2020/3/27 7:08, Richard Henderson wrote:
> Posting this for early review. It's based on some other patch sets
> that I have posted recently that also touch SVE, listed below. But
> it might just be easier to clone the devel tree [2]. While the branch
> itself will rebase frequently for development, I've also created a
> tag, post-sve2-20200326, for this posting.
>
> This is mostly untested, as the most recently released Foundation
> Model does not support SVE2. Some of the new instructions overlap
> with old fashioned NEON, and I can verify that those have not broken,
> and show that SVE2 will use the same code path. But the predicated
> insns and bottom/top interleaved insns are not yet RISU testable,
> as I have nothing to compare against.
>
> The patches are in general arranged so that one complete group of
> insns are added at once. The groups within the manual [1] have so
> far been small-ish.
>
> r~
>
> ---
>
> [1] ISA manual:
>     https://static.docs.arm.com/ddi0602/d/ISA_A64_xml_futureA-2019-12_OPT.pdf
>
> [2] Devel tree:
>     https://github.com/rth7680/qemu/tree/tgt-arm-sve-2
>
> Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=163610
>   ("target/arm: sve load/store improvements")
>
> Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164500
>   ("target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA")
>
> Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164048
>   ("target/arm: Implement ARMv8.5-MemTag, system mode")
>
> Richard Henderson (31):
>   target/arm: Add ID_AA64ZFR0 fields and isar_feature_aa64_sve2
>   target/arm: Implement SVE2 Integer Multiply - Unpredicated
>   target/arm: Implement SVE2 integer pairwise add and accumulate long
>   target/arm: Remove fp_status from helper_{recpe,rsqrte}_u32
>   target/arm: Implement SVE2 integer unary operations (predicated)
>   target/arm: Split out saturating/rounding shifts from neon
>   target/arm: Implement SVE2 saturating/rounding bitwise shift left
>     (predicated)
>   target/arm: Implement SVE2 integer halving add/subtract (predicated)
>   target/arm: Implement SVE2 integer pairwise arithmetic
>   target/arm: Implement SVE2 saturating add/subtract (predicated)
>   target/arm: Implement SVE2 integer add/subtract long
>   target/arm: Implement SVE2 integer add/subtract interleaved long
>   target/arm: Implement SVE2 integer add/subtract wide
>   target/arm: Implement SVE2 integer multiply long
>   target/arm: Implement PMULLB and PMULLT
>   target/arm: Tidy SVE tszimm shift formats
>   target/arm: Implement SVE2 bitwise shift left long
>   target/arm: Implement SVE2 bitwise exclusive-or interleaved
>   target/arm: Implement SVE2 bitwise permute
>   target/arm: Implement SVE2 complex integer add
>   target/arm: Implement SVE2 integer absolute difference and accumulate
>     long
>   target/arm: Implement SVE2 integer add/subtract long with carry
>   target/arm: Create arm_gen_gvec_[us]sra
>   target/arm: Create arm_gen_gvec_{u,s}{rshr,rsra}
>   target/arm: Implement SVE2 bitwise shift right and accumulate
>   target/arm: Create arm_gen_gvec_{sri,sli}
>   target/arm: Tidy handle_vec_simd_shri
>   target/arm: Implement SVE2 bitwise shift and insert
>   target/arm: Vectorize SABD/UABD
>   target/arm: Vectorize SABA/UABA
>   target/arm: Implement SVE2 integer absolute difference and accumulate
>
>  target/arm/cpu.h           |  31 ++
>  target/arm/helper-sve.h    | 345 +
>  target/arm/helper.h        |  81 +++-
>  target/arm/translate-a64.h |   9 +
>  target/arm/translate.h     |  24 +-
>  target/arm/vec_internal.h  | 161
>  target/arm/sve.decode      | 217 ++-
>  target/arm/helper.c        |   3 +-
>  target/arm/kvm64.c         |   2 +
>  target/arm/neon_helper.c   | 515 -
>  target/arm/sve_helper.c    | 757 ++---
>  target/arm/translate-a64.c | 557 +++
>  target/arm/translate-sve.c | 557 +++
>  target/arm/translate.c     | 626 ++
>  target/arm/vec_helper.c    | 411
>  target/arm/vfp_helper.c    |   4 +-
>  16 files changed, 3532 insertions(+), 768 deletions(-)
>  create mode 100644 target/arm/vec_internal.h
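
For readers unfamiliar with the "bottom/top" naming used for the long (widening) insns in the cover letter, here is a minimal C sketch of the idea, assuming the usual SVE2 convention that "bottom" forms read the even-numbered narrow elements and "top" forms read the odd-numbered ones. The function name add_long_s8 and its parameters are hypothetical, chosen only for illustration. This is not the helper code from the series, and it ignores vector-length handling entirely.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only, not a QEMU helper.
 * Widening signed add: each int16_t result is the sum of one int8_t
 * element from each source.  sel = 0 reads the even-numbered
 * ("bottom") elements, sel = 1 reads the odd-numbered ("top") ones.
 * The sources must hold 2 * wide_elems narrow elements.
 */
static void add_long_s8(int16_t *d, const int8_t *n, const int8_t *m,
                        size_t wide_elems, int sel)
{
    for (size_t i = 0; i < wide_elems; i++) {
        /* Widen before adding so the sum cannot overflow int8_t. */
        d[i] = (int16_t)n[2 * i + sel] + (int16_t)m[2 * i + sel];
    }
}
```

Calling it once with sel = 0 and once with sel = 1 covers every narrow source element, which is why such insns tend to come in B/T pairs.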
Re: [PATCH for-5.1 00/31] target/arm: SVE2, part 1
On 4/21/20 7:51 PM, LIU Zhiwei wrote:
> I find BF16 is included in the ISA. Will you extend the softfpu in this
> patch set?

I will do that eventually, but probably not part of the first full SVE2 patch set.

There are several optional extensions to SVE2, of which BF16 is one. But BF16 also requires changes to the normal FPU as well, and Arm requires SVE and FPU be in sync.

r~
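
As background on why BF16 touches the floating-point code at all, the sketch below shows the format itself: a bfloat16 value has the same layout as the upper 16 bits of an IEEE binary32 (1 sign, 8 exponent, 7 fraction bits). The names bf16_bits, bf16_to_float and float_to_bf16 are hypothetical and are not part of QEMU's softfloat. The narrowing step assumes round-to-nearest-even, which may not match the rounding any particular Arm instruction specifies.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: raw bfloat16 bits (1 sign, 8 exponent, 7 fraction). */
typedef uint16_t bf16_bits;

/* Widening is exact: the 16 bits become the top half of a binary32. */
static float bf16_to_float(bf16_bits b)
{
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Narrowing, assuming round-to-nearest-even; NaNs are returned quiet. */
static bf16_bits float_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    if ((u & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep the top bits and set the quiet bit of the 7-bit fraction. */
        return (bf16_bits)((u >> 16) | 0x0040u);
    }
    /* Add half an ulp, plus one more on ties when the kept lsb is odd. */
    u += 0x7fffu + ((u >> 16) & 1u);
    return (bf16_bits)(u >> 16);
}
```

Values such as 1.0f survive a round trip unchanged, while anything needing more than 7 fraction bits is rounded on the way down.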
RE: [PATCH for-5.1 00/31] target/arm: SVE2, part 1
Hello Richard,

I want to introduce you to Stephen Long. He is our new hire who started this week.

I want to know if you are available for a sync-up meeting to discuss how we can cooperate with qemu sve2 support.

Thank you,
Ana.

-----Original Message-----
From: Richard Henderson
Sent: Thursday, March 26, 2020 4:08 PM
To: qemu-devel@nongnu.org
Cc: qemu-...@nongnu.org; Ana Pazos; Raja Venkateswaran
Subject: [PATCH for-5.1 00/31] target/arm: SVE2, part 1