[PATCH] softmmu/vl: Do not recommend to use -M accel=... anymore
The new -accel parameter can be used multiple times now, so we should
recommend this new way instead.

Signed-off-by: Thomas Huth
---
 softmmu/vl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index f669c06ede..e2b2991a5f 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3487,7 +3487,7 @@ void qemu_init(int argc, char **argv, char **envp)
             }
             if (optarg && strchr(optarg, ':')) {
                 error_report("Don't use ':' with -accel, "
-                             "use -M accel=... for now instead");
+                             "use multiple -accel=... options instead");
                 exit(1);
             }
             break;
-- 
2.18.1
Re: [PATCH v2 000/100] target/arm: Implement SVE2
Patchew URL: https://patchew.org/QEMU/20200618042644.1685561-1-richard.hender...@linaro.org/

Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce
it locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  GEN     docs/interop/qemu-qmp-ref.txt
  GEN     docs/interop/qemu-qmp-ref.7
  CC      qga/commands.o
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  CC      qga/guest-agent-command-state.o
  CC      qga/main.o
  CC      qga/commands-posix.o
---
  AR      libvhost-user.a
  GEN     docs/interop/qemu-ga-ref.html
  GEN     docs/interop/qemu-ga-ref.txt
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  GEN     docs/interop/qemu-ga-ref.7
  LINK    qemu-keymap
  AS      pc-bios/optionrom/multiboot.o
---
  AS      pc-bios/optionrom/linuxboot.o
  CC      pc-bios/optionrom/pvh_main.o
  AS      pc-bios/optionrom/pvh.o
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    ivshmem-client
  BUILD   pc-bios/optionrom/linuxboot.img
  BUILD   pc-bios/optionrom/multiboot.img
---
  BUILD   pc-bios/optionrom/pvh.img
  BUILD   pc-bios/optionrom/linuxboot_dma.img
  BUILD   pc-bios/optionrom/pvh.raw
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  SIGN    pc-bios/optionrom/pvh.bin
  BUILD   pc-bios/optionrom/linuxboot_dma.raw
  SIGN    pc-bios/optionrom/linuxboot_dma.bin
  LINK    qemu-nbd
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-storage-daemon
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-img
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-io
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-edid
  LINK    fsdev/virtfs-proxy-helper
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    scsi/qemu-pr-helper
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-bridge-helper
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    virtiofsd
  LINK    vhost-user-input
  LINK    qemu-ga
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork'
Re: [PATCH 1/7] qemu-common: Briefly document qemu_timedate_diff() unit
Philippe Mathieu-Daudé writes:

> It is not obvious that the qemu_timedate_diff() and
> qemu_ref_timedate() functions return seconds. Briefly
> document it.
>
> Signed-off-by: Philippe Mathieu-Daudé
> ---
>  include/qemu-common.h | 1 +
>  softmmu/vl.c          | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index d0142f29ac..e97644710c 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -27,6 +27,7 @@ int qemu_main(int argc, char **argv, char **envp);
>  #endif
>
>  void qemu_get_timedate(struct tm *tm, int offset);
> +/* Returns difference with RTC reference time (in seconds) */
>  int qemu_timedate_diff(struct tm *tm);

Not this patch's problem: use of int here smells; is it wide enough?

>
>  void *qemu_oom_check(void *ptr);
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index f669c06ede..215459c7b5 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -737,7 +737,7 @@ void qemu_system_vmstop_request(RunState state)
>  }
>
>  /***/
> -/* RTC reference time/date access */
> +/* RTC reference time/date access (in seconds) */
>  static time_t qemu_ref_timedate(QEMUClockType clock)
>  {
>      time_t value = qemu_clock_get_ms(clock) / 1000;

time_t is seconds on all systems we support.  Using it for something other
than seconds would be wrong.  The comment feels redundant to me.  But if it
helps someone else...
Memory leak in transfer_memory_block()?
We appear to leak an Error object when ga_read_sysfs_file() fails with
errno != ENOENT, unless the caller passes true @sys2memblk:

static void transfer_memory_block(GuestMemoryBlock *mem_blk, bool sys2memblk,
                                  GuestMemoryBlockResponse *result,
                                  Error **errp)
{
    [...]
        if (local_err) {

We have an Error object.

            /* treat with sysfs file that not exist in old kernel */
            if (errno == ENOENT) {

Case 1: ENOENT; we free it.  Good.

                error_free(local_err);
                if (sys2memblk) {
                    mem_blk->online = true;
                    mem_blk->can_offline = false;
                } else if (!mem_blk->online) {
                    result->response =
                        GUEST_MEMORY_BLOCK_RESPONSE_TYPE_OPERATION_NOT_SUPPORTED;
                }
            } else {

Case 2: other than ENOENT

                if (sys2memblk) {

Case 2a: sys2memblk; we pass it to the caller.  Good.

                    error_propagate(errp, local_err);
                } else {

Case 2b: !sys2memblk; ???

                    result->response =
                        GUEST_MEMORY_BLOCK_RESPONSE_TYPE_OPERATION_FAILED;
                }
            }
            goto out2;
        }
    [...]
out2:
    g_free(status);
    close(dirfd);
out1:
    if (!sys2memblk) {
        result->has_error_code = true;
        result->error_code = errno;
    }
}

What is supposed to be done with @local_err in case 2b?
qemu-pr-helper -v suppresses errors, isn't that weird?
prh_co_entry() reports errors reading requests / writing responses only when
@verbose (command line -v); the relevant code is appended for your
convenience.

Sure these are *errors*?  The program recovers and continues, and this is
deemed normal enough to inform the user only when he specifically asks for
it.  Yet when we do inform, we format it as an error.  Should we tone it
down to warnings?

static void coroutine_fn prh_co_entry(void *opaque)
{
    [...]
    while (atomic_read() == RUNNING) {
        [...]
        sz = prh_read_request(client, , , _err);
        if (sz < 0) {
            break;
        }
        [...]
        if (prh_write_response(client, , , _err) < 0) {
            break;
        }
    }

    if (local_err) {
        if (verbose == 0) {
            error_free(local_err);
        } else {
            error_report_err(local_err);
        }
    }

out:
    qio_channel_detach_aio_context(QIO_CHANNEL(client->ioc));
    object_unref(OBJECT(client->ioc));
    g_free(client);
}
Re: [PATCH v2 000/100] target/arm: Implement SVE2
Patchew URL: https://patchew.org/QEMU/20200618042644.1685561-1-richard.hender...@linaro.org/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v2 000/100] target/arm: Implement SVE2
Type: series
Message-id: 20200618042644.1685561-1-richard.hender...@linaro.org

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20200615201757.16868-1-apera...@pp1.inet.fi -> patchew/20200615201757.16868-1-apera...@pp1.inet.fi
 * [new tag]         patchew/20200618042644.1685561-1-richard.hender...@linaro.org -> patchew/20200618042644.1685561-1-richard.hender...@linaro.org
Switched to a new branch 'test'
2e724d0 target/arm: Implement SVE2 fp multiply-add long
1a69193 target/arm: Implement SVE2 bitwise shift immediate
b59d6c5 target/arm: Implement 128-bit ZIP, UZP, TRN
e1dfe3b target/arm: Implement SVE2 LD1RO
68bee80 target/arm: Share table of sve load functions
f10622f tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
e59188f target/arm: Implement SVE2 FLOGB
361296a softfloat: Add float16_is_normal
9096b1b target/arm: Implement SVE2 FCVTXNT, FCVTX
93b0698 target/arm: Implement SVE2 FCVTLT
72737be target/arm: Implement SVE2 FCVTNT
1864b9a target/arm: Implement SVE2 TBL, TBX
d115f1c target/arm: Implement SVE2 crypto constructive binary operations
083365c target/arm: Implement SVE2 crypto destructive binary operations
8f530d8 target/arm: Implement SVE2 crypto unary operations
de3f6f9 target/arm: Implement SVE mixed sign dot product
9ba387a target/arm: Implement SVE mixed sign dot product (indexed)
eb4c273 target/arm: Implement SVE2 complex integer multiply-add (indexed)
beb1f42 target/arm: Implement SVE2 multiply-add long (indexed)
341d92e target/arm: Implement SVE2 saturating multiply high (indexed)
c7bf238 target/arm: Use helper_neon_sq{, r}dmul_* for aa64 advsimd
9b878ec target/arm: Implement SVE2 signed saturating doubling multiply high
734d63d target/arm: Implement SVE2 saturating multiply (indexed)
acdd6dd target/arm: Implement SVE2 integer multiply long (indexed)
2c45834 target/arm: Implement SVE2 saturating multiply-add (indexed)
3e13e5d target/arm: Implement SVE2 saturating multiply-add high (indexed)
44d1a6c target/arm: Use helper_gvec_ml{a, s}_idx_* for aa64 advsimd
958b462 target/arm: Implement SVE2 integer multiply-add (indexed)
7ec62cb target/arm: Use helper_gvec_mul_idx_* for aa64 advsimd
f545153 target/arm: Implement SVE2 integer multiply (indexed)
254440f target/arm: Split out formats for 3 vectors + 1 index
61ec4e7 target/arm: Split out formats for 2 vectors + 1 index
86dfbf2 target/arm: Pass separate addend to FCMLA helpers
9808f01 target/arm: Pass separate addend to {U, S}DOT helpers
889fb4f target/arm: Fix sve_punpk_p vs odd vector lengths
0cadee1 target/arm: Fix sve_zip_p vs odd vector lengths
c862133 target/arm: Fix sve_uzp_p vs odd vector lengths
30b5a7e target/arm: Implement SVE2 SPLICE, EXT
ca2ba7d target/arm: Implement SVE2 FMMLA
6a2a30a target/arm: Implement SVE2 gather load insns
46b60a4 target/arm: Implement SVE2 scatter store insns
f7879c3 target/arm: Implement SVE2 XAR
e6de649 target/arm: Implement SVE2 HISTCNT, HISTSEG
12f07a3 target/arm: Implement SVE2 RSUBHNB, RSUBHNT
42be129 target/arm: Implement SVE2 SUBHNB, SUBHNT
ca448c1 target/arm: Implement SVE2 RADDHNB, RADDHNT
9f13b4a target/arm: Implement SVE2 ADDHNB, ADDHNT
81f6bd0 target/arm: Implement SVE2 complex integer multiply-add
2db782a target/arm: Implement SVE2 integer multiply-add long
5ff35ee target/arm: Implement SVE2 saturating multiply-add high
7b01deb target/arm: Generalize inl_qrdmlah_* helper functions
56fe27c target/arm: Implement SVE2 saturating multiply-add long
bfc3b4a target/arm: Implement SVE2 MATCH, NMATCH
2f44fd9 target/arm: Implement SVE2 bitwise ternary operations
53ae038 target/arm: Implement SVE2 WHILERW, WHILEWR
af2cda9 target/arm: Implement SVE2 WHILEGT, WHILEGE, WHILEHI, WHILEHS
b5976bb target/arm: Implement SVE2 SQSHRN, SQRSHRN
69a6d78 target/arm: Implement SVE2 UQSHRN, UQRSHRN
dca2699 target/arm: Implement SVE2 SQSHRUN, SQRSHRUN
7ac87e5 target/arm: Implement SVE2 SHRN, RSHRN
2b2f5aa target/arm: Implement SVE2 floating-point pairwise
c3a5eb0 target/arm: Implement SVE2 saturating extract narrow
2bd0506 target/arm: Implement SVE2 integer absolute difference and accumulate
5aca0f8 target/arm: Implement SVE2 bitwise shift and insert
a672202 target/arm: Implement SVE2 bitwise shift right and accumulate
8eded2d target/arm: Implement SVE2 integer add/subtract long with carry
07ef8da target/arm: Implement SVE2 integer absolute difference and accumulate long
3ffc1f3 target/arm: Implement SVE2 complex integer add
6659680
[Bug 1884017] Re: Intermittently erratic mouse under Windows 95
Weirdly, this problem doesn't occur when running qemu on macOS (10.15.5). It
only happens on my PC running openSUSE Tumbleweed. However, even on that PC,
it only affects Windows 95, and not Windows 98 or other operating systems.

-- 
You received this bug notification because you are a member of qemu-devel-ml,
which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1884017

Title:
  Intermittently erratic mouse under Windows 95

Status in QEMU:
  New

Bug description:
  The mouse works fine maybe 75-80% of the time, but intermittently (every
  20-30 seconds or so), moving the mouse will cause the pointer to fly around
  the screen at high speed, usually colliding with the edges, and, much more
  problematically, click all the mouse buttons at random, even if you are not
  clicking. This causes random objects on the screen to be clicked and
  dragged around, rendering the system generally unusable.

  I don't know if this is related to #1785485 - it happens even if you never
  use the scroll wheel.

  qemu version: 5.0.0 (openSUSE Tumbleweed)
  Launch command line: qemu-system-i386 -hda win95.qcow2 -cpu pentium2 -m 16
    -vga cirrus -soundhw sb16 -nic user,model=pcnet -rtc base=localtime
  OS version: Windows 95 4.00.950 C

  I have made the disk image available here:
  https://home.gloveraoki.me/share/win95.qcow2.lz

  Setup notes: In order to make Windows 95 detect the system devices
  correctly, after first install you must change the driver for "Plug and
  Play BIOS" to "PCI bus". I have already done this in the above image.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1884017/+subscriptions
Re: [PATCH v2 2/5] hw/riscv: sifive: Change SiFive E/U CPU reset vector to 0x1004
Hi Alistair,

On Thu, Jun 18, 2020 at 8:41 AM Bin Meng wrote:
> Hi Alistair,
>
> On Thu, Jun 18, 2020 at 12:40 AM Alistair Francis wrote:
> > On Mon, Jun 15, 2020 at 5:51 PM Bin Meng wrote:
> > > From: Bin Meng
> > >
> > > Per the SiFive manual, all E/U series CPU cores' reset vector is
> > > at 0x1004. Update our code to match the hardware.
> > >
> > > Signed-off-by: Bin Meng
> >
> > This commit breaks my Oreboot test.
> >
> > Oreboot starts in flash and we run the command with the
> > `sifive_u,start-in-flash=true` machine.
>
> Could you please post an Oreboot binary for testing somewhere, or some
> instructions so that I can test this?

I have figured out where the issue is. The issue is inside the Oreboot
code: its QEMU detection logic should be updated to match this change.
I've sent a pull request to Oreboot to fix this:
https://github.com/oreboot/oreboot/pull/264

> > I have removed this and the later patches from the RISC-V branch. I
> > want to send a PR today. After that I'll look into this.

I don't think we should drop this patch and the later ones in this series.

Regards,
Bin
[PATCH v2 096/100] target/arm: Share table of sve load functions
The table used by do_ldrq is a subset of the table used by do_ld_zpa;
we can share them by passing dtype instead of msz to do_ldrq.

Signed-off-by: Richard Henderson
---
 target/arm/translate-sve.c | 120 ++---
 1 file changed, 58 insertions(+), 62 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index f3b2463b7c..6bdff5ceca 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5190,61 +5190,63 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
     tcg_temp_free_i32(t_desc);
 }
 
+/* Indexed by [be][dtype][nreg] */
+static gen_helper_gvec_mem * const ldr_fns[2][16][4] = {
+    /* Little-endian */
+    { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+        gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+      { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+
+      { gen_helper_sve_ld1sds_le_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1hh_le_r, gen_helper_sve_ld2hh_le_r,
+        gen_helper_sve_ld3hh_le_r, gen_helper_sve_ld4hh_le_r },
+      { gen_helper_sve_ld1hsu_le_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1hdu_le_r, NULL, NULL, NULL },
+
+      { gen_helper_sve_ld1hds_le_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1hss_le_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld2ss_le_r,
+        gen_helper_sve_ld3ss_le_r, gen_helper_sve_ld4ss_le_r },
+      { gen_helper_sve_ld1sdu_le_r, NULL, NULL, NULL },
+
+      { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r,
+        gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } },
+
+    /* Big-endian */
+    { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+        gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+      { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+
+      { gen_helper_sve_ld1sds_be_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1hh_be_r, gen_helper_sve_ld2hh_be_r,
+        gen_helper_sve_ld3hh_be_r, gen_helper_sve_ld4hh_be_r },
+      { gen_helper_sve_ld1hsu_be_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1hdu_be_r, NULL, NULL, NULL },
+
+      { gen_helper_sve_ld1hds_be_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1hss_be_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld2ss_be_r,
+        gen_helper_sve_ld3ss_be_r, gen_helper_sve_ld4ss_be_r },
+      { gen_helper_sve_ld1sdu_be_r, NULL, NULL, NULL },
+
+      { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+      { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r,
+        gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } }
+};
+
 static void do_ld_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
                       int dtype, int nreg)
 {
-    static gen_helper_gvec_mem * const fns[2][16][4] = {
-        /* Little-endian */
-        { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
-            gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
-          { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
-
-          { gen_helper_sve_ld1sds_le_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1hh_le_r, gen_helper_sve_ld2hh_le_r,
-            gen_helper_sve_ld3hh_le_r, gen_helper_sve_ld4hh_le_r },
-          { gen_helper_sve_ld1hsu_le_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1hdu_le_r, NULL, NULL, NULL },
-
-          { gen_helper_sve_ld1hds_le_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1hss_le_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld2ss_le_r,
-            gen_helper_sve_ld3ss_le_r, gen_helper_sve_ld4ss_le_r },
-          { gen_helper_sve_ld1sdu_le_r, NULL, NULL, NULL },
-
-          { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r,
-            gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } },
-
-        /* Big-endian */
-        { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
-            gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
-          { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
-
-          { gen_helper_sve_ld1sds_be_r, NULL, NULL, NULL },
-          { gen_helper_sve_ld1hh_be_r,
[PATCH v2 095/100] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
We already support duplication of 128-bit blocks. This extends that
support to 256-bit blocks. This will be needed by SVE2.

Signed-off-by: Richard Henderson
---
 tcg/tcg-op-gvec.c | 52 ---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 3707c0effb..1b7876bb22 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1570,12 +1570,10 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
             do_dup(vece, dofs, oprsz, maxsz, NULL, in, 0);
             tcg_temp_free_i64(in);
         }
-    } else {
+    } else if (vece == 4) {
         /* 128-bit duplicate. */
-        /* ??? Dup to 256-bit vector. */
         int i;
 
-        tcg_debug_assert(vece == 4);
         tcg_debug_assert(oprsz >= 16);
         if (TCG_TARGET_HAS_v128) {
             TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V128);
@@ -1601,6 +1599,54 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
         if (oprsz < maxsz) {
             expand_clr(dofs + oprsz, maxsz - oprsz);
         }
+    } else if (vece == 5) {
+        /* 256-bit duplicate. */
+        int i;
+
+        tcg_debug_assert(oprsz >= 32);
+        tcg_debug_assert(oprsz % 32 == 0);
+        if (TCG_TARGET_HAS_v256) {
+            TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V256);
+
+            tcg_gen_ld_vec(in, cpu_env, aofs);
+            for (i = 0; i < oprsz; i += 32) {
+                tcg_gen_st_vec(in, cpu_env, dofs + i);
+            }
+            tcg_temp_free_vec(in);
+        } else if (TCG_TARGET_HAS_v128) {
+            TCGv_vec in0 = tcg_temp_new_vec(TCG_TYPE_V128);
+            TCGv_vec in1 = tcg_temp_new_vec(TCG_TYPE_V128);
+
+            tcg_gen_ld_vec(in0, cpu_env, aofs);
+            tcg_gen_ld_vec(in1, cpu_env, aofs + 16);
+            for (i = 0; i < oprsz; i += 32) {
+                tcg_gen_st_vec(in0, cpu_env, dofs + i);
+                tcg_gen_st_vec(in1, cpu_env, dofs + i + 16);
+            }
+            tcg_temp_free_vec(in0);
+            tcg_temp_free_vec(in1);
+        } else {
+            TCGv_i64 in[4];
+            int j;
+
+            for (j = 0; j < 4; ++j) {
+                in[j] = tcg_temp_new_i64();
+                tcg_gen_ld_i64(in[j], cpu_env, aofs + j * 8);
+            }
+            for (i = 0; i < oprsz; i += 32) {
+                for (j = 0; j < 4; ++j) {
+                    tcg_gen_st_i64(in[j], cpu_env, dofs + i + j * 8);
+                }
+            }
+            for (j = 0; j < 4; ++j) {
+                tcg_temp_free_i64(in[j]);
+            }
+        }
+        if (oprsz < maxsz) {
+            expand_clr(dofs + oprsz, maxsz - oprsz);
+        }
+    } else {
+        g_assert_not_reached();
     }
 }
-- 
2.25.1
[PATCH v2 092/100] target/arm: Implement SVE2 FCVTXNT, FCVTX
From: Stephen Long

Signed-off-by: Stephen Long
Message-Id: <20200428174332.17162-4-stepl...@quicinc.com>
[rth: Use do_frint_mode, which avoids a specific runtime helper.]
Signed-off-by: Richard Henderson
---
 target/arm/sve.decode      |  2 ++
 target/arm/translate-sve.c | 49 ++
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index abe26f2424..6c0e39d553 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1575,6 +1575,8 @@
 SM4EKEY 01000101 00 1 . 0 0 . . @rd_rn_rm_e0
 RAX1    01000101 00 1 . 0 1 . . @rd_rn_rm_e0
 
 ### SVE2 floating-point convert precision odd elements
+FCVTXNT_ds 01100100 00 0010 10 101 ... . . @rd_pg_rn_e0
+FCVTX_ds   01100101 00 0010 10 101 ... . . @rd_pg_rn_e0
 FCVTNT_sh  01100100 10 0010 00 101 ... . . @rd_pg_rn_e0
 FCVTLT_hs  01100100 10 0010 01 101 ... . . @rd_pg_rn_e0
 FCVTNT_ds  01100100 11 0010 10 101 ... . . @rd_pg_rn_e0
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 7b20c65778..0232381500 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4773,11 +4773,9 @@ static bool trans_FRINTX(DisasContext *s, arg_rpr_esz *a)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, a->esz == MO_16, fns[a->esz - 1]);
 }
 
-static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a, int mode)
+static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a,
+                          int mode, gen_helper_gvec_3_ptr *fn)
 {
-    if (a->esz == 0) {
-        return false;
-    }
     if (sve_access_check(s)) {
         unsigned vsz = vec_full_reg_size(s);
         TCGv_i32 tmode = tcg_const_i32(mode);
@@ -4788,7 +4786,7 @@ static bool do_frint_mode(DisasContext *s, arg_rpr_esz *a, int mode)
         tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
                            vec_full_reg_offset(s, a->rn),
                            pred_full_reg_offset(s, a->pg),
-                           status, vsz, vsz, 0, frint_fns[a->esz - 1]);
+                           status, vsz, vsz, 0, fn);
 
         gen_helper_set_rmode(tmode, tmode, status);
         tcg_temp_free_i32(tmode);
@@ -4799,27 +4797,42 @@
 static bool trans_FRINTN(DisasContext *s, arg_rpr_esz *a)
 {
-    return do_frint_mode(s, a, float_round_nearest_even);
+    if (a->esz == 0) {
+        return false;
+    }
+    return do_frint_mode(s, a, float_round_nearest_even, frint_fns[a->esz - 1]);
 }
 
 static bool trans_FRINTP(DisasContext *s, arg_rpr_esz *a)
 {
-    return do_frint_mode(s, a, float_round_up);
+    if (a->esz == 0) {
+        return false;
+    }
+    return do_frint_mode(s, a, float_round_up, frint_fns[a->esz - 1]);
 }
 
 static bool trans_FRINTM(DisasContext *s, arg_rpr_esz *a)
 {
-    return do_frint_mode(s, a, float_round_down);
+    if (a->esz == 0) {
+        return false;
+    }
+    return do_frint_mode(s, a, float_round_down, frint_fns[a->esz - 1]);
 }
 
 static bool trans_FRINTZ(DisasContext *s, arg_rpr_esz *a)
 {
-    return do_frint_mode(s, a, float_round_to_zero);
+    if (a->esz == 0) {
+        return false;
+    }
+    return do_frint_mode(s, a, float_round_to_zero, frint_fns[a->esz - 1]);
 }
 
 static bool trans_FRINTA(DisasContext *s, arg_rpr_esz *a)
 {
-    return do_frint_mode(s, a, float_round_ties_away);
+    if (a->esz == 0) {
+        return false;
+    }
+    return do_frint_mode(s, a, float_round_ties_away, frint_fns[a->esz - 1]);
 }
 
 static bool trans_FRECPX(DisasContext *s, arg_rpr_esz *a)
@@ -7812,3 +7825,19 @@ static bool trans_FCVTLT_sd(DisasContext *s, arg_rpr_esz *a)
     }
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtlt_sd);
 }
+
+static bool trans_FCVTX_ds(DisasContext *s, arg_rpr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve2, s)) {
+        return false;
+    }
+    return do_frint_mode(s, a, float_round_to_odd, gen_helper_sve_fcvt_ds);
+}
+
+static bool trans_FCVTXNT_ds(DisasContext *s, arg_rpr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve2, s)) {
+        return false;
+    }
+    return do_frint_mode(s, a, float_round_to_odd, gen_helper_sve2_fcvtnt_ds);
+}
-- 
2.25.1
Re: Query Regarding Contribution
On 17/06/2020 20.25, khyati agarwal wrote:
> Respected ,
> I am Khyati Agarwal, second-year undergraduate in CSE, IIT, Mandi,
> India. I have good knowledge of git, Python, C/C++, php, machine
> learning and databases like mongodb, mysql. I'm interested in
> contributing to QEMU. I have worked with nlp, tensorflow, keras, etc on
> ML projects. I am also looking forward to the next Outreachy round.
> Could you please guide me on how to start expressing and contributing ?

 Hi,

thanks for your interest in contributing to QEMU! If you haven't seen it
yet, please read our guide for submitting patches first:

 https://wiki.qemu.org/Contribute/SubmitAPatch

And if you're looking for ideas for a patch, have a look at our list here:

 https://wiki.qemu.org/Contribute/BiteSizedTasks

If you have questions, don't hesitate to ask!

 Thomas
[PATCH v2 089/100] target/arm: Implement SVE2 TBL, TBX
From: Stephen Long

Signed-off-by: Stephen Long
Message-Id: <20200428144352.9275-1-stepl...@quicinc.com>
[rth: rearrange the macros a little and rebase]
Signed-off-by: Richard Henderson
---
 target/arm/helper-sve.h    | 10 +
 target/arm/sve.decode      |  5 +++
 target/arm/sve_helper.c    | 90 ++
 target/arm/translate-sve.c | 33 ++
 4 files changed, 119 insertions(+), 19 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index d7e2d168ba..a3d49506d0 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -661,6 +661,16 @@
 DEF_HELPER_FLAGS_4(sve_tbl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_tbl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_tbl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve2_tbl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve2_tbl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(sve_sunpk_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_sunpk_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_sunpk_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 5f2fad4754..609ecce520 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -560,6 +560,11 @@
 TBL  0101 .. 1 . 001100 . . @rd_rn_rm
 
 # SVE unpack vector elements
 UNPK 0101 esz:2 1100 u:1 h:1 001110 rn:5 rd:5
 
+# SVE2 Table Lookup (three sources)
+
+TBL_sve2 0101 .. 1 . 001010 . . @rd_rn_rm
+TBX      0101 .. 1 . 001011 . . @rd_rn_rm
+
 ### SVE Permute - Predicates Group
 
 # SVE permute predicate elements
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4b54ec8c25..6e9e43a1f9 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3014,28 +3014,80 @@ void HELPER(sve_rev_d)(void *vd, void *vn, uint32_t desc)
     }
 }
 
-#define DO_TBL(NAME, TYPE, H) \
-void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
-{ \
-    intptr_t i, opr_sz = simd_oprsz(desc); \
-    uintptr_t elem = opr_sz / sizeof(TYPE); \
-    TYPE *d = vd, *n = vn, *m = vm; \
-    ARMVectorReg tmp; \
-    if (unlikely(vd == vn)) { \
-        n = memcpy(, vn, opr_sz); \
-    } \
-    for (i = 0; i < elem; i++) { \
-        TYPE j = m[H(i)]; \
-        d[H(i)] = j < elem ? n[H(j)] : 0; \
-    } \
+typedef void tb_impl_fn(void *, void *, void *, void *, uintptr_t, bool);
+
+static inline void do_tbl1(void *vd, void *vn, void *vm, uint32_t desc,
+                           bool is_tbx, tb_impl_fn *fn)
+{
+    ARMVectorReg scratch;
+    uintptr_t oprsz = simd_oprsz(desc);
+
+    if (unlikely(vd == vn)) {
+        vn = memcpy(, vn, oprsz);
+    }
+
+    fn(vd, vn, NULL, vm, oprsz, is_tbx);
 }
 
-DO_TBL(sve_tbl_b, uint8_t, H1)
-DO_TBL(sve_tbl_h, uint16_t, H2)
-DO_TBL(sve_tbl_s, uint32_t, H4)
-DO_TBL(sve_tbl_d, uint64_t, )
+static inline void do_tbl2(void *vd, void *vn0, void *vn1, void *vm,
+                           uint32_t desc, bool is_tbx, tb_impl_fn *fn)
+{
+    ARMVectorReg scratch;
+    uintptr_t oprsz = simd_oprsz(desc);
 
-#undef TBL
+    if (unlikely(vd == vn0)) {
+        vn0 = memcpy(, vn0, oprsz);
+        if (vd == vn1) {
+            vn1 = vn0;
+        }
+    } else if (unlikely(vd == vn1)) {
+        vn1 = memcpy(, vn1, oprsz);
+    }
+
+    fn(vd, vn0, vn1, vm, oprsz, is_tbx);
+}
+
+#define DO_TB(SUFF, TYPE, H)                                    \
+static inline void do_tb_##SUFF(void *vd, void *vt0, void *vt1, \
+                                void *vm, uintptr_t oprsz, bool is_tbx) \
+{                                                               \
+    TYPE *d = vd, *tbl0 = vt0, *tbl1 = vt1, *indexes = vm;      \
+    uintptr_t i, nelem = oprsz / sizeof(TYPE);                  \
+    for (i = 0; i < nelem; ++i) {                               \
+        TYPE index =
Re: [PATCH v2 00/12] Add Nuvoton NPCM730/NPCM750 SoCs and two BMC machines
On Wed, Jun 17, 2020 at 8:54 AM Cédric Le Goater wrote: > Hello, > > On 6/12/20 12:30 AM, Havard Skinnemoen wrote: > > This patch series models enough of the Nuvoton NPCM730 and NPCM750 SoCs > to boot > > an OpenBMC image built for quanta-gsj. This includes device models for: > > > > - Global Configuration Registers > > - Clock Control > > - Timers > > - Fuses > > - Memory Controller > > - Flash Controller > > Do you have a git tree for this patchset ? > Yes, but nothing public. I can set up a github fork if you want. > > These modules, along with the existing Cortex A9 CPU cores and built-in > > peripherals, are integrated into a NPCM730 or NPCM750 SoC, which in turn > form > > the foundation for the quanta-gsj and npcm750-evb machines, > respectively. The > > two SoCs are very similar; the only difference is that NPCM730 is > missing some > > peripherals that NPCM750 has, and which are not considered essential for > > datacenter use (e.g. graphics controllers). For more information, see > > > > https://www.nuvoton.com/products/cloud-computing/ibmc/ > > > > Both quanta-gsj and npcm750-evb correspond to real boards supported by > OpenBMC. > > While this initial series uses a stripped-down kernel for testing, future > > series will be tested using OpenBMC images built from public sources. I'm > > currently putting the finishing touches on flash controller support, > which is > > necessary to boot a full OpenBMC image, and will be enabled by the next > series. > > ok. > > It would be nice to be able to download the images from some site > like we do for Aspeed. > It looks like Joel got this covered for gsj. I'll look into setting something up for npcm750-evb. > > > The patches in this series were developed by Google and reviewed by > Nuvoton. We > > will be maintaining the machine and peripheral support together. > > > > The data sheet for these SoCs is not generally available. Please let me > know if > > more comments are needed to understand the device behavior. 
> > Changes since v1 (requested by reviewers):
> >
> > - Clarify the source of CLK reset values.
> > - Made smpboot a constant byte array, eliminated byte swapping.
>
> I have revived a PPC64 host. We might want to add the swapping back.

OK.

Havard
[PATCH v2 086/100] target/arm: Implement SVE2 crypto unary operations
Signed-off-by: Richard Henderson --- target/arm/sve.decode | 6 ++ target/arm/translate-sve.c | 11 +++ 2 files changed, 17 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 0be8a020f6..9b0d0f3a5d 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1551,3 +1551,9 @@ STNT1_zprz 1110010 .. 00 . 001 ... . . \ # SVE2 32-bit scatter non-temporal store (vector plus scalar) STNT1_zprz 1110010 .. 10 . 001 ... . . \ @rprr_scatter_store xs=0 esz=2 scale=0 + +### SVE2 Crypto Extensions + +# SVE2 crypto unary operations +# AESMC and AESIMC +AESMC 01000101 00 1011100 decrypt:1 0 rd:5 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 152b0b605d..bc65b3e367 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7682,3 +7682,14 @@ static bool trans_USDOT_(DisasContext *s, arg_USDOT_ *a) } return true; } + +static bool trans_AESMC(DisasContext *s, arg_AESMC *a) +{ +if (!dc_isar_feature(aa64_sve2_aes, s)) { +return false; +} +if (sve_access_check(s)) { +gen_gvec_ool_zz(s, gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt); +} +return true; +} -- 2.25.1
[Bug 1884017] [NEW] Intermittently erratic mouse under Windows 95
Public bug reported: The mouse works fine maybe 75-80% of the time, but intermittently (every 20-30 seconds or so), moving the mouse will cause the pointer to fly around the screen at high speed, usually colliding with the edges, and much more problematically, click all the mouse buttons at random, even if you are not clicking. This causes random objects on the screen to be clicked and dragged around, rendering the system generally unusable. I don't know if this is related to #1785485 - it happens even if you never use the scroll wheel.

qemu version: 5.0.0 (openSUSE Tumbleweed)
Launch command line: qemu-system-i386 -hda win95.qcow2 -cpu pentium2 -m 16 -vga cirrus -soundhw sb16 -nic user,model=pcnet -rtc base=localtime
OS version: Windows 95 4.00.950 C

I have made the disk image available here: https://home.gloveraoki.me/share/win95.qcow2.lz

Setup notes: In order to make Windows 95 detect the system devices correctly, after first install you must change the driver for "Plug and Play BIOS" to "PCI bus". I have already done this in the above image.

** Affects: qemu
   Importance: Undecided
   Status: New

** Tags: qemu-system-i386

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1884017

Title: Intermittently erratic mouse under Windows 95

Status in QEMU: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1884017/+subscriptions
[PATCH v2 084/100] target/arm: Implement SVE mixed sign dot product (indexed)
Signed-off-by: Richard Henderson --- target/arm/cpu.h | 5 +++ target/arm/helper.h| 4 +++ target/arm/sve.decode | 4 +++ target/arm/translate-sve.c | 18 +++ target/arm/vec_helper.c| 66 ++ 5 files changed, 97 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 331c5cdd4b..df0a3e201b 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3877,6 +3877,11 @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id) return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0; } +static inline bool isar_feature_aa64_sve2_i8mm(const ARMISARegisters *id) +{ +return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, I8MM) != 0; +} + static inline bool isar_feature_aa64_sve2_f32mm(const ARMISARegisters *id) { return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, F32MM) != 0; diff --git a/target/arm/helper.h b/target/arm/helper.h index e9d7ab97da..6fac613dfc 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -587,6 +587,10 @@ DEF_HELPER_FLAGS_5(gvec_sdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sudot_idx_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_usdot_idx_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index e8011fe91b..51acbfa797 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -815,6 +815,10 @@ SQRDMLSH_zzxz_h 01000100 .. 1 . 000101 . . @rrxr_h SQRDMLSH_zzxz_s 01000100 .. 1 . 000101 . . @rrxr_s SQRDMLSH_zzxz_d 01000100 .. 1 . 000101 . . @rrxr_d +# SVE mixed sign dot product (indexed) +USDOT_zzxw_s01000100 .. 1 . 000110 . . @rrxr_s +SUDOT_zzxw_s01000100 .. 1 . 000111 . . @rrxr_s + # SVE2 saturating multiply-add (indexed) SQDMLALB_zzxw_s 01000100 .. 1 . 0010.0 . . @rrxw_s SQDMLALB_zzxw_d 01000100 .. 1 . 0010.0 . . 
@rrxw_d diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 94c1e9aa05..fe4b4b7387 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3842,6 +3842,24 @@ DO_RRXR(trans_SDOT_zzxw_d, gen_helper_gvec_sdot_idx_h) DO_RRXR(trans_UDOT_zzxw_s, gen_helper_gvec_udot_idx_b) DO_RRXR(trans_UDOT_zzxw_d, gen_helper_gvec_udot_idx_h) +static bool trans_SUDOT_zzxw_s(DisasContext *s, arg_rrxr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2_i8mm, s)) { +return false; +} +return do_zzxz_data(s, a->rd, a->rn, a->rm, a->ra, +gen_helper_gvec_sudot_idx_b, a->index); +} + +static bool trans_USDOT_zzxw_s(DisasContext *s, arg_rrxr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2_i8mm, s)) { +return false; +} +return do_zzxz_data(s, a->rd, a->rn, a->rm, a->ra, +gen_helper_gvec_usdot_idx_b, a->index); +} + #undef DO_RRXR static bool do_sve2_zzx_data(DisasContext *s, arg_rrx_esz *a, diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 8e85a16e7e..e1689d730f 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -678,6 +678,72 @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, clear_tail(d, opr_sz, simd_maxsz(desc)); } +void HELPER(gvec_sudot_idx_b)(void *vd, void *vn, void *vm, + void *va, uint32_t desc) +{ +intptr_t i, segend, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4; +intptr_t index = simd_data(desc); +int32_t *d = vd, *a = va; +int8_t *n = vn; +uint8_t *m_indexed = (uint8_t *)vm + index * 4; + +/* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd. + * Otherwise opr_sz is a multiple of 16. 
+ */ +segend = MIN(4, opr_sz_4); +i = 0; +do { +uint8_t m0 = m_indexed[i * 4 + 0]; +uint8_t m1 = m_indexed[i * 4 + 1]; +uint8_t m2 = m_indexed[i * 4 + 2]; +uint8_t m3 = m_indexed[i * 4 + 3]; + +do { +d[i] = (a[i] + +n[i * 4 + 0] * m0 + +n[i * 4 + 1] * m1 + +n[i * 4 + 2] * m2 + +n[i * 4 + 3] * m3); +} while (++i < segend); +segend = i + 4; +} while (i < opr_sz_4); + +clear_tail(d, opr_sz, simd_maxsz(desc)); +} + +void HELPER(gvec_usdot_idx_b)(void *vd, void *vn, void *vm, + void *va, uint32_t desc) +{ +intptr_t i, segend, opr_sz = simd_oprsz(desc), opr_sz_4 = opr_sz / 4; +intptr_t index = simd_data(desc); +uint32_t *d = vd, *a = va; +uint8_t *n = vn; +int8_t *m_indexed =
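The helper above accumulates groups of four byte products into each 32-bit lane, with the immediate index selecting which 32-bit group of `zm` feeds an entire 128-bit segment. A scalar sketch of one USDOT lane (unsigned `n` bytes times signed `m` bytes; illustrative only, with the segment indexing already resolved):

```c
#include <assert.h>
#include <stdint.h>

/* One USDOT lane: acc + sum of four unsigned-by-signed byte products. */
static int32_t usdot_lane(int32_t acc, const uint8_t n[4], const int8_t m[4])
{
    for (int j = 0; j < 4; j++) {
        acc += (int32_t)n[j] * m[j];
    }
    return acc;
}
```

SUDOT is the mirror image, with `n` treated as signed and `m` as unsigned.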
Re: [PATCH] hw/audio/gus: Fix registers 32-bit access
On 18/06/2020 00.25, Allan Peramaki wrote: > On 17/06/2020 23:23, Peter Maydell wrote: >> >> This patch is quite difficult to read because it mixes some >> whitespace only changes with some actual changes of >> behaviour. > > Sorry about that. I had to put some whitespace in the two lines I > modified because of checkpatch.pl, but then the nearby lines would have > had inconsistent style if left unmodified. Hi Allan! Makes perfect sense, but for the review, it might have been helpful if you'd put this information in the commit message. Anyway, the change looks correct to me, so: Reviewed-by: Thomas Huth
[PATCH v2 099/100] target/arm: Implement SVE2 bitwise shift immediate
From: Stephen Long Implements SQSHL/UQSHL, SRSHR/URSHR, and SQSHLU Signed-off-by: Stephen Long Message-Id: <20200430194159.24064-1-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 33 + target/arm/sve.decode | 5 target/arm/sve_helper.c| 39 +++-- target/arm/translate-sve.c | 60 ++ 4 files changed, 135 insertions(+), 2 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index a00d1904b7..cb609b5daa 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2250,6 +2250,39 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah_idx_h, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_sqrdcmlah_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqshl_zpzi_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqshl_zpzi_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqshl_zpzi_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqshl_zpzi_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_uqshl_zpzi_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uqshl_zpzi_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uqshl_zpzi_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uqshl_zpzi_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_srshr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_srshr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_srshr_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_srshr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_urshr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_urshr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_urshr_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_urshr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + 
+DEF_HELPER_FLAGS_4(sve2_sqshlu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqshlu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqshlu_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqshlu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 4e21274dc4..d2f33d96f3 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -342,6 +342,11 @@ ASR_zpzi0100 .. 000 000 100 ... .. ... . @rdn_pg_tszimm_shr LSR_zpzi0100 .. 000 001 100 ... .. ... . @rdn_pg_tszimm_shr LSL_zpzi0100 .. 000 011 100 ... .. ... . @rdn_pg_tszimm_shl ASRD0100 .. 000 100 100 ... .. ... . @rdn_pg_tszimm_shr +SQSHL_zpzi 0100 .. 000 110 100 ... .. ... . @rdn_pg_tszimm_shl +UQSHL_zpzi 0100 .. 000 111 100 ... .. ... . @rdn_pg_tszimm_shl +SRSHR 0100 .. 001 100 100 ... .. ... . @rdn_pg_tszimm_shr +URSHR 0100 .. 001 101 100 ... .. ... . @rdn_pg_tszimm_shr +SQSHLU 0100 .. 001 111 100 ... .. ... . @rdn_pg_tszimm_shl # SVE bitwise shift by vector (predicated) ASR_zpzz0100 .. 010 000 100 ... . . 
@rdn_pg_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index b37fb60b7d..fe79e22bb8 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2231,6 +2231,43 @@ DO_ZPZI(sve_asrd_h, int16_t, H1_2, DO_ASRD) DO_ZPZI(sve_asrd_s, int32_t, H1_4, DO_ASRD) DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD) +#define DO_RSHR(x, sh) ((x >> sh) + ((x >> (sh - 1)) & 1)) + +/* SVE2 bitwise shift by immediate */ +DO_ZPZI(sve2_sqshl_zpzi_b, int8_t, H1, do_sqshl_b) +DO_ZPZI(sve2_sqshl_zpzi_h, int16_t, H1_2, do_sqshl_h) +DO_ZPZI(sve2_sqshl_zpzi_s, int32_t, H1_4, do_sqshl_s) +DO_ZPZI_D(sve2_sqshl_zpzi_d, int64_t, do_sqshl_d) + +DO_ZPZI(sve2_uqshl_zpzi_b, uint8_t, H1, do_uqshl_b) +DO_ZPZI(sve2_uqshl_zpzi_h, uint16_t, H1_2, do_uqshl_h) +DO_ZPZI(sve2_uqshl_zpzi_s, uint32_t, H1_4, do_uqshl_s) +DO_ZPZI_D(sve2_uqshl_zpzi_d, uint64_t, do_uqshl_d) + +DO_ZPZI(sve2_srshr_b, int8_t, H1, DO_RSHR) +DO_ZPZI(sve2_srshr_h, int16_t, H1_2, DO_RSHR) +DO_ZPZI(sve2_srshr_s, int32_t, H1_4, DO_RSHR) +DO_ZPZI_D(sve2_srshr_d, int64_t, DO_RSHR) + +DO_ZPZI(sve2_urshr_b, uint8_t, H1, DO_RSHR) +DO_ZPZI(sve2_urshr_h, uint16_t, H1_2, DO_RSHR) +DO_ZPZI(sve2_urshr_s, uint32_t, H1_4, DO_RSHR) +DO_ZPZI_D(sve2_urshr_d, uint64_t, DO_RSHR) + +#define do_suqrshl_b(n, m) \ + ({ uint32_t discard;
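The `DO_RSHR` macro above adds back the last bit shifted out, i.e. for `sh >= 1` it rounds to nearest with ties toward positive infinity. A standalone sketch of the 16-bit case (illustrative only; like QEMU, it assumes arithmetic right shift of negative signed values):

```c
#include <assert.h>
#include <stdint.h>

/* Rounding shift right: (x >> sh) + ((x >> (sh - 1)) & 1), sh >= 1. */
static int16_t rshr16(int16_t x, int sh)
{
    return (int16_t)((x >> sh) + ((x >> (sh - 1)) & 1));
}
```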
[PATCH v2 091/100] target/arm: Implement SVE2 FCVTLT
From: Stephen Long Signed-off-by: Stephen Long Message-Id: <20200428174332.17162-3-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 5 + target/arm/sve.decode | 2 ++ target/arm/sve_helper.c| 23 +++ target/arm/translate-sve.c | 16 4 files changed, 46 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 252344bda6..935655d07a 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2251,3 +2251,8 @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_fcvtlt_sd, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 9ba4bb476e..abe26f2424 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1576,4 +1576,6 @@ RAX101000101 00 1 . 0 1 . . @rd_rn_rm_e0 ### SVE2 floating-point convert precision odd elements FCVTNT_sh 01100100 10 0010 00 101 ... . . @rd_pg_rn_e0 +FCVTLT_hs 01100100 10 0010 01 101 ... . . @rd_pg_rn_e0 FCVTNT_ds 01100100 11 0010 10 101 ... . . @rd_pg_rn_e0 +FCVTLT_sd 01100100 11 0010 11 101 ... . . 
@rd_pg_rn_e0 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 7a5b0d37c5..8bfc9393a1 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -7194,3 +7194,26 @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \ DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16) DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t, H1_4, H1_2, float64_to_float32) + +#define DO_FCVTLT(NAME, TYPEW, TYPEN, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \ +{ \ +intptr_t i = simd_oprsz(desc);\ +uint64_t *g = vg; \ +do { \ +uint64_t pg = g[(i - 1) >> 6];\ +do { \ +i -= sizeof(TYPEW); \ +if (likely((pg >> (i & 63)) & 1)) { \ +TYPEN nn = *(TYPEN *)(vn + HN(i + sizeof(TYPEN)));\ +*(TYPEW *)(vd + HW(i)) = OP(nn, status); \ +} \ +} while (i & 63); \ +} while (i != 0); \ +} + +DO_FCVTLT(sve2_fcvtlt_hs, uint32_t, uint16_t, H1_4, H1_2, sve_f16_to_f32) +DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t, H1_4, H1_2, float32_to_float64) + +#undef DO_FCVTLT +#undef DO_FCVTNT diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 3c145857db..7b20c65778 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7796,3 +7796,19 @@ static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a) } return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_ds); } + +static bool trans_FCVTLT_hs(DisasContext *s, arg_rpr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtlt_hs); +} + +static bool trans_FCVTLT_sd(DisasContext *s, arg_rpr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtlt_sd); +} -- 2.25.1
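FCVTLT widens the top (odd-numbered) narrow element of each wide slot, which is why the helper reads `vn + HN(i + sizeof(TYPEN))`. A predicate-free scalar sketch of that lane selection, using `float` to `double` in place of the fp16/fp32 conversions (illustrative only, not the helper itself):

```c
#include <assert.h>
#include <stddef.h>

/* FCVTLT lane selection: widen the odd narrow element of each wide slot. */
static void fcvtlt_sketch(double *d, const float *n, size_t wide_elems)
{
    for (size_t i = 0; i < wide_elems; i++) {
        d[i] = (double)n[2 * i + 1];
    }
}
```

FCVTNT from the previous patch is the converse: it narrows into those same top positions.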
[PATCH v2 082/100] target/arm: Implement SVE2 multiply-add long (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 17 + target/arm/sve.decode | 18 ++ target/arm/sve_helper.c| 16 target/arm/translate-sve.c | 20 4 files changed, 71 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index e8e616a247..f309753620 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2210,3 +2210,20 @@ DEF_HELPER_FLAGS_4(sve2_sqdmull_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqdmull_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_smlal_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smlal_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smlsl_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smlsl_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umlal_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umlal_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umlsl_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umlsl_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 80d76982e8..da77ad689f 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -825,6 +825,24 @@ SQDMLSLB_zzxw_d 01000100 .. 1 . 0011.0 . . @rrxw_d SQDMLSLT_zzxw_s 01000100 .. 1 . 0011.1 . . @rrxw_s SQDMLSLT_zzxw_d 01000100 .. 1 . 0011.1 . . @rrxw_d +# SVE2 multiply-add long (indexed) +SMLALB_zzxw_s 01000100 .. 1 . 1000.0 . . @rrxw_s +SMLALB_zzxw_d 01000100 .. 1 . 1000.0 . . @rrxw_d +SMLALT_zzxw_s 01000100 .. 1 . 1000.1 . . @rrxw_s +SMLALT_zzxw_d 01000100 .. 1 . 1000.1 . . @rrxw_d +UMLALB_zzxw_s 01000100 .. 1 . 1001.0 . . @rrxw_s +UMLALB_zzxw_d 01000100 .. 1 . 1001.0 . . @rrxw_d +UMLALT_zzxw_s 01000100 .. 1 . 1001.1 . . @rrxw_s +UMLALT_zzxw_d 01000100 .. 1 . 1001.1 . . 
@rrxw_d +SMLSLB_zzxw_s 01000100 .. 1 . 1010.0 . . @rrxw_s +SMLSLB_zzxw_d 01000100 .. 1 . 1010.0 . . @rrxw_d +SMLSLT_zzxw_s 01000100 .. 1 . 1010.1 . . @rrxw_s +SMLSLT_zzxw_d 01000100 .. 1 . 1010.1 . . @rrxw_d +UMLSLB_zzxw_s 01000100 .. 1 . 1011.0 . . @rrxw_s +UMLSLB_zzxw_d 01000100 .. 1 . 1011.0 . . @rrxw_d +UMLSLT_zzxw_s 01000100 .. 1 . 1011.1 . . @rrxw_s +UMLSLT_zzxw_d 01000100 .. 1 . 1011.1 . . @rrxw_d + # SVE2 integer multiply long (indexed) SMULLB_zzx_s01000100 .. 1 . 1100.0 . . @rrxl_s SMULLB_zzx_d01000100 .. 1 . 1100.0 . . @rrxl_d diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 52a235826c..479fffa16c 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1546,6 +1546,20 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ } \ } +#define DO_MLA(N, M, A) (A + N * M) + +DO_ZZXW(sve2_smlal_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MLA) +DO_ZZXW(sve2_smlal_idx_d, int64_t, int32_t, , H1_4, DO_MLA) +DO_ZZXW(sve2_umlal_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MLA) +DO_ZZXW(sve2_umlal_idx_d, uint64_t, uint32_t, , H1_4, DO_MLA) + +#define DO_MLS(A, N, M) (A - N * M) + +DO_ZZXW(sve2_smlsl_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MLS) +DO_ZZXW(sve2_smlsl_idx_d, int64_t, int32_t, , H1_4, DO_MLS) +DO_ZZXW(sve2_umlsl_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MLS) +DO_ZZXW(sve2_umlsl_idx_d, uint64_t, uint32_t, , H1_4, DO_MLS) + #define DO_SQDMLAL_S(N, M, A) DO_SQADD_S(A, do_sqdmull_s(N, M)) #define DO_SQDMLAL_D(N, M, A) do_sqadd_d(A, do_sqdmull_d(N, M)) @@ -1558,6 +1572,8 @@ DO_ZZXW(sve2_sqdmlal_idx_d, int64_t, int32_t, , H1_4, DO_SQDMLAL_D) DO_ZZXW(sve2_sqdmlsl_idx_s, int32_t, int16_t, H1_4, H1_2, DO_SQDMLSL_S) DO_ZZXW(sve2_sqdmlsl_idx_d, int64_t, int32_t, , H1_4, DO_SQDMLSL_D) +#undef DO_MLA +#undef DO_MLS #undef DO_ZZXW #define DO_ZZX(NAME, TYPEW, TYPEN, HW, HN, OP) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index fa28f011b6..4105a7977a 100644 --- a/target/arm/translate-sve.c +++ 
b/target/arm/translate-sve.c @@ -3944,6 +3944,26 @@ DO_SVE2_RRXR_TB(trans_SQDMLSLB_zzxw_d, gen_helper_sve2_sqdmlsl_idx_d,
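For these indexed forms, a single element of `zm` — chosen by the index within each 128-bit segment — multiplies every selected element of `zn`. A scalar sketch of SMLALB with that multiplier already extracted (illustrative only; segment handling and the top-half `T` variants are omitted):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SMLALB (indexed): widen even-numbered int16 elements of n, multiply by
 * the selected element m_sel, and accumulate into the wide lanes of a. */
static void smlalb_idx_sketch(int32_t *d, const int32_t *a, const int16_t *n,
                              int16_t m_sel, size_t wide_elems)
{
    for (size_t i = 0; i < wide_elems; i++) {
        d[i] = a[i] + (int32_t)n[2 * i] * m_sel;
    }
}
```

SMLSLB/UMLSLB follow the same shape with `DO_MLS` (subtract) and/or unsigned widening.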
Re: [PATCH] hw/audio/gus: Fix registers 32-bit access
On 15/06/2020 22.17, Allan Peramaki wrote: > Fix audio on software that accesses DRAM above 64k via register peek/poke > and some cases when more than 16 voices are used. > > Fixes: 135f5ae1974c ("audio: GUSsample is int16_t") > Signed-off-by: Allan Peramaki > --- > hw/audio/gusemu_hal.c | 6 +++--- > hw/audio/gusemu_mixer.c | 8 > 2 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/hw/audio/gusemu_hal.c b/hw/audio/gusemu_hal.c > index ae40ca341c..e35e941926 100644 > --- a/hw/audio/gusemu_hal.c > +++ b/hw/audio/gusemu_hal.c > @@ -30,9 +30,9 @@ > #include "gustate.h" > #include "gusemu.h" > > -#define GUSregb(position) (*(gusptr+(position))) > -#define GUSregw(position) (*(uint16_t *) (gusptr+(position))) > -#define GUSregd(position) (*(uint16_t *)(gusptr+(position))) > +#define GUSregb(position) (*(gusptr + (position))) > +#define GUSregw(position) (*(uint16_t *)(gusptr + (position))) > +#define GUSregd(position) (*(uint32_t *)(gusptr + (position))) > > /* size given in bytes */ > unsigned int gus_read(GUSEmuState * state, int port, int size) > diff --git a/hw/audio/gusemu_mixer.c b/hw/audio/gusemu_mixer.c > index 00b9861b92..3b39254518 100644 > --- a/hw/audio/gusemu_mixer.c > +++ b/hw/audio/gusemu_mixer.c > @@ -26,11 +26,11 @@ > #include "gusemu.h" > #include "gustate.h" > > -#define GUSregb(position) (*(gusptr+(position))) > -#define GUSregw(position) (*(uint16_t *) (gusptr+(position))) > -#define GUSregd(position) (*(uint16_t *)(gusptr+(position))) > +#define GUSregb(position) (*(gusptr + (position))) > +#define GUSregw(position) (*(uint16_t *)(gusptr + (position))) > +#define GUSregd(position) (*(uint32_t *)(gusptr + (position))) > > -#define GUSvoice(position) (*(uint16_t *)(voiceptr+(position))) > +#define GUSvoice(position) (*(uint16_t *)(voiceptr + (position))) > > /* samples are always 16bit stereo (4 bytes each, first right then left > interleaved) */ > void gus_mixvoices(GUSEmuState * state, unsigned int playback_freq, unsigned > int 
numsamples, > This might be a good candidate for the stable branches, too (now on the CC: list). Thomas
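For reference, the behavioural part of the fix above is the `GUSregd` cast: the old macro dereferenced a `uint16_t` pointer, so every 32-bit register access silently dropped the upper 16 bits. A minimal standalone demonstration (hypothetical helper names; `memcpy` stands in for the direct casts to sidestep alignment and strict-aliasing concerns):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old GUSregd behaviour: a 16-bit access where 32 bits were intended. */
static uint32_t read_reg_old(const uint8_t *gusptr, int pos)
{
    uint16_t v;
    memcpy(&v, gusptr + pos, sizeof(v));
    return v;
}

/* Fixed GUSregd behaviour: a full 32-bit access. */
static uint32_t read_reg_fixed(const uint8_t *gusptr, int pos)
{
    uint32_t v;
    memcpy(&v, gusptr + pos, sizeof(v));
    return v;
}
```

The 16-bit read can never return more than 0xFFFF, which is why DRAM addresses above 64k misbehaved before the fix.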
[PATCH v2 098/100] target/arm: Implement 128-bit ZIP, UZP, TRN
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 3 ++ target/arm/sve.decode | 8 ++ target/arm/sve_helper.c| 29 +-- target/arm/translate-sve.c | 58 ++ 4 files changed, 90 insertions(+), 8 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index aa7d113232..a00d1904b7 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -689,16 +689,19 @@ DEF_HELPER_FLAGS_4(sve_zip_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_zip_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_zip_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_zip_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_zip_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uzp_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_trn_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_compact_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_compact_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index e0d093c5d7..4e21274dc4 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -592,6 +592,14 @@ UZP2_z 0101 .. 1 . 011 011 . . @rd_rn_rm TRN1_z 0101 .. 1 . 011 100 . . @rd_rn_rm TRN2_z 0101 .. 1 . 011 101 . . @rd_rn_rm +# SVE2 permute vector segments +ZIP1_q 0101 10 1 . 
000 000 . . @rd_rn_rm_e0 +ZIP2_q 0101 10 1 . 000 001 . . @rd_rn_rm_e0 +UZP1_q 0101 10 1 . 000 010 . . @rd_rn_rm_e0 +UZP2_q 0101 10 1 . 000 011 . . @rd_rn_rm_e0 +TRN1_q 0101 10 1 . 000 110 . . @rd_rn_rm_e0 +TRN2_q 0101 10 1 . 000 111 . . @rd_rn_rm_e0 + ### SVE Permute - Predicated Group # SVE compress active elements diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 1b92f203c2..b37fb60b7d 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3486,36 +3486,45 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ *(TYPE *)(vd + H(2 * i + 0)) = *(TYPE *)(vn + H(i)); \ *(TYPE *)(vd + H(2 * i + sizeof(TYPE))) = *(TYPE *)(vm + H(i)); \ }\ +if (sizeof(TYPE) == 16 && unlikely(oprsz & 16)) {\ +memset(vd + oprsz - 16, 0, 16); \ +}\ } DO_ZIP(sve_zip_b, uint8_t, H1) DO_ZIP(sve_zip_h, uint16_t, H1_2) DO_ZIP(sve_zip_s, uint32_t, H1_4) DO_ZIP(sve_zip_d, uint64_t, ) +DO_ZIP(sve2_zip_q, Int128, ) #define DO_UZP(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ intptr_t oprsz = simd_oprsz(desc); \ -intptr_t oprsz_2 = oprsz / 2; \ intptr_t odd_ofs = simd_data(desc);\ -intptr_t i;\ +intptr_t i, p; \ ARMVectorReg tmp_m;\ if (unlikely((vm - vd) < (uintptr_t)oprsz)) { \ vm = memcpy(_m, vm, oprsz);\ } \ -for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \ -*(TYPE *)(vd + H(i)) = *(TYPE *)(vn + H(2 * i + odd_ofs)); \ -} \ -for (i = 0; i < oprsz_2; i += sizeof(TYPE)) { \ -*(TYPE *)(vd + H(oprsz_2 + i)) = *(TYPE *)(vm + H(2 * i + odd_ofs)); \ -} \ +i = 0, p = odd_ofs;\ +do {
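The 128-bit variants above reuse the existing macros with `TYPE = Int128`. As a scalar sketch of the underlying permutes (illustrative only, byte elements for clarity): ZIP1 interleaves the low halves of the two inputs, while UZP1 concatenates the even-numbered elements of each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ZIP1: interleave the low halves of n and m. */
static void zip1_sketch(uint8_t *d, const uint8_t *n, const uint8_t *m,
                        size_t elems)
{
    for (size_t i = 0; i < elems / 2; i++) {
        d[2 * i] = n[i];
        d[2 * i + 1] = m[i];
    }
}

/* UZP1: even-numbered elements of n, then even-numbered elements of m. */
static void uzp1_sketch(uint8_t *d, const uint8_t *n, const uint8_t *m,
                        size_t elems)
{
    for (size_t i = 0; i < elems / 2; i++) {
        d[i] = n[2 * i];
        d[elems / 2 + i] = m[2 * i];
    }
}
```

ZIP2/UZP2 are the same with the high halves and odd elements, respectively; TRN swaps elements pairwise across the two inputs.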
[PATCH v2 097/100] target/arm: Implement SVE2 LD1RO
Signed-off-by: Richard Henderson --- target/arm/sve.decode | 4 ++ target/arm/translate-sve.c | 95 ++ 2 files changed, 99 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 6808ff4194..e0d093c5d7 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1119,11 +1119,15 @@ LD_zpri 1010010 .. nreg:2 0 111 ... . . @rpri_load_msz # SVE load and broadcast quadword (scalar plus scalar) LD1RQ_zprr 1010010 .. 00 . 000 ... . . \ @rprr_load_msz nreg=0 +LD1RO_zprr 1010010 .. 01 . 000 ... . . \ +@rprr_load_msz nreg=0 # SVE load and broadcast quadword (scalar plus immediate) # LD1RQB, LD1RQH, LD1RQS, LD1RQD LD1RQ_zpri 1010010 .. 00 0 001 ... . . \ @rpri_load_msz nreg=0 +LD1RO_zpri 1010010 .. 01 0 001 ... . . \ +@rpri_load_msz nreg=0 # SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets) PRF 110 00 -1 - 0-- --- - 0 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 6bdff5ceca..6b4a05a76d 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5469,6 +5469,101 @@ static bool trans_LD1RQ_zpri(DisasContext *s, arg_rpri_load *a) return true; } +static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype) +{ +unsigned vsz = vec_full_reg_size(s); +unsigned vsz_r32; +TCGv_ptr t_pg; +TCGv_i32 t_desc; +int desc, poff, doff; + +if (vsz < 32) { +/* + * Note that this UNDEFINED check comes after CheckSVEEnabled() + * in the ARM pseudocode, which is the sve_access_check() done + * in our caller. We should not now return false from the caller. + */ +unallocated_encoding(s); +return; +} + +/* Load the first octaword using the normal predicated load helpers. */ + +poff = pred_full_reg_offset(s, pg); +if (vsz > 32) { +/* + * Zero-extend the first 32 bits of the predicate into a temporary. + * This avoids triggering an assert making sure we don't have bits + * set within a predicate beyond VQ, but we have lowered VQ to 2 + * for this load operation. 
+ */ +TCGv_i64 tmp = tcg_temp_new_i64(); +#ifdef HOST_WORDS_BIGENDIAN +poff += 4; +#endif +tcg_gen_ld32u_i64(tmp, cpu_env, poff); + +poff = offsetof(CPUARMState, vfp.preg_tmp); +tcg_gen_st_i64(tmp, cpu_env, poff); +tcg_temp_free_i64(tmp); +} + +t_pg = tcg_temp_new_ptr(); +tcg_gen_addi_ptr(t_pg, cpu_env, poff); + +desc = simd_desc(32, 32, zt); +t_desc = tcg_const_i32(desc); + +ldr_fns[s->be_data == MO_BE][dtype][0](cpu_env, t_pg, addr, t_desc); + +tcg_temp_free_ptr(t_pg); +tcg_temp_free_i32(t_desc); + +/* + * Replicate that first octaword. + * The replication happens in units of 32; if the full vector size + * is not a multiple of 32, the final bits are zeroed. + */ +doff = vec_full_reg_offset(s, zt); +vsz_r32 = QEMU_ALIGN_DOWN(vsz, 32); +if (vsz >= 64) { +tcg_gen_gvec_dup_mem(5, doff + 32, doff, vsz_r32 - 32, vsz - 32); +} else if (vsz > vsz_r32) { +/* Nop move, with side effect of clearing the tail. */ +tcg_gen_gvec_mov(MO_64, doff, doff, vsz_r32, vsz); +} +} + +static bool trans_LD1RO_zprr(DisasContext *s, arg_rprr_load *a) +{ +if (!dc_isar_feature(aa64_sve2_f64mm, s)) { +return false; +} +if (a->rm == 31) { +return false; +} +if (sve_access_check(s)) { +TCGv_i64 addr = new_tmp_a64(s); +tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype)); +tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn)); +do_ldro(s, a->rd, a->pg, addr, a->dtype); +} +return true; +} + +static bool trans_LD1RO_zpri(DisasContext *s, arg_rpri_load *a) +{ +if (!dc_isar_feature(aa64_sve2_f64mm, s)) { +return false; +} +if (sve_access_check(s)) { +TCGv_i64 addr = new_tmp_a64(s); +tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 32); +do_ldro(s, a->rd, a->pg, addr, a->dtype); +} +return true; +} + /* Load and broadcast element. */ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a) { -- 2.25.1
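The replication step at the end of `do_ldro` can be pictured with a byte-level sketch (illustrative only, not part of the patch): the first 32-byte octaword repeats across the vector, and if the vector length is not a multiple of 32, the tail is zeroed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* LD1RO tail rule: replicate the first octaword, zero any partial tail. */
static void ld1ro_replicate(uint8_t *vd, size_t vsz)
{
    size_t vsz_r32 = vsz - (vsz % 32);   /* QEMU_ALIGN_DOWN(vsz, 32) */
    for (size_t i = 32; i < vsz_r32; i += 32) {
        memcpy(vd + i, vd, 32);
    }
    memset(vd + vsz_r32, 0, vsz - vsz_r32);
}
```

In the translator this is expressed with `tcg_gen_gvec_dup_mem` and a tail-clearing move rather than an explicit loop.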
[PATCH v2 100/100] target/arm: Implement SVE2 fp multiply-add long
From: Stephen Long Implements both vectored and indexed FMLALB, FMLALT, FMLSLB, FMLSLT Signed-off-by: Stephen Long Message-Id: <20200504171240.11220-1-stepl...@quicinc.com> [rth: Rearrange to use float16_to_float32_by_bits.] Signed-off-by: Richard Henderson --- target/arm/helper.h| 5 +++ target/arm/sve.decode | 12 ++ target/arm/translate-sve.c | 75 ++ target/arm/vec_helper.c| 51 ++ 4 files changed, 143 insertions(+) diff --git a/target/arm/helper.h b/target/arm/helper.h index bfeb327272..f471ff27d1 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -847,6 +847,11 @@ DEF_HELPER_FLAGS_4(sve2_sqrdmulh_idx_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_4(sve2_sqrdmulh_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fmlal_zzzw_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fmlal_zzxw_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/sve.decode b/target/arm/sve.decode index d2f33d96f3..6708c048e0 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1601,3 +1601,15 @@ FCVTLT_sd 01100100 11 0010 11 101 ... . . @rd_pg_rn_e0 ### SVE2 floating-point convert to integer FLOGB 01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5 _esz + +### SVE2 floating-point multiply-add long (vectors) +FMLALB_zzzw 01100100 .. 1 . 10 0 00 0 . . @rda_rn_rm +FMLALT_zzzw 01100100 .. 1 . 10 0 00 1 . . @rda_rn_rm +FMLSLB_zzzw 01100100 .. 1 . 10 1 00 0 . . @rda_rn_rm +FMLSLT_zzzw 01100100 .. 1 . 10 1 00 1 . . @rda_rn_rm + +### SVE2 floating-point multiply-add long (indexed) +FMLALB_zzxw 01100100 .. 1 . 0100.0 . . @rrxw_s +FMLALT_zzxw 01100100 .. 1 . 0100.1 . . @rrxw_s +FMLSLB_zzxw 01100100 .. 1 . 0110.0 . . @rrxw_s +FMLSLT_zzxw 01100100 .. 1 . 0110.1 . . 
@rrxw_s diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 2b2e186988..8b51b9f5a6 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -8059,3 +8059,78 @@ static bool trans_FLOGB(DisasContext *s, arg_rpr_esz *a) }; return do_sve2_zpz_ool(s, a, fns[a->esz]); } + +static bool do_FMLAL_zzzw(DisasContext *s, arg__esz *a, bool sub, bool sel) +{ +if (a->esz != MO_32 || !dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (sve_access_check(s)) { +unsigned vsz = vec_full_reg_size(s); +tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + vec_full_reg_offset(s, a->ra), + cpu_env, vsz, vsz, (sel << 1) | sub, + gen_helper_sve2_fmlal_zzzw_s); +} +return true; +} + +static bool trans_FMLALB_zzzw(DisasContext *s, arg__esz *a) +{ +return do_FMLAL_zzzw(s, a, false, false); +} + +static bool trans_FMLALT_zzzw(DisasContext *s, arg__esz *a) +{ +return do_FMLAL_zzzw(s, a, false, true); +} + +static bool trans_FMLSLB_zzzw(DisasContext *s, arg__esz *a) +{ +return do_FMLAL_zzzw(s, a, true, false); +} + +static bool trans_FMLSLT_zzzw(DisasContext *s, arg__esz *a) +{ +return do_FMLAL_zzzw(s, a, true, true); +} + +static bool do_FMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sub, bool sel) +{ +if (a->esz != MO_32 || !dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (sve_access_check(s)) { +unsigned vsz = vec_full_reg_size(s); +tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + vec_full_reg_offset(s, a->ra), + cpu_env, vsz, vsz, + (a->index << 2) | (sel << 1) | sub, + gen_helper_sve2_fmlal_zzxw_s); +} +return true; +} + +static bool trans_FMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a) +{ +return do_FMLAL_zzxw(s, a, false, false); +} + +static bool trans_FMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a) +{ +return do_FMLAL_zzxw(s, a, false, true); +} + +static bool trans_FMLSLB_zzxw(DisasContext *s, 
arg_rrxr_esz *a) +{ +return do_FMLAL_zzxw(s, a, true, false); +} + +static bool trans_FMLSLT_zzxw(DisasContext *s, arg_rrxr_esz *a) +{ +return do_FMLAL_zzxw(s, a, true, true); +} diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index a51cbf2c7e..09847788d9 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -29,10 +29,14 @@ so addressing units smaller than that needs a host-endian
[PATCH v2 080/100] target/arm: Use helper_neon_sq{,r}dmulh_* for aa64 advsimd
Signed-off-by: Richard Henderson --- target/arm/helper.h| 10 target/arm/translate-a64.c | 33 + target/arm/vec_helper.c| 49 ++ 3 files changed, 82 insertions(+), 10 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index ce6ff95672..e1cac31e95 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -807,6 +807,16 @@ DEF_HELPER_FLAGS_5(gvec_mls_idx_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_mls_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(neon_sqdmulh_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(neon_sqdmulh_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(neon_sqrdmulh_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(neon_sqrdmulh_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve2_sqdmulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqdmulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqdmulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 5ef6ecfbf1..7c98938077 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -605,6 +605,20 @@ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn, tcg_temp_free_ptr(fpst); } +/* Expand a 3-operand + qc + operation using an out-of-line helper. */ +static void gen_gvec_op3_qc(DisasContext *s, bool is_q, int rd, int rn, +int rm, gen_helper_gvec_3_ptr *fn) +{ +TCGv_ptr qc_ptr = tcg_temp_new_ptr(); + +tcg_gen_addi_ptr(qc_ptr, cpu_env, offsetof(CPUARMState, vfp.qc)); +tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), qc_ptr, + is_q ? 16 : 8, vec_full_reg_size(s), 0, fn); +tcg_temp_free_ptr(qc_ptr); +} + /* Expand a 4-operand operation using an out-of-line helper. 
*/ static void gen_gvec_op4_ool(DisasContext *s, bool is_q, int rd, int rn, int rm, int ra, int data, gen_helper_gvec_4 *fn) @@ -11270,6 +11284,15 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_mla, size); } return; +case 0x16: /* SQDMULH, SQRDMULH */ +{ +static gen_helper_gvec_3_ptr * const fns[2][2] = { +{ gen_helper_neon_sqdmulh_h, gen_helper_neon_sqrdmulh_h }, +{ gen_helper_neon_sqdmulh_s, gen_helper_neon_sqrdmulh_s }, +}; +gen_gvec_op3_qc(s, is_q, rd, rn, rm, fns[size - 1][u]); +} +return; case 0x11: if (!u) { /* CMTST */ gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_cmtst, size); @@ -11381,16 +11404,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) genenvfn = fns[size][u]; break; } -case 0x16: /* SQDMULH, SQRDMULH */ -{ -static NeonGenTwoOpEnvFn * const fns[2][2] = { -{ gen_helper_neon_qdmulh_s16, gen_helper_neon_qrdmulh_s16 }, -{ gen_helper_neon_qdmulh_s32, gen_helper_neon_qrdmulh_s32 }, -}; -assert(size == 1 || size == 2); -genenvfn = fns[size - 1][u]; -break; -} default: g_assert_not_reached(); } diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 766555a5d6..d73c1afe30 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -194,6 +194,30 @@ void HELPER(sve2_sqrdmlsh_h)(void *vd, void *vn, void *vm, } } +void HELPER(neon_sqdmulh_h)(void *vd, void *vn, void *vm, +void *vq, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int16_t *d = vd, *n = vn, *m = vm; + +for (i = 0; i < opr_sz / 2; ++i) { +d[i] = do_sqrdmlah_h(n[i], m[i], 0, false, false, vq); +} +clear_tail(d, opr_sz, simd_maxsz(desc)); +} + +void HELPER(neon_sqrdmulh_h)(void *vd, void *vn, void *vm, + void *vq, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int16_t *d = vd, *n = vn, *m = vm; + +for (i = 0; i < opr_sz / 2; ++i) { +d[i] = do_sqrdmlah_h(n[i], m[i], 0, false, true, vq); +} +clear_tail(d, opr_sz, simd_maxsz(desc)); +} + void HELPER(sve2_sqdmulh_h)(void *vd, 
void *vn, void *vm, uint32_t desc) { intptr_t i, opr_sz = simd_oprsz(desc); @@ -291,6 +315,7 @@ void HELPER(sve2_sqrdmlah_s)(void *vd, void *vn, void *vm, } } + void HELPER(sve2_sqrdmlsh_s)(void *vd, void *vn, void
[PATCH v2 090/100] target/arm: Implement SVE2 FCVTNT
From: Stephen Long Signed-off-by: Stephen Long Message-Id: <20200428174332.17162-2-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 5 + target/arm/sve.decode | 4 target/arm/sve_helper.c| 20 target/arm/translate-sve.c | 16 4 files changed, 45 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index a3d49506d0..252344bda6 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2246,3 +2246,8 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_sqrdcmlah_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 609ecce520..9ba4bb476e 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1573,3 +1573,7 @@ SM4E01000101 00 10001 1 11100 0 . . @rdn_rm_e0 # SVE2 crypto constructive binary operations SM4EKEY 01000101 00 1 . 0 0 . . @rd_rn_rm_e0 RAX101000101 00 1 . 0 1 . . @rd_rn_rm_e0 + +### SVE2 floating-point convert precision odd elements +FCVTNT_sh 01100100 10 0010 00 101 ... . . @rd_pg_rn_e0 +FCVTNT_ds 01100100 11 0010 10 101 ... . . 
@rd_pg_rn_e0 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 6e9e43a1f9..7a5b0d37c5 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -7174,3 +7174,23 @@ void HELPER(fmmla_d)(void *vd, void *vn, void *vm, void *va, d[3] = float64_add(a[3], float64_add(p0, p1, status), status); } } + +#define DO_FCVTNT(NAME, TYPEW, TYPEN, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \ +{ \ +intptr_t i = simd_oprsz(desc);\ +uint64_t *g = vg; \ +do { \ +uint64_t pg = g[(i - 1) >> 6];\ +do { \ +i -= sizeof(TYPEW); \ +if (likely((pg >> (i & 63)) & 1)) { \ +TYPEW nn = *(TYPEW *)(vn + HW(i));\ +*(TYPEN *)(vd + HN(i + sizeof(TYPEN))) = OP(nn, status); \ +} \ +} while (i & 63); \ +} while (i != 0); \ +} + +DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16) +DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t, H1_4, H1_2, float64_to_float32) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 6b9d715c2d..3c145857db 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7780,3 +7780,19 @@ static bool trans_RAX1(DisasContext *s, arg_rrr_esz *a) } return true; } + +static bool trans_FCVTNT_sh(DisasContext *s, arg_rpr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_sh); +} + +static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_ds); +} -- 2.25.1
[PATCH v2 087/100] target/arm: Implement SVE2 crypto destructive binary operations
Signed-off-by: Richard Henderson --- target/arm/cpu.h | 5 + target/arm/sve.decode | 7 +++ target/arm/translate-sve.c | 38 ++ 3 files changed, 50 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index df0a3e201b..37fc866cf8 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3877,6 +3877,11 @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id) return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0; } +static inline bool isar_feature_aa64_sve2_sm4(const ARMISARegisters *id) +{ +return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SM4) != 0; +} + static inline bool isar_feature_aa64_sve2_i8mm(const ARMISARegisters *id) { return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, I8MM) != 0; diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 9b0d0f3a5d..2ebf65f376 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -118,6 +118,8 @@ @pd_pn_pm esz:2 .. rm:4 ... rn:4 . rd:4 _esz @rdn_rm esz:2 .. .. rm:5 rd:5 \ _esz rn=%reg_movprfx +@rdn_rm_e0 .. .. .. rm:5 rd:5 \ +_esz rn=%reg_movprfx esz=0 @rdn_sh_i8u esz:2 .. .. . rd:5 \ _esz rn=%reg_movprfx imm=%sh8_i8u @rdn_i8u esz:2 .. ... imm:8 rd:5 \ @@ -1557,3 +1559,8 @@ STNT1_zprz 1110010 .. 10 . 001 ... . . \ # SVE2 crypto unary operations # AESMC and AESIMC AESMC 01000101 00 1011100 decrypt:1 0 rd:5 + +# SVE2 crypto destructive binary operations +AESE01000101 00 10001 0 11100 0 . . @rdn_rm_e0 +AESD01000101 00 10001 0 11100 1 . . @rdn_rm_e0 +SM4E01000101 00 10001 1 11100 0 . . 
@rdn_rm_e0 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index bc65b3e367..92140ed2fa 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7693,3 +7693,41 @@ static bool trans_AESMC(DisasContext *s, arg_AESMC *a) } return true; } + +static bool do_aese(DisasContext *s, arg_rrr_esz *a, bool decrypt) +{ +if (!dc_isar_feature(aa64_sve2_aes, s)) { +return false; +} +if (sve_access_check(s)) { +gen_gvec_ool_zzz(s, gen_helper_crypto_aese, + a->rd, a->rn, a->rm, decrypt); +} +return true; +} + +static bool trans_AESE(DisasContext *s, arg_rrr_esz *a) +{ +return do_aese(s, a, false); +} + +static bool trans_AESD(DisasContext *s, arg_rrr_esz *a) +{ +return do_aese(s, a, true); +} + +static bool do_sm4(DisasContext *s, arg_rrr_esz *a, gen_helper_gvec_3 *fn) +{ +if (!dc_isar_feature(aa64_sve2_sm4, s)) { +return false; +} +if (sve_access_check(s)) { +gen_gvec_ool_zzz(s, fn, a->rd, a->rn, a->rm, 0); +} +return true; +} + +static bool trans_SM4E(DisasContext *s, arg_rrr_esz *a) +{ +return do_sm4(s, a, gen_helper_crypto_sm4e); +} -- 2.25.1
[PATCH v2 094/100] target/arm: Implement SVE2 FLOGB
From: Stephen Long Signed-off-by: Stephen Long Message-Id: <20200430191405.21641-1-stepl...@quicinc.com> [rth: Fixed esz index and c++ comments] Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 4 target/arm/sve.decode | 3 +++ target/arm/sve_helper.c| 49 ++ target/arm/translate-sve.c | 9 +++ 4 files changed, 65 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 935655d07a..aa7d113232 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2256,3 +2256,7 @@ DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_fcvtlt_sd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(flogb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(flogb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(flogb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 6c0e39d553..6808ff4194 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1581,3 +1581,6 @@ FCVTNT_sh 01100100 10 0010 00 101 ... . . @rd_pg_rn_e0 FCVTLT_hs 01100100 10 0010 01 101 ... . . @rd_pg_rn_e0 FCVTNT_ds 01100100 11 0010 10 101 ... . . @rd_pg_rn_e0 FCVTLT_sd 01100100 11 0010 11 101 ... . . 
@rd_pg_rn_e0 + +### SVE2 floating-point convert to integer +FLOGB 01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5 _esz diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 8bfc9393a1..1b92f203c2 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1121,6 +1121,55 @@ DO_ZPZ_D(sve2_sqneg_d, uint64_t, DO_SQNEG) DO_ZPZ(sve2_urecpe_s, uint32_t, H1_4, helper_recpe_u32) DO_ZPZ(sve2_ursqrte_s, uint32_t, H1_4, helper_rsqrte_u32) +static int16_t do_float16_logb_as_int(float16 a) +{ +if (float16_is_normal(a)) { +return extract16(a, 10, 5) - 15; +} else if (float16_is_infinity(a)) { +return INT16_MAX; +} else if (float16_is_any_nan(a) || float16_is_zero(a)) { +return INT16_MIN; +} else { +/* denormal */ +int shift = 6 - clz32(extract16(a, 0, 10)) - 16; +return -15 - shift + 1; +} +} + +static int32_t do_float32_logb_as_int(float32 a) +{ +if (float32_is_normal(a)) { +return extract32(a, 23, 8) - 127; +} else if (float32_is_infinity(a)) { +return INT32_MAX; +} else if (float32_is_any_nan(a) || float32_is_zero(a)) { +return INT32_MIN; +} else { +/* denormal */ +int shift = 9 - clz32(extract32(a, 0, 23)); +return -127 - shift + 1; +} +} + +static int64_t do_float64_logb_as_int(float64 a) +{ +if (float64_is_normal(a)) { +return extract64(a, 52, 11) - 1023; +} else if (float64_is_infinity(a)) { +return INT64_MAX; +} else if (float64_is_any_nan(a) || float64_is_zero(a)) { +return INT64_MIN; +} else { +/* denormal */ +int shift = 12 - clz64(extract64(a, 0, 52)); +return -1023 - shift + 1; +} +} + +DO_ZPZ(flogb_h, float16, H1_2, do_float16_logb_as_int) +DO_ZPZ(flogb_s, float32, H1_4, do_float32_logb_as_int) +DO_ZPZ(flogb_d, float64, , do_float64_logb_as_int) + /* Three-operand expander, unpredicated, in which the third operand is "wide". 
*/ #define DO_ZZW(NAME, TYPE, TYPEW, H, OP) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 0232381500..f3b2463b7c 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7841,3 +7841,12 @@ static bool trans_FCVTXNT_ds(DisasContext *s, arg_rpr_esz *a) } return do_frint_mode(s, a, float_round_to_odd, gen_helper_sve2_fcvtnt_ds); } + +static bool trans_FLOGB(DisasContext *s, arg_rpr_esz *a) +{ +static gen_helper_gvec_3 * const fns[] = { +NULL, gen_helper_flogb_h, +gen_helper_flogb_s, gen_helper_flogb_d +}; +return do_sve2_zpz_ool(s, a, fns[a->esz]); +} -- 2.25.1
[PATCH v2 081/100] target/arm: Implement SVE2 saturating multiply high (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper.h| 14 ++ target/arm/sve.decode | 8 target/arm/translate-sve.c | 8 target/arm/vec_helper.c| 88 ++ 4 files changed, 118 insertions(+) diff --git a/target/arm/helper.h b/target/arm/helper.h index e1cac31e95..e9d7ab97da 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -827,6 +827,20 @@ DEF_HELPER_FLAGS_4(sve2_sqrdmulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqrdmulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqrdmulh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmulh_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmulh_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmulh_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_sqrdmulh_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqrdmulh_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqrdmulh_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 6879870cc1..80d76982e8 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -841,6 +841,14 @@ SQDMULLB_zzx_d 01000100 .. 1 . 1110.0 . . @rrxl_d SQDMULLT_zzx_s 01000100 .. 1 . 1110.1 . . @rrxl_s SQDMULLT_zzx_d 01000100 .. 1 . 1110.1 . . @rrxl_d +# SVE2 saturating multiply high (indexed) +SQDMULH_zzx_h 01000100 .. 1 . 00 . . @rrx_h +SQDMULH_zzx_s 01000100 .. 1 . 00 . . @rrx_s +SQDMULH_zzx_d 01000100 .. 1 . 00 . . @rrx_d +SQRDMULH_zzx_h 01000100 .. 1 . 01 . . @rrx_h +SQRDMULH_zzx_s 01000100 .. 1 . 01 . . @rrx_s +SQRDMULH_zzx_d 01000100 .. 1 . 01 . . @rrx_d + # SVE2 integer multiply (indexed) MUL_zzx_h 01000100 .. 1 . 10 . . @rrx_h MUL_zzx_s 01000100 .. 1 . 10 . . 
@rrx_s diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index a13d2f5711..fa28f011b6 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3868,6 +3868,14 @@ DO_SVE2_RRX(trans_MUL_zzx_h, gen_helper_gvec_mul_idx_h) DO_SVE2_RRX(trans_MUL_zzx_s, gen_helper_gvec_mul_idx_s) DO_SVE2_RRX(trans_MUL_zzx_d, gen_helper_gvec_mul_idx_d) +DO_SVE2_RRX(trans_SQDMULH_zzx_h, gen_helper_sve2_sqdmulh_idx_h) +DO_SVE2_RRX(trans_SQDMULH_zzx_s, gen_helper_sve2_sqdmulh_idx_s) +DO_SVE2_RRX(trans_SQDMULH_zzx_d, gen_helper_sve2_sqdmulh_idx_d) + +DO_SVE2_RRX(trans_SQRDMULH_zzx_h, gen_helper_sve2_sqrdmulh_idx_h) +DO_SVE2_RRX(trans_SQRDMULH_zzx_s, gen_helper_sve2_sqrdmulh_idx_s) +DO_SVE2_RRX(trans_SQRDMULH_zzx_d, gen_helper_sve2_sqrdmulh_idx_d) + #undef DO_SVE2_RRX #define DO_SVE2_RRX_TB(NAME, FUNC, TOP) \ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index d73c1afe30..8e85a16e7e 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -240,6 +240,36 @@ void HELPER(sve2_sqrdmulh_h)(void *vd, void *vn, void *vm, uint32_t desc) } } +void HELPER(sve2_sqdmulh_idx_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, j, opr_sz = simd_oprsz(desc); +int idx = simd_data(desc); +int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx); +uint32_t discard; + +for (i = 0; i < opr_sz / 2; i += 16 / 2) { +int16_t mm = m[i]; +for (j = 0; j < 16 / 2; ++j) { +d[i + j] = do_sqrdmlah_h(n[i + j], mm, 0, false, false, ); +} +} +} + +void HELPER(sve2_sqrdmulh_idx_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, j, opr_sz = simd_oprsz(desc); +int idx = simd_data(desc); +int16_t *d = vd, *n = vn, *m = (int16_t *)vm + H2(idx); +uint32_t discard; + +for (i = 0; i < opr_sz / 2; i += 16 / 2) { +int16_t mm = m[i]; +for (j = 0; j < 16 / 2; ++j) { +d[i + j] = do_sqrdmlah_h(n[i + j], mm, 0, false, true, ); +} +} +} + /* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */ int32_t do_sqrdmlah_s(int32_t src1, 
int32_t src2, int32_t src3, bool neg, bool round, uint32_t *sat) @@ -374,6 +404,36 @@ void HELPER(sve2_sqrdmulh_s)(void *vd, void *vn, void *vm, uint32_t desc) } } +void HELPER(sve2_sqdmulh_idx_s)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, j, opr_sz = simd_oprsz(desc); +int idx = simd_data(desc); +int32_t *d = vd, *n = vn, *m = (int32_t *)vm + H4(idx); +uint32_t discard; + +for
[PATCH v2 079/100] target/arm: Implement SVE2 signed saturating doubling multiply high
Signed-off-by: Richard Henderson --- target/arm/helper.h| 10 + target/arm/sve.decode | 4 ++ target/arm/translate-sve.c | 18 target/arm/vec_helper.c| 84 ++ 4 files changed, 116 insertions(+) diff --git a/target/arm/helper.h b/target/arm/helper.h index 7964d299f6..ce6ff95672 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -807,6 +807,16 @@ DEF_HELPER_FLAGS_5(gvec_mls_idx_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_mls_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmulh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_sqrdmulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqrdmulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqrdmulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqrdmulh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 400940a18d..6879870cc1 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1214,6 +1214,10 @@ SMULH_zzz 0100 .. 1 . 0110 10 . . @rd_rn_rm UMULH_zzz 0100 .. 1 . 0110 11 . . @rd_rn_rm PMUL_zzz0100 00 1 . 0110 01 . . @rd_rn_rm_e0 +# SVE2 signed saturating doubling multiply high (unpredicated) +SQDMULH_zzz 0100 .. 1 . 0111 00 . . @rd_rn_rm +SQRDMULH_zzz0100 .. 1 . 0111 01 . . @rd_rn_rm + ### SVE2 Integer - Predicated SADALP_zpzz 01000100 .. 000 100 101 ... . . 
@rdm_pg_rn diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 4246d721d9..a13d2f5711 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5990,6 +5990,24 @@ static bool trans_PMUL_zzz(DisasContext *s, arg_rrr_esz *a) return do_sve2_zzz_ool(s, a, gen_helper_gvec_pmul_b); } +static bool trans_SQDMULH_zzz(DisasContext *s, arg_rrr_esz *a) +{ +static gen_helper_gvec_3 * const fns[4] = { +gen_helper_sve2_sqdmulh_b, gen_helper_sve2_sqdmulh_h, +gen_helper_sve2_sqdmulh_s, gen_helper_sve2_sqdmulh_d, +}; +return do_sve2_zzz_ool(s, a, fns[a->esz]); +} + +static bool trans_SQRDMULH_zzz(DisasContext *s, arg_rrr_esz *a) +{ +static gen_helper_gvec_3 * const fns[4] = { +gen_helper_sve2_sqrdmulh_b, gen_helper_sve2_sqrdmulh_h, +gen_helper_sve2_sqrdmulh_s, gen_helper_sve2_sqrdmulh_d, +}; +return do_sve2_zzz_ool(s, a, fns[a->esz]); +} + /* * SVE2 Integer - Predicated */ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index fb8596c1fd..766555a5d6 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -81,6 +81,26 @@ void HELPER(sve2_sqrdmlsh_b)(void *vd, void *vn, void *vm, } } +void HELPER(sve2_sqdmulh_b)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int8_t *d = vd, *n = vn, *m = vm; + +for (i = 0; i < opr_sz; ++i) { +d[i] = do_sqrdmlah_b(n[i], m[i], 0, false, false); +} +} + +void HELPER(sve2_sqrdmulh_b)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int8_t *d = vd, *n = vn, *m = vm; + +for (i = 0; i < opr_sz; ++i) { +d[i] = do_sqrdmlah_b(n[i], m[i], 0, false, true); +} +} + /* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */ int16_t do_sqrdmlah_h(int16_t src1, int16_t src2, int16_t src3, bool neg, bool round, uint32_t *sat) @@ -174,6 +194,28 @@ void HELPER(sve2_sqrdmlsh_h)(void *vd, void *vn, void *vm, } } +void HELPER(sve2_sqdmulh_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, 
opr_sz = simd_oprsz(desc); +int16_t *d = vd, *n = vn, *m = vm; +uint32_t discard; + +for (i = 0; i < opr_sz / 2; ++i) { +d[i] = do_sqrdmlah_h(n[i], m[i], 0, false, false, ); +} +} + +void HELPER(sve2_sqrdmulh_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int16_t *d = vd, *n = vn, *m = vm; +uint32_t discard; + +for (i = 0; i < opr_sz / 2; ++i) { +d[i] = do_sqrdmlah_h(n[i], m[i], 0, false, true, ); +} +} + /* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */ int32_t do_sqrdmlah_s(int32_t src1, int32_t src2, int32_t src3, bool neg, bool round, uint32_t *sat) @@ -261,6 +303,28 @@ void HELPER(sve2_sqrdmlsh_s)(void *vd, void *vn, void *vm, } } +void HELPER(sve2_sqdmulh_s)(void *vd, void *vn, void *vm,
[PATCH v2 093/100] softfloat: Add float16_is_normal
From: Stephen Long

This float16 predicate was missing from the normal set.

Signed-off-by: Stephen Long
Signed-off-by: Richard Henderson
---
 include/fpu/softfloat.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 16ca697a73..cd1fcfbf0c 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -264,6 +264,11 @@ static inline bool float16_is_zero_or_denormal(float16 a)
     return (float16_val(a) & 0x7c00) == 0;
 }
 
+static inline bool float16_is_normal(float16 a)
+{
+    return (((float16_val(a) >> 10) + 1) & 0x1f) >= 2;
+}
+
 static inline float16 float16_abs(float16 a)
 {
     /* Note that abs does *not* handle NaN specially, nor does
-- 
2.25.1
[PATCH v2 085/100] target/arm: Implement SVE mixed sign dot product
Signed-off-by: Richard Henderson --- target/arm/helper.h| 2 ++ target/arm/sve.decode | 4 target/arm/translate-sve.c | 16 target/arm/vec_helper.c| 18 ++ 4 files changed, 40 insertions(+) diff --git a/target/arm/helper.h b/target/arm/helper.h index 6fac613dfc..bfeb327272 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -578,6 +578,8 @@ DEF_HELPER_FLAGS_5(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_usdot_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_sdot_idx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 51acbfa797..0be8a020f6 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1523,6 +1523,10 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . . @rda_rn_rm CMLA_ 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx SQRDCMLAH_ 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx +## SVE mixed sign dot product + +USDOT_ 01000100 .. 0 . 011 110 . . @rda_rn_rm + ### SVE2 floating point matrix multiply accumulate FMMLA 01100100 .. 1 . 111001 . . 
@rda_rn_rm diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index fe4b4b7387..152b0b605d 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7666,3 +7666,19 @@ static bool trans_SQRDCMLAH_(DisasContext *s, arg_SQRDCMLAH_ *a) } return true; } + +static bool trans_USDOT_(DisasContext *s, arg_USDOT_ *a) +{ +if (a->esz != 2 || !dc_isar_feature(aa64_sve2_i8mm, s)) { +return false; +} +if (sve_access_check(s)) { +unsigned vsz = vec_full_reg_size(s); +tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + vec_full_reg_offset(s, a->ra), + vsz, vsz, 0, gen_helper_gvec_usdot_b); +} +return true; +} diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index e1689d730f..a51cbf2c7e 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -580,6 +580,24 @@ void HELPER(gvec_udot_b)(void *vd, void *vn, void *vm, void *va, uint32_t desc) clear_tail(d, opr_sz, simd_maxsz(desc)); } +void HELPER(gvec_usdot_b)(void *vd, void *vn, void *vm, + void *va, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int32_t *d = vd, *a = va; +uint8_t *n = vn; +int8_t *m = vm; + +for (i = 0; i < opr_sz / 4; ++i) { +d[i] = (a[i] + +n[i * 4 + 0] * m[i * 4 + 0] + +n[i * 4 + 1] * m[i * 4 + 1] + +n[i * 4 + 2] * m[i * 4 + 2] + +n[i * 4 + 3] * m[i * 4 + 3]); +} +clear_tail(d, opr_sz, simd_maxsz(desc)); +} + void HELPER(gvec_sdot_h)(void *vd, void *vn, void *vm, void *va, uint32_t desc) { intptr_t i, opr_sz = simd_oprsz(desc); -- 2.25.1
[PATCH v2 077/100] target/arm: Implement SVE2 integer multiply long (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 5 + target/arm/sve.decode | 16 target/arm/sve_helper.c| 23 +++ target/arm/translate-sve.c | 24 4 files changed, 64 insertions(+), 4 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 08210b2c66..91cce85f17 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2200,3 +2200,8 @@ DEF_HELPER_FLAGS_5(sve2_sqdmlsl_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_sqdmlsl_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_smull_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_smull_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_umull_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_umull_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 36cdd9dab4..f0a4d86428 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -257,6 +257,12 @@ @rrx_d 11 . index:1 rm:4 .. rn:5 rd:5 \ _esz esz=3 +# Two registers and a scalar by index, wide +@rrxl_s 10 ... rm:3 .. rn:5 rd:5 \ +_esz index=%index3_19_11 esz=2 +@rrxl_d 11 .. rm:4 .. rn:5 rd:5 \ +_esz index=%index2_20_11 esz=3 + # Three registers and a scalar by index @rrxr_h 0. . .. rm:3 .. rn:5 rd:5 \ _esz ra=%reg_movprfx index=%index3_22_19 esz=1 @@ -819,6 +825,16 @@ SQDMLSLB_zzxw_d 01000100 .. 1 . 0011.0 . . @rrxw_d SQDMLSLT_zzxw_s 01000100 .. 1 . 0011.1 . . @rrxw_s SQDMLSLT_zzxw_d 01000100 .. 1 . 0011.1 . . @rrxw_d +# SVE2 integer multiply long (indexed) +SMULLB_zzx_s01000100 .. 1 . 1100.0 . . @rrxl_s +SMULLB_zzx_d01000100 .. 1 . 1100.0 . . @rrxl_d +SMULLT_zzx_s01000100 .. 1 . 1100.1 . . @rrxl_s +SMULLT_zzx_d01000100 .. 1 . 1100.1 . . @rrxl_d +UMULLB_zzx_s01000100 .. 1 . 1101.0 . . @rrxl_s +UMULLB_zzx_d01000100 .. 1 . 1101.0 . . @rrxl_d +UMULLT_zzx_s01000100 .. 1 . 1101.1 . . @rrxl_s +UMULLT_zzx_d01000100 .. 1 . 
1101.1 . . @rrxl_d + # SVE2 integer multiply (indexed) MUL_zzx_h 01000100 .. 1 . 10 . . @rrx_h MUL_zzx_s 01000100 .. 1 . 10 . . @rrx_s diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 32f5d1d790..4aff792345 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1560,6 +1560,29 @@ DO_ZZXW(sve2_sqdmlsl_idx_d, int64_t, int32_t, , H1_4, DO_SQDMLSL_D) #undef DO_ZZXW +#define DO_ZZX(NAME, TYPEW, TYPEN, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)\ +{ \ +intptr_t i, j, oprsz = simd_oprsz(desc); \ +intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1) * sizeof(TYPEN); \ +intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 1, 3); \ +for (i = 0; i < oprsz; i += 16) { \ +TYPEW mm = *(TYPEN *)(vm + i + idx); \ +for (j = 0; j < 16; j += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEN *)(vn + HN(i + j + sel)); \ +*(TYPEW *)(vd + HW(i + j)) = OP(nn, mm); \ +} \ +} \ +} + +DO_ZZX(sve2_smull_idx_s, int32_t, int16_t, H1_4, H1_2, DO_MUL) +DO_ZZX(sve2_smull_idx_d, int64_t, int32_t, , H1_4, DO_MUL) + +DO_ZZX(sve2_umull_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL) +DO_ZZX(sve2_umull_idx_d, uint64_t, uint32_t, , H1_4, DO_MUL) + +#undef DO_ZZX + #define DO_BITPERM(NAME, TYPE, OP) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 61e59f369f..d8bb877ba5 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3844,8 +3844,8 @@ DO_RRXR(trans_UDOT_zzxw_d, gen_helper_gvec_udot_idx_h) #undef DO_RRXR -static bool do_sve2_zzx_ool(DisasContext *s, arg_rrx_esz *a, -gen_helper_gvec_3 *fn) +static bool do_sve2_zzx_data(DisasContext *s, arg_rrx_esz *a, + gen_helper_gvec_3 *fn, int data) { if (fn == NULL ||
[PATCH v2 076/100] target/arm: Implement SVE2 saturating multiply-add (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 9 + target/arm/sve.decode | 18 ++ target/arm/sve_helper.c| 30 ++ target/arm/translate-sve.c | 32 4 files changed, 81 insertions(+), 8 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index c86dcf0c55..08210b2c66 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2191,3 +2191,12 @@ DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqdmlal_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqdmlal_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqdmlsl_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqdmlsl_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 5fc76b7fc3..36cdd9dab4 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -30,6 +30,8 @@ %size_2323:2 %dtype_23_1323:2 13:2 %index3_22_19 22:1 19:2 +%index3_19_11 19:2 11:1 +%index2_20_11 20:1 11:1 # A combination of tsz:imm3 -- extract esize. %tszimm_esz 22:2 5:5 !function=tszimm_esz @@ -263,6 +265,12 @@ @rrxr_d 11 . index:1 rm:4 .. rn:5 rd:5 \ _esz ra=%reg_movprfx esz=3 +# Three registers and a scalar by index, wide +@rrxw_s 10 ... rm:3 .. rn:5 rd:5 \ +_esz ra=%reg_movprfx index=%index3_19_11 esz=2 +@rrxw_d 11 .. rm:4 .. rn:5 rd:5 \ +_esz ra=%reg_movprfx index=%index2_20_11 esz=3 + ### # Instruction patterns. Grouped according to the SVE encodingindex.xhtml. @@ -801,6 +809,16 @@ SQRDMLSH_zzxz_h 01000100 .. 1 . 000101 . . @rrxr_h SQRDMLSH_zzxz_s 01000100 .. 1 . 000101 . . @rrxr_s SQRDMLSH_zzxz_d 01000100 .. 1 . 000101 . . @rrxr_d +# SVE2 saturating multiply-add (indexed) +SQDMLALB_zzxw_s 01000100 .. 1 . 0010.0 . . @rrxw_s +SQDMLALB_zzxw_d 01000100 .. 1 . 0010.0 . . 
@rrxw_d +SQDMLALT_zzxw_s 01000100 .. 1 . 0010.1 . . @rrxw_s +SQDMLALT_zzxw_d 01000100 .. 1 . 0010.1 . . @rrxw_d +SQDMLSLB_zzxw_s 01000100 .. 1 . 0011.0 . . @rrxw_s +SQDMLSLB_zzxw_d 01000100 .. 1 . 0011.0 . . @rrxw_d +SQDMLSLT_zzxw_s 01000100 .. 1 . 0011.1 . . @rrxw_s +SQDMLSLT_zzxw_d 01000100 .. 1 . 0011.1 . . @rrxw_d + # SVE2 integer multiply (indexed) MUL_zzx_h 01000100 .. 1 . 10 . . @rrx_h MUL_zzx_s 01000100 .. 1 . 10 . . @rrx_s diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index b3a87fb0b7..32f5d1d790 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1530,6 +1530,36 @@ DO_ZZXZ(sve2_sqrdmlsh_idx_d, int64_t, , DO_SQRDMLSH_D) #undef DO_ZZXZ +#define DO_ZZXW(NAME, TYPEW, TYPEN, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ +intptr_t i, j, oprsz = simd_oprsz(desc); \ +intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1) * sizeof(TYPEN); \ +intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 1, 3); \ +for (i = 0; i < oprsz; i += 16) { \ +TYPEW mm = *(TYPEN *)(vm + i + idx); \ +for (j = 0; j < 16; j += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEN *)(vn + HN(i + j + sel)); \ +TYPEW aa = *(TYPEW *)(va + HW(i + j));\ +*(TYPEW *)(vd + HW(i + j)) = OP(nn, mm, aa); \ +} \ +} \ +} + +#define DO_SQDMLAL_S(N, M, A) DO_SQADD_S(A, do_sqdmull_s(N, M)) +#define DO_SQDMLAL_D(N, M, A) do_sqadd_d(A, do_sqdmull_d(N, M)) + +DO_ZZXW(sve2_sqdmlal_idx_s, int32_t, int16_t, H1_4, H1_2, DO_SQDMLAL_S) +DO_ZZXW(sve2_sqdmlal_idx_d, int64_t, int32_t, , H1_4, DO_SQDMLAL_D) + +#define DO_SQDMLSL_S(N, M, A) DO_SQSUB_S(A, do_sqdmull_s(N, M)) +#define DO_SQDMLSL_D(N, M, A) do_sqsub_d(A, do_sqdmull_d(N, M)) + +DO_ZZXW(sve2_sqdmlsl_idx_s, int32_t, int16_t, H1_4, H1_2, DO_SQDMLSL_S)
[PATCH v2 088/100] target/arm: Implement SVE2 crypto constructive binary operations
Signed-off-by: Richard Henderson --- target/arm/cpu.h | 5 + target/arm/sve.decode | 4 target/arm/translate-sve.c | 16 3 files changed, 25 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 37fc866cf8..1be8f51162 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3877,6 +3877,11 @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id) return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0; } +static inline bool isar_feature_aa64_sve2_sha3(const ARMISARegisters *id) +{ +return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SHA3) != 0; +} + static inline bool isar_feature_aa64_sve2_sm4(const ARMISARegisters *id) { return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SM4) != 0; diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 2ebf65f376..5f2fad4754 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1564,3 +1564,7 @@ AESMC 01000101 00 1011100 decrypt:1 0 rd:5 AESE01000101 00 10001 0 11100 0 . . @rdn_rm_e0 AESD01000101 00 10001 0 11100 1 . . @rdn_rm_e0 SM4E01000101 00 10001 1 11100 0 . . @rdn_rm_e0 + +# SVE2 crypto constructive binary operations +SM4EKEY 01000101 00 1 . 0 0 . . @rd_rn_rm_e0 +RAX101000101 00 1 . 0 1 . . @rd_rn_rm_e0 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 92140ed2fa..3da42e2743 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7731,3 +7731,19 @@ static bool trans_SM4E(DisasContext *s, arg_rrr_esz *a) { return do_sm4(s, a, gen_helper_crypto_sm4e); } + +static bool trans_SM4EKEY(DisasContext *s, arg_rrr_esz *a) +{ +return do_sm4(s, a, gen_helper_crypto_sm4ekey); +} + +static bool trans_RAX1(DisasContext *s, arg_rrr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2_sha3, s)) { +return false; +} +if (sve_access_check(s)) { +gen_gvec_fn_zzz(s, gen_gvec_rax1, MO_64, a->rd, a->rn, a->rm); +} +return true; +} -- 2.25.1
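As a rough reference for the per-lane semantics the RAX1 translation above routes to `gen_gvec_rax1`: each 64-bit element computes n XOR (m rotated left by one). This sketch is an assumption based on the architectural definition of RAX1, not QEMU's gvec code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by one bit. */
static inline uint64_t rotl64_1(uint64_t x)
{
    return (x << 1) | (x >> 63);
}

/* Per-lane RAX1 model: d = n ^ (m <<< 1), applied to every
 * 64-bit element of the vector. */
static uint64_t rax1_ref(uint64_t n, uint64_t m)
{
    return n ^ rotl64_1(m);
}
```

The vectorized helper would apply this lane-wise across the full SVE vector length.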
[PATCH v2 078/100] target/arm: Implement SVE2 saturating multiply (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 5 + target/arm/sve.decode | 6 ++ target/arm/sve_helper.c| 3 +++ target/arm/translate-sve.c | 5 + 4 files changed, 19 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 91cce85f17..e8e616a247 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2205,3 +2205,8 @@ DEF_HELPER_FLAGS_4(sve2_smull_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_smull_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_umull_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_umull_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_sqdmull_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmull_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index f0a4d86428..400940a18d 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -835,6 +835,12 @@ UMULLB_zzx_d01000100 .. 1 . 1101.0 . . @rrxl_d UMULLT_zzx_s01000100 .. 1 . 1101.1 . . @rrxl_s UMULLT_zzx_d01000100 .. 1 . 1101.1 . . @rrxl_d +# SVE2 saturating multiply (indexed) +SQDMULLB_zzx_s 01000100 .. 1 . 1110.0 . . @rrxl_s +SQDMULLB_zzx_d 01000100 .. 1 . 1110.0 . . @rrxl_d +SQDMULLT_zzx_s 01000100 .. 1 . 1110.1 . . @rrxl_s +SQDMULLT_zzx_d 01000100 .. 1 . 1110.1 . . @rrxl_d + # SVE2 integer multiply (indexed) MUL_zzx_h 01000100 .. 1 . 10 . . @rrx_h MUL_zzx_s 01000100 .. 1 . 10 . . 
@rrx_s diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 4aff792345..52a235826c 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1581,6 +1581,9 @@ DO_ZZX(sve2_smull_idx_d, int64_t, int32_t, , H1_4, DO_MUL) DO_ZZX(sve2_umull_idx_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL) DO_ZZX(sve2_umull_idx_d, uint64_t, uint32_t, , H1_4, DO_MUL) +DO_ZZX(sve2_sqdmull_idx_s, int32_t, int16_t, H1_4, H1_2, do_sqdmull_s) +DO_ZZX(sve2_sqdmull_idx_d, int64_t, int32_t, , H1_4, do_sqdmull_d) + #undef DO_ZZX #define DO_BITPERM(NAME, TYPE, OP) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index d8bb877ba5..4246d721d9 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3884,6 +3884,11 @@ DO_SVE2_RRX_TB(trans_UMULLB_zzx_d, gen_helper_sve2_umull_idx_d, false) DO_SVE2_RRX_TB(trans_UMULLT_zzx_s, gen_helper_sve2_umull_idx_s, true) DO_SVE2_RRX_TB(trans_UMULLT_zzx_d, gen_helper_sve2_umull_idx_d, true) +DO_SVE2_RRX_TB(trans_SQDMULLB_zzx_s, gen_helper_sve2_sqdmull_idx_s, false) +DO_SVE2_RRX_TB(trans_SQDMULLB_zzx_d, gen_helper_sve2_sqdmull_idx_d, false) +DO_SVE2_RRX_TB(trans_SQDMULLT_zzx_s, gen_helper_sve2_sqdmull_idx_s, true) +DO_SVE2_RRX_TB(trans_SQDMULLT_zzx_d, gen_helper_sve2_sqdmull_idx_d, true) + #undef DO_SVE2_RRX_TB static bool do_sve2_zzxz_data(DisasContext *s, arg_rrxr_esz *a, -- 2.25.1
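For context on the `do_sqdmull_s` operation the new helpers reuse above: a signed saturating doubling multiply long widens, doubles, and saturates. A scalar model of the 16-to-32-bit case, assumed from the architectural definition (the only overflowing input pair is n == m == INT16_MIN):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 16->32-bit saturating doubling multiply:
 * 2 * n * m, saturated to int32_t.  Computing in 64 bits avoids
 * signed overflow in the single case that must saturate. */
static int32_t sqdmull_h_to_s(int16_t n, int16_t m)
{
    int64_t p = 2 * (int64_t)n * m;
    if (p > INT32_MAX) {
        p = INT32_MAX;   /* only n == m == INT16_MIN reaches here */
    }
    return (int32_t)p;
}
```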
[PATCH v2 074/100] target/arm: Use helper_gvec_ml{a, s}_idx_* for aa64 advsimd
Signed-off-by: Richard Henderson
---
 target/arm/translate-a64.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index a3135754ce..5ef6ecfbf1 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13053,6 +13053,40 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             return;
         }
         break;
+
+    case 0x10: /* MLA */
+        if (!is_long && !is_scalar) {
+            static gen_helper_gvec_4 * const fns[3] = {
+                gen_helper_gvec_mla_idx_h,
+                gen_helper_gvec_mla_idx_s,
+                gen_helper_gvec_mla_idx_d,
+            };
+            tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd),
+                               vec_full_reg_offset(s, rn),
+                               vec_full_reg_offset(s, rm),
+                               vec_full_reg_offset(s, rd),
+                               is_q ? 16 : 8, vec_full_reg_size(s),
+                               index, fns[size - 1]);
+            return;
+        }
+        break;
+
+    case 0x14: /* MLS */
+        if (!is_long && !is_scalar) {
+            static gen_helper_gvec_4 * const fns[3] = {
+                gen_helper_gvec_mls_idx_h,
+                gen_helper_gvec_mls_idx_s,
+                gen_helper_gvec_mls_idx_d,
+            };
+            tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd),
+                               vec_full_reg_offset(s, rn),
+                               vec_full_reg_offset(s, rm),
+                               vec_full_reg_offset(s, rd),
+                               is_q ? 16 : 8, vec_full_reg_size(s),
+                               index, fns[size - 1]);
+            return;
+        }
+        break;
     }

     if (size == 3) {
-- 
2.25.1
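A minimal scalar model of the indexed MLA the patch routes to `gen_helper_gvec_mla_idx_*` (illustrative only; the real helpers operate per 128-bit segment): one element of m, selected by the index, multiplies every element of n and accumulates into d.

```c
#include <assert.h>
#include <stdint.h>

/* Indexed multiply-accumulate sketch: d[i] += n[i] * m[idx]
 * for every element i.  MLS would subtract instead of add. */
static void mla_idx_ref(int32_t *d, const int32_t *n,
                        const int32_t *m, int idx, int elts)
{
    int32_t mm = m[idx];
    for (int i = 0; i < elts; i++) {
        d[i] += n[i] * mm;
    }
}
```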
[PATCH v2 071/100] target/arm: Implement SVE2 integer multiply (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper.h| 4 target/arm/sve.decode | 7 +++ target/arm/translate-sve.c | 30 ++ target/arm/vec_helper.c| 29 + 4 files changed, 66 insertions(+), 4 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index 331f77c908..ce81a16a58 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -789,6 +789,10 @@ DEF_HELPER_FLAGS_4(gvec_uaba_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_xar_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_mul_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/sve.decode b/target/arm/sve.decode index ea6ec5f198..fa0a572da6 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -777,12 +777,19 @@ MUL_zzi 00100101 .. 110 000 110 . @rdn_i8s DOT_01000100 1 sz:1 0 rm:5 0 u:1 rn:5 rd:5 \ ra=%reg_movprfx + SVE Multiply - Indexed + # SVE integer dot product (indexed) SDOT_zzxw_s 01000100 .. 1 . 00 . . @rrxr_s SDOT_zzxw_d 01000100 .. 1 . 00 . . @rrxr_d UDOT_zzxw_s 01000100 .. 1 . 01 . . @rrxr_s UDOT_zzxw_d 01000100 .. 1 . 01 . . @rrxr_d +# SVE2 integer multiply (indexed) +MUL_zzx_h 01000100 .. 1 . 10 . . @rrx_h +MUL_zzx_s 01000100 .. 1 . 10 . . @rrx_s +MUL_zzx_d 01000100 .. 1 . 10 . . 
@rrx_d + # SVE floating-point complex add (predicated) FCADD 01100100 esz:2 0 rot:1 100 pg:3 rm:5 rd:5 \ rn=%reg_movprfx diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 88ffc458ee..dd2cd22061 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3817,6 +3817,10 @@ static bool trans_DOT_(DisasContext *s, arg_DOT_ *a) return true; } +/* + * SVE Multiply - Indexed + */ + static bool do_zzxz_ool(DisasContext *s, arg_rrxr_esz *a, gen_helper_gvec_4 *fn) { @@ -3840,6 +3844,32 @@ DO_RRXR(trans_UDOT_zzxw_d, gen_helper_gvec_udot_idx_h) #undef DO_RRXR +static bool do_sve2_zzx_ool(DisasContext *s, arg_rrx_esz *a, +gen_helper_gvec_3 *fn) +{ +if (fn == NULL || !dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (sve_access_check(s)) { +unsigned vsz = vec_full_reg_size(s); +tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + vsz, vsz, a->index, fn); +} +return true; +} + +#define DO_SVE2_RRX(NAME, FUNC) \ +static bool NAME(DisasContext *s, arg_rrx_esz *a) \ +{ return do_sve2_zzx_ool(s, a, FUNC); } + +DO_SVE2_RRX(trans_MUL_zzx_h, gen_helper_gvec_mul_idx_h) +DO_SVE2_RRX(trans_MUL_zzx_s, gen_helper_gvec_mul_idx_s) +DO_SVE2_RRX(trans_MUL_zzx_d, gen_helper_gvec_mul_idx_d) + +#undef DO_SVE2_RRX + /* *** SVE Floating Point Multiply-Add Indexed Group */ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 5c0760de05..08eadf06fc 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -863,6 +863,27 @@ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64) */ #define DO_MUL_IDX(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ +intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \ +intptr_t idx = simd_data(desc);\ +TYPE *d = vd, *n = vn, *m = vm;\ +for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \ +TYPE mm = m[H(i + idx)]; \ +for (j = 0; j < segment; j++) {\ +d[i + j] = n[i + j] * mm; 
\ +} \ +} \ +clear_tail(d, oprsz, simd_maxsz(desc));\ +} + +DO_MUL_IDX(gvec_mul_idx_h, uint16_t, H2) +DO_MUL_IDX(gvec_mul_idx_s, uint32_t, H4) +DO_MUL_IDX(gvec_mul_idx_d, uint64_t, ) + +#undef DO_MUL_IDX + +#define DO_FMUL_IDX(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \ {
[PATCH v2 067/100] target/arm: Pass separate addend to {U, S}DOT helpers
For SVE, we potentially have a 4th argument coming from the movprfx instruction. Currently we do not optimize movprfx, so the problem is not visible. Signed-off-by: Richard Henderson --- target/arm/helper.h | 20 +++--- target/arm/sve.decode | 7 +- target/arm/translate-a64.c | 15 - target/arm/translate-neon.inc.c | 10 +-- target/arm/translate-sve.c | 13 ++-- target/arm/vec_helper.c | 112 ++-- 6 files changed, 105 insertions(+), 72 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index 7a29194052..dd32a11b9d 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -574,15 +574,19 @@ DEF_HELPER_FLAGS_5(sve2_sqrdmlah_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_sdot_idx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_udot_idx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_sdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_udot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_idx_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_idx_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_idx_h, 
TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 0688dae450..5815ba9b1c 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -756,12 +756,13 @@ UMIN_zzi00100101 .. 101 011 110 . @rdn_i8u MUL_zzi 00100101 .. 110 000 110 . @rdn_i8s # SVE integer dot product (unpredicated) -DOT_zzz 01000100 1 sz:1 0 rm:5 0 u:1 rn:5 rd:5 ra=%reg_movprfx +DOT_01000100 1 sz:1 0 rm:5 0 u:1 rn:5 rd:5 \ +ra=%reg_movprfx # SVE integer dot product (indexed) -DOT_zzx 01000100 101 index:2 rm:3 0 u:1 rn:5 rd:5 \ +DOT_zzxw01000100 101 index:2 rm:3 0 u:1 rn:5 rd:5 \ sz=0 ra=%reg_movprfx -DOT_zzx 01000100 111 index:1 rm:4 0 u:1 rn:5 rd:5 \ +DOT_zzxw01000100 111 index:1 rm:4 0 u:1 rn:5 rd:5 \ sz=1 ra=%reg_movprfx # SVE floating-point complex add (predicated) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 4f5c433b47..7366553f8d 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -605,6 +605,17 @@ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, int rd, int rn, tcg_temp_free_ptr(fpst); } +/* Expand a 4-operand operation using an out-of-line helper. */ +static void gen_gvec_op4_ool(DisasContext *s, bool is_q, int rd, int rn, + int rm, int ra, int data, gen_helper_gvec_4 *fn) +{ +tcg_gen_gvec_4_ool(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + vec_full_reg_offset(s, ra), + is_q ? 16 : 8, vec_full_reg_size(s), data, fn); +} + /* Set ZF and NF based on a 64 bit result. This is alas fiddlier * than the 32 bit equivalent. */ @@ -11710,7 +11721,7 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) return; case 0x2: /* SDOT / UDOT */ -gen_gvec_op3_ool(s, is_q, rd, rn, rm, 0, +gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, u ? 
gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b); return; @@ -12972,7 +12983,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) switch (16 * u + opcode) { case 0x0e: /* SDOT */ case 0x1e: /* UDOT */ -gen_gvec_op3_ool(s, is_q, rd, rn, rm, index, +gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index, u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b); return; diff --git
[PATCH v2 083/100] target/arm: Implement SVE2 complex integer multiply-add (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 9 +++ target/arm/sve.decode | 12 target/arm/sve_helper.c| 142 +++-- target/arm/translate-sve.c | 38 +++--- 4 files changed, 169 insertions(+), 32 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index f309753620..d7e2d168ba 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2227,3 +2227,12 @@ DEF_HELPER_FLAGS_5(sve2_umlsl_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_umlsl_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_cmla_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_cmla_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdcmlah_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdcmlah_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index da77ad689f..e8011fe91b 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -825,6 +825,18 @@ SQDMLSLB_zzxw_d 01000100 .. 1 . 0011.0 . . @rrxw_d SQDMLSLT_zzxw_s 01000100 .. 1 . 0011.1 . . @rrxw_s SQDMLSLT_zzxw_d 01000100 .. 1 . 0011.1 . . @rrxw_d +# SVE2 complex integer multiply-add (indexed) +CMLA_zzxz_h 01000100 10 1 index:2 rm:3 0110 rot:2 rn:5 rd:5 \ +ra=%reg_movprfx +CMLA_zzxz_s 01000100 11 1 index:1 rm:4 0110 rot:2 rn:5 rd:5 \ +ra=%reg_movprfx + +# SVE2 complex saturating integer multiply-add (indexed) +SQRDCMLAH_zzxz_h 01000100 10 1 index:2 rm:3 0111 rot:2 rn:5 rd:5 \ + ra=%reg_movprfx +SQRDCMLAH_zzxz_s 01000100 11 1 index:1 rm:4 0111 rot:2 rn:5 rd:5 \ + ra=%reg_movprfx + # SVE2 multiply-add long (indexed) SMLALB_zzxw_s 01000100 .. 1 . 1000.0 . . @rrxw_s SMLALB_zzxw_d 01000100 .. 1 . 1000.0 . . 
@rrxw_d diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 479fffa16c..4b54ec8c25 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1466,34 +1466,132 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ } \ } -#define do_cmla(N, M, A, S) (A + (N * M) * (S ? -1 : 1)) +static int8_t do_cmla_b(int8_t n, int8_t m, int8_t a, bool sub) +{ +return n * m * (sub ? -1 : 1) + a; +} -DO_CMLA(sve2_cmla__b, uint8_t, H1, do_cmla) -DO_CMLA(sve2_cmla__h, uint16_t, H2, do_cmla) -DO_CMLA(sve2_cmla__s, uint32_t, H4, do_cmla) -DO_CMLA(sve2_cmla__d, uint64_t, , do_cmla) +static int16_t do_cmla_h(int16_t n, int16_t m, int16_t a, bool sub) +{ +return n * m * (sub ? -1 : 1) + a; +} -#define DO_SQRDMLAH_B(N, M, A, S) \ -do_sqrdmlah_b(N, M, A, S, true) -#define DO_SQRDMLAH_H(N, M, A, S) \ -({ uint32_t discard; do_sqrdmlah_h(N, M, A, S, true, ); }) -#define DO_SQRDMLAH_S(N, M, A, S) \ -({ uint32_t discard; do_sqrdmlah_s(N, M, A, S, true, ); }) -#define DO_SQRDMLAH_D(N, M, A, S) \ -do_sqrdmlah_d(N, M, A, S, true) +static int32_t do_cmla_s(int32_t n, int32_t m, int32_t a, bool sub) +{ +return n * m * (sub ? -1 : 1) + a; +} -DO_CMLA(sve2_sqrdcmlah__b, int8_t, H1, DO_SQRDMLAH_B) -DO_CMLA(sve2_sqrdcmlah__h, int16_t, H2, DO_SQRDMLAH_H) -DO_CMLA(sve2_sqrdcmlah__s, int32_t, H4, DO_SQRDMLAH_S) -DO_CMLA(sve2_sqrdcmlah__d, int64_t, , DO_SQRDMLAH_D) +static int64_t do_cmla_d(int64_t n, int64_t m, int64_t a, bool sub) +{ +return n * m * (sub ? 
-1 : 1) + a; +} + +DO_CMLA(sve2_cmla__b, uint8_t, H1, do_cmla_b) +DO_CMLA(sve2_cmla__h, uint16_t, H2, do_cmla_h) +DO_CMLA(sve2_cmla__s, uint32_t, H4, do_cmla_s) +DO_CMLA(sve2_cmla__d, uint64_t, , do_cmla_d) + +static int8_t do_sqrdcmlah_b(int8_t n, int8_t m, int8_t a, bool sub) +{ +return do_sqrdmlah_b(n, m, a, sub, true); +} + +static int16_t do_sqrdcmlah_h(int16_t n, int16_t m, int16_t a, bool sub) +{ +uint32_t discard; +return do_sqrdmlah_h(n, m, a, sub, true, ); +} + +static int32_t do_sqrdcmlah_s(int32_t n, int32_t m, int32_t a, bool sub) +{ +uint32_t discard; +return do_sqrdmlah_s(n, m, a, sub, true, ); +} + +static int64_t do_sqrdcmlah_d(int64_t n, int64_t m, int64_t a, bool sub) +{ +return do_sqrdmlah_d(n, m, a, sub, true); +} + +DO_CMLA(sve2_sqrdcmlah__b, int8_t, H1, do_sqrdcmlah_b) +DO_CMLA(sve2_sqrdcmlah__h, int16_t, H2, do_sqrdcmlah_h) +DO_CMLA(sve2_sqrdcmlah__s, int32_t, H4, do_sqrdcmlah_s) +DO_CMLA(sve2_sqrdcmlah__d, int64_t, , do_sqrdcmlah_d) -#undef DO_SQRDMLAH_B -#undef DO_SQRDMLAH_H -#undef DO_SQRDMLAH_S -#undef DO_SQRDMLAH_D -#undef do_cmla #undef DO_CMLA +static void
[PATCH v2 069/100] target/arm: Split out formats for 2 vectors + 1 index
Currently only used by FMUL, but will shortly be used more. Signed-off-by: Richard Henderson --- target/arm/sve.decode | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 5815ba9b1c..a121e55f07 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -67,6 +67,7 @@ _eszrd rn imm esz _esz rd rn rm imm esz _eszrd rn rm esz +_eszrd rn rm index esz _eszrd pg rn esz _s rd pg rn s _s rd pg rn rm s @@ -245,6 +246,14 @@ @rpri_scatter_store ... msz:2 ..imm:5 ... pg:3 rn:5 rd:5 \ _scatter_store +# Two registers and a scalar by index +@rrx_h 0. . .. rm:3 .. rn:5 rd:5 \ +_esz index=%index3_22_19 esz=1 +@rrx_s 10 . index:2 rm:3 .. rn:5 rd:5 \ +_esz esz=2 +@rrx_d 11 . index:1 rm:4 .. rn:5 rd:5 \ +_esz esz=3 + ### # Instruction patterns. Grouped according to the SVE encodingindex.xhtml. @@ -792,10 +801,9 @@ FMLA_zzxz 01100100 111 index:1 rm:4 0 sub:1 rn:5 rd:5 \ ### SVE FP Multiply Indexed Group # SVE floating-point multiply (indexed) -FMUL_zzx01100100 0.1 .. rm:3 001000 rn:5 rd:5 \ -index=%index3_22_19 esz=1 -FMUL_zzx01100100 101 index:2 rm:3 001000 rn:5 rd:5 esz=2 -FMUL_zzx01100100 111 index:1 rm:4 001000 rn:5 rd:5 esz=3 +FMUL_zzx01100100 .. 1 . 001000 . . @rrx_h +FMUL_zzx01100100 .. 1 . 001000 . . @rrx_s +FMUL_zzx01100100 .. 1 . 001000 . . @rrx_d ### SVE FP Fast Reduction Group -- 2.25.1
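The split index fields used by these formats, such as `%index3_22_19` ("22:1 19:2"), concatenate non-contiguous bit groups of the instruction word: one bit at position 22 becomes the high bit and two bits at position 19 become the low bits. A sketch of that extraction (illustrative only, not the code decodetree actually generates):

```c
#include <assert.h>
#include <stdint.h>

/* Pull len bits starting at bit position pos out of the insn word. */
static unsigned extract_field(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & ((1u << len) - 1);
}

/* Model of the decodetree split field %index3_22_19 ("22:1 19:2"):
 * bit 22 supplies the high bit, bits 20:19 the low two bits. */
static unsigned index3_22_19(uint32_t insn)
{
    return (extract_field(insn, 22, 1) << 2) | extract_field(insn, 19, 2);
}
```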
[PATCH v2 070/100] target/arm: Split out formats for 3 vectors + 1 index
Used by FMLA and DOT, but will shortly be used more. Split FMLA from FMLS to avoid an extra sub field; similarly for SDOT from UDOT. Signed-off-by: Richard Henderson --- target/arm/sve.decode | 29 +++-- target/arm/translate-sve.c | 38 -- 2 files changed, 47 insertions(+), 20 deletions(-) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index a121e55f07..ea6ec5f198 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -73,6 +73,7 @@ _s rd pg rn rm s _esz rd pg rn rm esz _esz rd ra rn rm esz +_esz rd rn rm ra index esz _esz rd pg rn rm ra esz _esz rd pg rn imm esz rd esz pat s @@ -254,6 +255,14 @@ @rrx_d 11 . index:1 rm:4 .. rn:5 rd:5 \ _esz esz=3 +# Three registers and a scalar by index +@rrxr_h 0. . .. rm:3 .. rn:5 rd:5 \ +_esz ra=%reg_movprfx index=%index3_22_19 esz=1 +@rrxr_s 10 . index:2 rm:3 .. rn:5 rd:5 \ +_esz ra=%reg_movprfx esz=2 +@rrxr_d 11 . index:1 rm:4 .. rn:5 rd:5 \ +_esz ra=%reg_movprfx esz=3 + ### # Instruction patterns. Grouped according to the SVE encodingindex.xhtml. @@ -769,10 +778,10 @@ DOT_01000100 1 sz:1 0 rm:5 0 u:1 rn:5 rd:5 \ ra=%reg_movprfx # SVE integer dot product (indexed) -DOT_zzxw01000100 101 index:2 rm:3 0 u:1 rn:5 rd:5 \ -sz=0 ra=%reg_movprfx -DOT_zzxw01000100 111 index:1 rm:4 0 u:1 rn:5 rd:5 \ -sz=1 ra=%reg_movprfx +SDOT_zzxw_s 01000100 .. 1 . 00 . . @rrxr_s +SDOT_zzxw_d 01000100 .. 1 . 00 . . @rrxr_d +UDOT_zzxw_s 01000100 .. 1 . 01 . . @rrxr_s +UDOT_zzxw_d 01000100 .. 1 . 01 . . @rrxr_d # SVE floating-point complex add (predicated) FCADD 01100100 esz:2 0 rot:1 100 pg:3 rm:5 rd:5 \ @@ -791,12 +800,12 @@ FCMLA_zzxz 01100100 11 1 index:1 rm:4 0001 rot:2 rn:5 rd:5 \ ### SVE FP Multiply-Add Indexed Group # SVE floating-point multiply-add (indexed) -FMLA_zzxz 01100100 0.1 .. 
rm:3 0 sub:1 rn:5 rd:5 \ -ra=%reg_movprfx index=%index3_22_19 esz=1 -FMLA_zzxz 01100100 101 index:2 rm:3 0 sub:1 rn:5 rd:5 \ -ra=%reg_movprfx esz=2 -FMLA_zzxz 01100100 111 index:1 rm:4 0 sub:1 rn:5 rd:5 \ -ra=%reg_movprfx esz=3 +FMLA_zzxz 01100100 .. 1 . 00 . . @rrxr_h +FMLA_zzxz 01100100 .. 1 . 00 . . @rrxr_s +FMLA_zzxz 01100100 .. 1 . 00 . . @rrxr_d +FMLS_zzxz 01100100 .. 1 . 01 . . @rrxr_h +FMLS_zzxz 01100100 .. 1 . 01 . . @rrxr_s +FMLS_zzxz 01100100 .. 1 . 01 . . @rrxr_d ### SVE FP Multiply Indexed Group diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 26497f0a6d..88ffc458ee 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3817,26 +3817,34 @@ static bool trans_DOT_(DisasContext *s, arg_DOT_ *a) return true; } -static bool trans_DOT_zzxw(DisasContext *s, arg_DOT_zzxw *a) +static bool do_zzxz_ool(DisasContext *s, arg_rrxr_esz *a, +gen_helper_gvec_4 *fn) { -static gen_helper_gvec_4 * const fns[2][2] = { -{ gen_helper_gvec_sdot_idx_b, gen_helper_gvec_sdot_idx_h }, -{ gen_helper_gvec_udot_idx_b, gen_helper_gvec_udot_idx_h } -}; - +if (fn == NULL) { +return false; +} if (sve_access_check(s)) { -gen_gvec_ool_(s, fns[a->u][a->sz], a->rd, a->rn, a->rm, - a->ra, a->index); +gen_gvec_ool_(s, fn, a->rd, a->rn, a->rm, a->ra, a->index); } return true; } +#define DO_RRXR(NAME, FUNC) \ +static bool NAME(DisasContext *s, arg_rrxr_esz *a) \ +{ return do_zzxz_ool(s, a, FUNC); } + +DO_RRXR(trans_SDOT_zzxw_s, gen_helper_gvec_sdot_idx_b) +DO_RRXR(trans_SDOT_zzxw_d, gen_helper_gvec_sdot_idx_h) +DO_RRXR(trans_UDOT_zzxw_s, gen_helper_gvec_udot_idx_b) +DO_RRXR(trans_UDOT_zzxw_d, gen_helper_gvec_udot_idx_h) + +#undef DO_RRXR /* *** SVE Floating Point Multiply-Add Indexed Group */ -static bool trans_FMLA_zzxz(DisasContext *s, arg_FMLA_zzxz *a) +static bool do_FMLA_zzxz(DisasContext *s, arg_rrxr_esz *a, bool sub) { static gen_helper_gvec_4_ptr * const fns[3] = { gen_helper_gvec_fmla_idx_h, @@ -3851,13 +3859,23 @@ static bool 
trans_FMLA_zzxz(DisasContext *s, arg_FMLA_zzxz *a) vec_full_reg_offset(s, a->rn), vec_full_reg_offset(s, a->rm), vec_full_reg_offset(s, a->ra), -
[PATCH v2 075/100] target/arm: Implement SVE2 saturating multiply-add high (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 14 + target/arm/sve.decode | 8 target/arm/sve_helper.c| 40 ++ target/arm/translate-sve.c | 8 4 files changed, 70 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 2929ad48a7..c86dcf0c55 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2177,3 +2177,17 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(fmmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(fmmla_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqrdmlah_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlah_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlah_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 467a93052f..5fc76b7fc3 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -793,6 +793,14 @@ MLS_zzxz_h 01000100 .. 1 . 11 . . @rrxr_h MLS_zzxz_s 01000100 .. 1 . 11 . . @rrxr_s MLS_zzxz_d 01000100 .. 1 . 11 . . @rrxr_d +# SVE2 saturating multiply-add high (indexed) +SQRDMLAH_zzxz_h 01000100 .. 1 . 000100 . . @rrxr_h +SQRDMLAH_zzxz_s 01000100 .. 1 . 000100 . . @rrxr_s +SQRDMLAH_zzxz_d 01000100 .. 1 . 000100 . . @rrxr_d +SQRDMLSH_zzxz_h 01000100 .. 1 . 000101 . . @rrxr_h +SQRDMLSH_zzxz_s 01000100 .. 1 . 000101 . . @rrxr_s +SQRDMLSH_zzxz_d 01000100 .. 1 . 000101 . . @rrxr_d + # SVE2 integer multiply (indexed) MUL_zzx_h 01000100 .. 1 . 10 . . @rrx_h MUL_zzx_s 01000100 .. 1 . 10 . . 
@rrx_s diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 24c733fea1..b3a87fb0b7 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1487,9 +1487,49 @@ DO_CMLA(sve2_sqrdcmlah__h, int16_t, H2, DO_SQRDMLAH_H) DO_CMLA(sve2_sqrdcmlah__s, int32_t, H4, DO_SQRDMLAH_S) DO_CMLA(sve2_sqrdcmlah__d, int64_t, , DO_SQRDMLAH_D) +#undef DO_SQRDMLAH_B +#undef DO_SQRDMLAH_H +#undef DO_SQRDMLAH_S +#undef DO_SQRDMLAH_D #undef do_cmla #undef DO_CMLA +#define DO_ZZXZ(NAME, TYPE, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ +intptr_t oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \ +intptr_t i, j, idx = simd_data(desc); \ +TYPE *d = vd, *a = va, *n = vn, *m = (TYPE *)vm + H(idx); \ +for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \ +TYPE mm = m[i]; \ +for (j = 0; j < segment; j++) { \ +d[i + j] = OP(n[i + j], mm, a[i + j]); \ +} \ +} \ +} + +#define DO_SQRDMLAH_H(N, M, A) \ +({ uint32_t discard; do_sqrdmlah_h(N, M, A, false, true, ); }) +#define DO_SQRDMLAH_S(N, M, A) \ +({ uint32_t discard; do_sqrdmlah_s(N, M, A, false, true, ); }) +#define DO_SQRDMLAH_D(N, M, A) do_sqrdmlah_d(N, M, A, false, true) + +DO_ZZXZ(sve2_sqrdmlah_idx_h, int16_t, H2, DO_SQRDMLAH_H) +DO_ZZXZ(sve2_sqrdmlah_idx_s, int32_t, H4, DO_SQRDMLAH_S) +DO_ZZXZ(sve2_sqrdmlah_idx_d, int64_t, , DO_SQRDMLAH_D) + +#define DO_SQRDMLSH_H(N, M, A) \ +({ uint32_t discard; do_sqrdmlah_h(N, M, A, true, true, ); }) +#define DO_SQRDMLSH_S(N, M, A) \ +({ uint32_t discard; do_sqrdmlah_s(N, M, A, true, true, ); }) +#define DO_SQRDMLSH_D(N, M, A) do_sqrdmlah_d(N, M, A, true, true) + +DO_ZZXZ(sve2_sqrdmlsh_idx_h, int16_t, H2, DO_SQRDMLSH_H) +DO_ZZXZ(sve2_sqrdmlsh_idx_s, int32_t, H4, DO_SQRDMLSH_S) +DO_ZZXZ(sve2_sqrdmlsh_idx_d, int64_t, , DO_SQRDMLSH_D) + +#undef DO_ZZXZ + #define DO_BITPERM(NAME, TYPE, OP) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c 
index 0fb88f4aa5..2903e46f91 100644 --- a/target/arm/translate-sve.c +++
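[Not part of the patch: the per-128-bit-segment indexed-operand pattern implemented by the DO_ZZXZ macro above can be sketched in plain C. This is an illustrative stand-in only — it uses an ordinary multiply-add in place of the saturating rounding-doubling op, and the name `mla_idx_h` is invented for the sketch.]

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the DO_ZZXZ access pattern: the vector is walked in 128-bit
 * segments (8 halfword lanes), and every lane in a segment is combined
 * with the single lane of m selected by idx within that same segment.
 * A plain multiply-add stands in for the saturating sqrdmlah op. */
static void mla_idx_h(int16_t *d, const int16_t *n, const int16_t *m,
                      const int16_t *a, size_t elems, size_t idx)
{
    const size_t segment = 16 / sizeof(int16_t);
    for (size_t i = 0; i < elems; i += segment) {
        int16_t mm = m[i + idx];              /* one multiplier per segment */
        for (size_t j = 0; j < segment; j++) {
            d[i + j] = (int16_t)(a[i + j] + n[i + j] * mm);
        }
    }
}
```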
[PATCH v2 066/100] target/arm: Fix sve_punpk_p vs odd vector lengths
Wrote too much with punpk1 with vl % 512 != 0. Reported-by: Laurent Desnogues Signed-off-by: Richard Henderson --- target/arm/sve_helper.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index dc23a9b3e0..24c733fea1 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3137,11 +3137,11 @@ void HELPER(sve_punpk_p)(void *vd, void *vn, uint32_t pred_desc) high = oprsz >> 1; } -if ((high & 3) == 0) { +if ((oprsz & 7) == 0) { uint32_t *n = vn; high >>= 2; -for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) { +for (i = 0; i < oprsz / 8; i++) { uint64_t nn = n[H4(high + i)]; d[i] = expand_bits(nn, 0); } -- 2.25.1
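[Not part of the patch: the punpk helper above is built on bit expansion — each predicate bit moves to twice its bit position, interleaved with zeros. A self-contained sketch of that expansion for the byte-element case; QEMU's expand_bits additionally takes an element-size argument, and the name `expand_bits32` is invented here.]

```c
#include <stdint.h>

/* Spread the 32 bits of x to the even bit positions of a 64-bit result,
 * interleaving with zeros: bit k of x lands at bit 2*k. */
static uint64_t expand_bits32(uint32_t x)
{
    uint64_t t = x;
    t = (t | (t << 16)) & 0x0000ffff0000ffffull;
    t = (t | (t << 8))  & 0x00ff00ff00ff00ffull;
    t = (t | (t << 4))  & 0x0f0f0f0f0f0f0f0full;
    t = (t | (t << 2))  & 0x3333333333333333ull;
    t = (t | (t << 1))  & 0x5555555555555555ull;
    return t;
}
```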
[PATCH v2 064/100] target/arm: Fix sve_uzp_p vs odd vector lengths
Missed out on compressing the second half of a predicate with length vl % 512 > 256. Adjust all of the x + (y << s) to x | (y << s) as a general style fix. Reported-by: Laurent Desnogues Signed-off-by: Richard Henderson --- target/arm/sve_helper.c | 30 +- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index b1bb2300f8..f0601bf25b 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2971,7 +2971,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc) if (oprsz <= 8) { l = compress_bits(n[0] >> odd, esz); h = compress_bits(m[0] >> odd, esz); -d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz); +d[0] = l | (h << (4 * oprsz)); } else { ARMPredicateReg tmp_m; intptr_t oprsz_16 = oprsz / 16; @@ -2985,23 +2985,35 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc) h = n[2 * i + 1]; l = compress_bits(l >> odd, esz); h = compress_bits(h >> odd, esz); -d[i] = l + (h << 32); +d[i] = l | (h << 32); } -/* For VL which is not a power of 2, the results from M do not - align nicely with the uint64_t for D. Put the aligned results - from M into TMP_M and then copy it into place afterward. */ +/* + * For VL which is not a multiple of 512, the results from M do not + * align nicely with the uint64_t for D. Put the aligned results + * from M into TMP_M and then copy it into place afterward. 
+ */ if (oprsz & 15) { -d[i] = compress_bits(n[2 * i] >> odd, esz); +int final_shift = (oprsz & 15) * 2; + +l = n[2 * i + 0]; +h = n[2 * i + 1]; +l = compress_bits(l >> odd, esz); +h = compress_bits(h >> odd, esz); +d[i] = l | (h << final_shift); for (i = 0; i < oprsz_16; i++) { l = m[2 * i + 0]; h = m[2 * i + 1]; l = compress_bits(l >> odd, esz); h = compress_bits(h >> odd, esz); -tmp_m.p[i] = l + (h << 32); +tmp_m.p[i] = l | (h << 32); } -tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz); +l = m[2 * i + 0]; +h = m[2 * i + 1]; +l = compress_bits(l >> odd, esz); +h = compress_bits(h >> odd, esz); +tmp_m.p[i] = l | (h << final_shift); swap_memmove(vd + oprsz / 2, &tmp_m, oprsz / 2); } else { @@ -3010,7 +3022,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc) h = m[2 * i + 1]; l = compress_bits(l >> odd, esz); h = compress_bits(h >> odd, esz); -d[oprsz_16 + i] = l + (h << 32); +d[oprsz_16 + i] = l | (h << 32); } } } -- 2.25.1
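[Not part of the patch: the uzp helper above relies on the inverse operation — gathering every other bit back together. A sketch of that compression for the byte-element case; QEMU's compress_bits is generalized over element size, and `compress_bits64` is an invented name.]

```c
#include <stdint.h>

/* Gather the even bits of x into the low 32 bits of the result:
 * bit 2*k of x lands at bit k. Inverse of interleave-with-zeros. */
static uint32_t compress_bits64(uint64_t x)
{
    x &= 0x5555555555555555ull;
    x = (x | (x >> 1))  & 0x3333333333333333ull;
    x = (x | (x >> 2))  & 0x0f0f0f0f0f0f0f0full;
    x = (x | (x >> 4))  & 0x00ff00ff00ff00ffull;
    x = (x | (x >> 8))  & 0x0000ffff0000ffffull;
    x = (x | (x >> 16)) & 0x00000000ffffffffull;
    return (uint32_t)x;
}
```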
[PATCH v2 063/100] target/arm: Implement SVE2 SPLICE, EXT
From: Stephen Long Signed-off-by: Stephen Long Message-Id: <20200423180347.9403-1-stepl...@quicinc.com> [rth: Rename the trans_* functions to *_sve2.] Signed-off-by: Richard Henderson --- target/arm/sve.decode | 11 +-- target/arm/translate-sve.c | 35 ++- 2 files changed, 39 insertions(+), 7 deletions(-) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 11e724d3a2..0688dae450 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -494,10 +494,14 @@ CPY_z_i 0101 .. 01 00 . . @rdn_pg4 imm=%sh8_i8s ### SVE Permute - Extract Group -# SVE extract vector (immediate offset) +# SVE extract vector (destructive) EXT 0101 001 . 000 ... rm:5 rd:5 \ rn=%reg_movprfx imm=%imm8_16_10 +# SVE2 extract vector (constructive) +EXT_sve20101 011 . 000 ... rn:5 rd:5 \ + imm=%imm8_16_10 + ### SVE Permute - Unpredicated Group # SVE broadcast general register @@ -588,9 +592,12 @@ REVH0101 .. 1001 01 100 ... . . @rd_pg_rn REVW0101 .. 1001 10 100 ... . . @rd_pg_rn RBIT0101 .. 1001 11 100 ... . . @rd_pg_rn -# SVE vector splice (predicated) +# SVE vector splice (predicated, destructive) SPLICE 0101 .. 101 100 100 ... . . @rdn_pg_rm +# SVE2 vector splice (predicated, constructive) +SPLICE_sve2 0101 .. 101 101 100 ... . . @rd_pg_rn + ### SVE Select Vectors Group # SVE select vector elements (predicated) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 0fa04afcaf..45ee91d3fe 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -2266,18 +2266,18 @@ static bool trans_CPY_z_i(DisasContext *s, arg_CPY_z_i *a) *** SVE Permute Extract Group */ -static bool trans_EXT(DisasContext *s, arg_EXT *a) +static bool do_EXT(DisasContext *s, int rd, int rn, int rm, int imm) { if (!sve_access_check(s)) { return true; } unsigned vsz = vec_full_reg_size(s); -unsigned n_ofs = a->imm >= vsz ? 0 : a->imm; +unsigned n_ofs = imm >= vsz ? 
0 : imm; unsigned n_siz = vsz - n_ofs; -unsigned d = vec_full_reg_offset(s, a->rd); -unsigned n = vec_full_reg_offset(s, a->rn); -unsigned m = vec_full_reg_offset(s, a->rm); +unsigned d = vec_full_reg_offset(s, rd); +unsigned n = vec_full_reg_offset(s, rn); +unsigned m = vec_full_reg_offset(s, rm); /* Use host vector move insns if we have appropriate sizes * and no unfortunate overlap. @@ -2296,6 +2296,19 @@ static bool trans_EXT(DisasContext *s, arg_EXT *a) return true; } +static bool trans_EXT(DisasContext *s, arg_EXT *a) +{ +return do_EXT(s, a->rd, a->rn, a->rm, a->imm); +} + +static bool trans_EXT_sve2(DisasContext *s, arg_rri *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +return do_EXT(s, a->rd, a->rn, (a->rn + 1) % 32, a->imm); +} + /* *** SVE Permute - Unpredicated Group */ @@ -3023,6 +3036,18 @@ static bool trans_SPLICE(DisasContext *s, arg_rprr_esz *a) return true; } +static bool trans_SPLICE_sve2(DisasContext *s, arg_rpr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (sve_access_check(s)) { +gen_gvec_ool_zzzp(s, gen_helper_sve_splice, + a->rd, a->rn, (a->rn + 1) % 32, a->pg, 0); +} +return true; +} + /* *** SVE Integer Compare - Vectors Group */ -- 2.25.1
[PATCH v2 065/100] target/arm: Fix sve_zip_p vs odd vector lengths
Wrote too much with low-half zip (zip1) with vl % 512 != 0. Adjust all of the x + (y << s) to x | (y << s) as a style fix. Reported-by: Laurent Desnogues Signed-off-by: Richard Henderson --- target/arm/sve_helper.c | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index f0601bf25b..dc23a9b3e0 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2903,6 +2903,7 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc) intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2; int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2); intptr_t high = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1); +int esize = 1 << esz; uint64_t *d = vd; intptr_t i; @@ -2915,33 +2916,35 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc) mm = extract64(mm, high * half, half); nn = expand_bits(nn, esz); mm = expand_bits(mm, esz); -d[0] = nn + (mm << (1 << esz)); +d[0] = nn | (mm << esize); } else { -ARMPredicateReg tmp_n, tmp_m; +ARMPredicateReg tmp; /* We produce output faster than we consume input. Therefore we must be mindful of possible overlap. 
*/ -if ((vn - vd) < (uintptr_t)oprsz) { -vn = memcpy(&tmp_n, vn, oprsz); -} -if ((vm - vd) < (uintptr_t)oprsz) { -vm = memcpy(&tmp_m, vm, oprsz); +if (vd == vn) { +vn = memcpy(&tmp, vn, oprsz); +if (vd == vm) { +vm = vn; +} +} else if (vd == vm) { +vm = memcpy(&tmp, vm, oprsz); } if (high) { high = oprsz >> 1; } -if ((high & 3) == 0) { +if ((oprsz & 7) == 0) { uint32_t *n = vn, *m = vm; high >>= 2; -for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) { +for (i = 0; i < oprsz / 8; i++) { uint64_t nn = n[H4(high + i)]; uint64_t mm = m[H4(high + i)]; nn = expand_bits(nn, esz); mm = expand_bits(mm, esz); -d[i] = nn + (mm << (1 << esz)); +d[i] = nn | (mm << esize); } } else { uint8_t *n = vn, *m = vm; @@ -2953,7 +2956,7 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc) nn = expand_bits(nn, esz); mm = expand_bits(mm, esz); -d16[H2(i)] = nn + (mm << (1 << esz)); +d16[H2(i)] = nn | (mm << esize); } } } -- 2.25.1
[PATCH v2 073/100] target/arm: Implement SVE2 integer multiply-add (indexed)
Signed-off-by: Richard Henderson --- target/arm/helper.h| 14 ++ target/arm/sve.decode | 8 target/arm/translate-sve.c | 23 +++ target/arm/vec_helper.c| 25 + 4 files changed, 70 insertions(+) diff --git a/target/arm/helper.h b/target/arm/helper.h index ce81a16a58..7964d299f6 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -793,6 +793,20 @@ DEF_HELPER_FLAGS_4(gvec_mul_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_mul_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_mul_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_mla_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_mla_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_mla_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(gvec_mls_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_mls_idx_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_mls_idx_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/sve.decode b/target/arm/sve.decode index fa0a572da6..467a93052f 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -785,6 +785,14 @@ SDOT_zzxw_d 01000100 .. 1 . 00 . . @rrxr_d UDOT_zzxw_s 01000100 .. 1 . 01 . . @rrxr_s UDOT_zzxw_d 01000100 .. 1 . 01 . . @rrxr_d +# SVE2 integer multiply-add (indexed) +MLA_zzxz_h 01000100 .. 1 . 10 . . @rrxr_h +MLA_zzxz_s 01000100 .. 1 . 10 . . @rrxr_s +MLA_zzxz_d 01000100 .. 1 . 10 . . @rrxr_d +MLS_zzxz_h 01000100 .. 1 . 11 . . @rrxr_h +MLS_zzxz_s 01000100 .. 1 . 11 . . @rrxr_s +MLS_zzxz_d 01000100 .. 1 . 11 . . @rrxr_d + # SVE2 integer multiply (indexed) MUL_zzx_h 01000100 .. 1 . 10 . . @rrx_h MUL_zzx_s 01000100 .. 1 . 10 . . 
@rrx_s diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index dd2cd22061..0fb88f4aa5 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3870,6 +3870,29 @@ DO_SVE2_RRX(trans_MUL_zzx_d, gen_helper_gvec_mul_idx_d) #undef DO_SVE2_RRX +static bool do_sve2_zzxz_ool(DisasContext *s, arg_rrxr_esz *a, + gen_helper_gvec_4 *fn) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +return do_zzxz_ool(s, a, fn); +} + +#define DO_SVE2_RRXR(NAME, FUNC) \ +static bool NAME(DisasContext *s, arg_rrxr_esz *a) \ +{ return do_sve2_zzxz_ool(s, a, FUNC); } + +DO_SVE2_RRXR(trans_MLA_zzxz_h, gen_helper_gvec_mla_idx_h) +DO_SVE2_RRXR(trans_MLA_zzxz_s, gen_helper_gvec_mla_idx_s) +DO_SVE2_RRXR(trans_MLA_zzxz_d, gen_helper_gvec_mla_idx_d) + +DO_SVE2_RRXR(trans_MLS_zzxz_h, gen_helper_gvec_mls_idx_h) +DO_SVE2_RRXR(trans_MLS_zzxz_s, gen_helper_gvec_mls_idx_s) +DO_SVE2_RRXR(trans_MLS_zzxz_d, gen_helper_gvec_mls_idx_d) + +#undef DO_SVE2_RRXR + /* *** SVE Floating Point Multiply-Add Indexed Group */ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 08eadf06fc..fb8596c1fd 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -883,6 +883,31 @@ DO_MUL_IDX(gvec_mul_idx_d, uint64_t, ) #undef DO_MUL_IDX +#define DO_MLA_IDX(NAME, TYPE, OP, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ +intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \ +intptr_t idx = simd_data(desc);\ +TYPE *d = vd, *n = vn, *m = vm, *a = va; \ +for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \ +TYPE mm = m[H(i + idx)]; \ +for (j = 0; j < segment; j++) {\ +d[i + j] = a[i + j] OP n[i + j] * mm; \ +} \ +} \ +clear_tail(d, oprsz, simd_maxsz(desc));\ +} + +DO_MLA_IDX(gvec_mla_idx_h, uint16_t, +, H2) +DO_MLA_IDX(gvec_mla_idx_s, uint32_t, +, H4) +DO_MLA_IDX(gvec_mla_idx_d, uint64_t, +, ) + +DO_MLA_IDX(gvec_mls_idx_h, uint16_t, -, H2) +DO_MLA_IDX(gvec_mls_idx_s, uint32_t, -, H4)
[PATCH v2 061/100] target/arm: Implement SVE2 gather load insns
From: Stephen Long Add decoding logic for SVE2 64-bit/32-bit gather non-temporal load insns. 64-bit * LDNT1SB * LDNT1B (vector plus scalar) * LDNT1SH * LDNT1H (vector plus scalar) * LDNT1SW * LDNT1W (vector plus scalar) * LDNT1D (vector plus scalar) 32-bit * LDNT1SB * LDNT1B (vector plus scalar) * LDNT1SH * LDNT1H (vector plus scalar) * LDNT1W (vector plus scalar) Signed-off-by: Stephen Long Message-Id: <20200422152343.12493-1-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- target/arm/sve.decode | 11 +++ target/arm/translate-sve.c | 8 2 files changed, 19 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index dc784dcabe..1b5bd2d193 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1389,6 +1389,17 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . . @rda_rn_rm CMLA_ 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx SQRDCMLAH_ 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx +### SVE2 Memory Gather Load Group + +# SVE2 64-bit gather non-temporal load +# (scalar plus unpacked 32-bit unscaled offsets) +LDNT1_zprz 1100010 msz:2 00 rm:5 1 u:1 0 pg:3 rn:5 rd:5 \ +&rprr_gather_load xs=0 esz=3 scale=0 ff=0 + +# SVE2 32-bit gather non-temporal load (scalar plus 32-bit unscaled offsets) +LDNT1_zprz 110 msz:2 00 rm:5 10 u:1 pg:3 rn:5 rd:5 \ +&rprr_gather_load xs=0 esz=2 scale=0 ff=0 + ### SVE2 Memory Store Group # SVE2 64-bit scatter non-temporal store (vector plus scalar) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 7fa1e0d354..77003ee43e 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5622,6 +5622,14 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a) return true; } +static bool trans_LDNT1_zprz(DisasContext *s, arg_LD1_zprz *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +return trans_LD1_zprz(s, a); +} + /* Indexed by [be][xs][msz].
*/ static gen_helper_gvec_mem_scatter * const scatter_store_fn32[2][2][3] = { /* Little-endian */ -- 2.25.1
[PATCH v2 060/100] target/arm: Implement SVE2 scatter store insns
From: Stephen Long Add decoding logic for SVE2 64-bit/32-bit scatter non-temporal store insns. 64-bit * STNT1B (vector plus scalar) * STNT1H (vector plus scalar) * STNT1W (vector plus scalar) * STNT1D (vector plus scalar) 32-bit * STNT1B (vector plus scalar) * STNT1H (vector plus scalar) * STNT1W (vector plus scalar) Signed-off-by: Stephen Long Message-Id: <20200422141553.8037-1-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- target/arm/sve.decode | 10 ++ target/arm/translate-sve.c | 8 2 files changed, 18 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index a375ce31f1..dc784dcabe 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1388,3 +1388,13 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . . @rda_rn_rm CMLA_ 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx SQRDCMLAH_ 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx + +### SVE2 Memory Store Group + +# SVE2 64-bit scatter non-temporal store (vector plus scalar) +STNT1_zprz 1110010 .. 00 . 001 ... . . \ +@rprr_scatter_store xs=2 esz=3 scale=0 + +# SVE2 32-bit scatter non-temporal store (vector plus scalar) +STNT1_zprz 1110010 .. 10 . 001 ... . . \ +@rprr_scatter_store xs=0 esz=2 scale=0 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 640b109166..7fa1e0d354 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5728,6 +5728,14 @@ static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a) return true; } +static bool trans_STNT1_zprz(DisasContext *s, arg_ST1_zprz *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +return trans_ST1_zprz(s, a); +} + /* * Prefetches */ -- 2.25.1
[PATCH v2 072/100] target/arm: Use helper_gvec_mul_idx_* for aa64 advsimd
Signed-off-by: Richard Henderson --- target/arm/translate-a64.c | 16 1 file changed, 16 insertions(+) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 341b11f98d..a3135754ce 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -13037,6 +13037,22 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) data, gen_helper_gvec_fmlal_idx_a64); } return; + +case 0x08: /* MUL */ +if (!is_long && !is_scalar) { +static gen_helper_gvec_3 * const fns[3] = { +gen_helper_gvec_mul_idx_h, +gen_helper_gvec_mul_idx_s, +gen_helper_gvec_mul_idx_d, +}; +tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), + index, fns[size - 1]); +return; +} +break; } if (size == 3) { -- 2.25.1
[PATCH v2 058/100] target/arm: Implement SVE2 HISTCNT, HISTSEG
From: Stephen Long Signed-off-by: Stephen Long Message-Id: <20200416173109.8856-1-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- v2: Fix overlap between output and input vectors. --- target/arm/helper-sve.h| 7 +++ target/arm/sve.decode | 6 ++ target/arm/sve_helper.c| 124 + target/arm/translate-sve.c | 19 ++ 4 files changed, 156 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index c47dea5920..1d5d272c5c 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2063,6 +2063,13 @@ DEF_HELPER_FLAGS_5(sve2_nmatch_ppzz_b, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_nmatch_ppzz_h, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_histcnt_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_histcnt_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_histseg, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 3121eabbf8..0edb72d4fb 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -146,6 +146,7 @@ _esz rn=%reg_movprfx @rdn_pg_rm_ra esz:2 . ra:5 ... pg:3 rm:5 rd:5 \ _esz rn=%reg_movprfx +@rd_pg_rn_rm esz:2 . rm:5 ... pg:3 rn:5 rd:5 _esz # One register operand, with governing predicate, vector element size @rd_pg_rn esz:2 ... ... ... pg:3 rn:5 rd:5 _esz @@ -1336,6 +1337,11 @@ RSUBHNT 01000101 .. 1 . 011 111 . . @rd_rn_rm MATCH 01000101 .. 1 . 100 ... . 0 @pd_pg_rn_rm NMATCH 01000101 .. 1 . 100 ... . 1 @pd_pg_rn_rm +### SVE2 Histogram Computation + +HISTCNT 01000101 .. 1 . 110 ... . . @rd_pg_rn_rm +HISTSEG 01000101 .. 1 . 101 000 . . @rd_rn_rm + ## SVE2 floating-point pairwise operations FADDP 01100100 .. 010 00 0 100 ... . . 
@rdn_pg_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 4464c9af52..bc1c3ce1f0 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -6660,3 +6660,127 @@ DO_PPZZ_MATCH(sve2_nmatch_ppzz_b, MO_8, true) DO_PPZZ_MATCH(sve2_nmatch_ppzz_h, MO_16, true) #undef DO_PPZZ_MATCH + +void HELPER(sve2_histcnt_s)(void *vd, void *vn, void *vm, void *vg, +uint32_t desc) +{ +ARMVectorReg scratch; +intptr_t i, j; +intptr_t opr_sz = simd_oprsz(desc); +uint32_t *d = vd, *n = vn, *m = vm; +uint8_t *pg = vg; + +if (d == n) { +n = memcpy(&scratch, n, opr_sz); +if (d == m) { +m = n; +} +} else if (d == m) { +m = memcpy(&scratch, m, opr_sz); +} + +for (i = 0; i < opr_sz; i += 4) { +uint64_t count = 0; +uint8_t pred; + +pred = pg[H1(i >> 3)] >> (i & 7); +if (pred & 1) { +uint32_t nn = n[H4(i >> 2)]; + +for (j = 0; j <= i; j += 4) { +pred = pg[H1(j >> 3)] >> (j & 7); +if ((pred & 1) && nn == m[H4(j >> 2)]) { +++count; +} +} +} +d[H4(i >> 2)] = count; +} +} + +void HELPER(sve2_histcnt_d)(void *vd, void *vn, void *vm, void *vg, +uint32_t desc) +{ +ARMVectorReg scratch; +intptr_t i, j; +intptr_t opr_sz = simd_oprsz(desc); +uint64_t *d = vd, *n = vn, *m = vm; +uint8_t *pg = vg; + +if (d == n) { +n = memcpy(&scratch, n, opr_sz); +if (d == m) { +m = n; +} +} else if (d == m) { +m = memcpy(&scratch, m, opr_sz); +} + +for (i = 0; i < opr_sz / 8; ++i) { +uint64_t count = 0; +if (pg[H1(i)] & 1) { +uint64_t nn = n[i]; +for (j = 0; j <= i; ++j) { +if ((pg[H1(j)] & 1) && nn == m[j]) { +++count; +} +} +} +d[i] = count; +} +} + +/* + * Returns the number of bytes in m0 and m1 that match n. + * See comment for do_match2().
+ * */ +static inline uint64_t do_histseg_cnt(uint8_t n, uint64_t m0, uint64_t m1) +{ +int esz = MO_8; +int bits = 8 << esz; +uint64_t ones = dup_const(esz, 1); +uint64_t signs = ones << (bits - 1); +uint64_t cmp0, cmp1; + +cmp1 = dup_const(esz, n); +cmp0 = cmp1 ^ m0; +cmp1 = cmp1 ^ m1; +cmp0 = (cmp0 - ones) & ~cmp0 & signs; +cmp1 = (cmp1 - ones) & ~cmp1 & signs; + +/* + * Combine the two compares in a way that the bits do + * not overlap, and so preserves the count of set bits. + * If the host has an efficient instruction for ctpop, + * then ctpop(x) +
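[Not part of the patch: the zero-byte trick that do_histseg_cnt uses can be exercised standalone. `count_matching_bytes` is an invented name for this sketch, reduced to a single 64-bit operand; `__builtin_popcountll` is a GCC/Clang builtin standing in for QEMU's ctpop64.]

```c
#include <stdint.h>

/* Count how many of the 8 bytes of m equal the byte n. XOR turns matching
 * bytes into zero; (x - 0x01..01) & ~x & 0x80..80 then sets the sign bit
 * of exactly the bytes that were zero, and a popcount gives the total. */
static int count_matching_bytes(uint8_t n, uint64_t m)
{
    const uint64_t ones  = 0x0101010101010101ull;
    const uint64_t signs = 0x8080808080808080ull;
    uint64_t cmp = (ones * n) ^ m;        /* matching bytes become 0 */
    cmp = (cmp - ones) & ~cmp & signs;    /* flag each zero byte */
    return __builtin_popcountll(cmp);
}
```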
[PATCH v2 055/100] target/arm: Implement SVE2 RADDHNB, RADDHNT
From: Stephen Long Signed-off-by: Stephen Long Message-Id: <20200417162231.10374-3-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- v2: Fix round bit type (laurent desnogues) --- target/arm/helper-sve.h| 8 target/arm/sve.decode | 2 ++ target/arm/sve_helper.c| 10 ++ target/arm/translate-sve.c | 2 ++ 4 files changed, 22 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 84281e3f9d..7627e0cd5f 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2029,6 +2029,14 @@ DEF_HELPER_FLAGS_4(sve2_addhnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_addhnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_addhnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_raddhnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_raddhnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_raddhnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_raddhnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_raddhnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_raddhnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve2_match_ppzz_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_match_ppzz_h, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index af9e87e88d..a33825066c 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1324,6 +1324,8 @@ UQRSHRNT01000101 .. 1 . 00 . . @rd_rn_tszimm_shr ADDHNB 01000101 .. 1 . 011 000 . . @rd_rn_rm ADDHNT 01000101 .. 1 . 011 001 . . @rd_rn_rm +RADDHNB 01000101 .. 1 . 011 010 . . @rd_rn_rm +RADDHNT 01000101 .. 1 . 011 011 . . 
@rd_rn_rm ### SVE2 Character Match diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 7f3dd2dfdb..281a680134 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2114,6 +2114,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ } #define DO_ADDHN(N, M, SH) ((N + M) >> SH) +#define DO_RADDHN(N, M, SH) ((N + M + ((__typeof(N))1 << (SH - 1))) >> SH) DO_BINOPNB(sve2_addhnb_h, uint16_t, uint8_t, 8, DO_ADDHN) DO_BINOPNB(sve2_addhnb_s, uint32_t, uint16_t, 16, DO_ADDHN) @@ -2123,6 +2124,15 @@ DO_BINOPNT(sve2_addhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_ADDHN) DO_BINOPNT(sve2_addhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_ADDHN) DO_BINOPNT(sve2_addhnt_d, uint64_t, uint32_t, 32, , H1_4, DO_ADDHN) +DO_BINOPNB(sve2_raddhnb_h, uint16_t, uint8_t, 8, DO_RADDHN) +DO_BINOPNB(sve2_raddhnb_s, uint32_t, uint16_t, 16, DO_RADDHN) +DO_BINOPNB(sve2_raddhnb_d, uint64_t, uint32_t, 32, DO_RADDHN) + +DO_BINOPNT(sve2_raddhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_RADDHN) +DO_BINOPNT(sve2_raddhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_RADDHN) +DO_BINOPNT(sve2_raddhnt_d, uint64_t, uint32_t, 32, , H1_4, DO_RADDHN) + +#undef DO_RADDHN #undef DO_ADDHN #undef DO_BINOPNB diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 7dc30ed1bd..7e3ba2e4f7 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7035,6 +7035,8 @@ static bool trans_##NAME(DisasContext *s, arg_rrr_esz *a) \ DO_SVE2_ZZZ_NARROW(ADDHNB, addhnb) DO_SVE2_ZZZ_NARROW(ADDHNT, addhnt) +DO_SVE2_ZZZ_NARROW(RADDHNB, raddhnb) +DO_SVE2_ZZZ_NARROW(RADDHNT, raddhnt) static bool do_sve2_ppzz_flags(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_flags_4 *fn) -- 2.25.1
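[Not part of the patch: the rounding constant introduced by DO_RADDHN can be seen in a scalar sketch of the 32-to-16-bit case. The names `addhn_s32`/`raddhn_s32` are invented; the `(__typeof(N))` cast in the patch exists so the shifted constant is computed in the wide operand type, which matters for the 64-bit variant.]

```c
#include <stdint.h>

/* Add-narrow high half, without and with rounding: ADDHN truncates,
 * RADDHN adds 1 << (SH - 1) before taking the high half, so the
 * result rounds to nearest instead of toward zero. Here SH == 16. */
static uint16_t addhn_s32(uint32_t n, uint32_t m)
{
    return (uint16_t)((n + m) >> 16);
}

static uint16_t raddhn_s32(uint32_t n, uint32_t m)
{
    return (uint16_t)((n + m + (1u << 15)) >> 16);
}
```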
[PATCH v2 057/100] target/arm: Implement SVE2 RSUBHNB, RSUBHNT
From: Stephen Long This completes the section 'SVE2 integer add/subtract narrow high part' Signed-off-by: Stephen Long Message-Id: <20200417162231.10374-5-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- v2: Fix round bit type (laurent desnogues) --- target/arm/helper-sve.h| 8 target/arm/sve.decode | 2 ++ target/arm/sve_helper.c| 10 ++ target/arm/translate-sve.c | 2 ++ 4 files changed, 22 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 82e23d6470..c47dea5920 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2045,6 +2045,14 @@ DEF_HELPER_FLAGS_4(sve2_subhnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_subhnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_subhnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_rsubhnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_rsubhnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_rsubhnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_rsubhnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_rsubhnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_rsubhnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve2_match_ppzz_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_match_ppzz_h, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 8ad2698bcf..3121eabbf8 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1328,6 +1328,8 @@ RADDHNB 01000101 .. 1 . 011 010 . . @rd_rn_rm RADDHNT 01000101 .. 1 . 011 011 . . @rd_rn_rm SUBHNB 01000101 .. 1 . 011 100 . . @rd_rn_rm SUBHNT 01000101 .. 1 . 011 101 . . @rd_rn_rm +RSUBHNB 01000101 .. 1 . 011 110 . . @rd_rn_rm +RSUBHNT 01000101 .. 1 . 011 111 . . 
@rd_rn_rm ### SVE2 Character Match diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 0b490e8de6..4464c9af52 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2116,6 +2116,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ #define DO_ADDHN(N, M, SH) ((N + M) >> SH) #define DO_RADDHN(N, M, SH) ((N + M + ((__typeof(N))1 << (SH - 1))) >> SH) #define DO_SUBHN(N, M, SH) ((N - M) >> SH) +#define DO_RSUBHN(N, M, SH) ((N - M + ((__typeof(N))1 << (SH - 1))) >> SH) DO_BINOPNB(sve2_addhnb_h, uint16_t, uint8_t, 8, DO_ADDHN) DO_BINOPNB(sve2_addhnb_s, uint32_t, uint16_t, 16, DO_ADDHN) @@ -2141,6 +2142,15 @@ DO_BINOPNT(sve2_subhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_SUBHN) DO_BINOPNT(sve2_subhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_SUBHN) DO_BINOPNT(sve2_subhnt_d, uint64_t, uint32_t, 32, , H1_4, DO_SUBHN) +DO_BINOPNB(sve2_rsubhnb_h, uint16_t, uint8_t, 8, DO_RSUBHN) +DO_BINOPNB(sve2_rsubhnb_s, uint32_t, uint16_t, 16, DO_RSUBHN) +DO_BINOPNB(sve2_rsubhnb_d, uint64_t, uint32_t, 32, DO_RSUBHN) + +DO_BINOPNT(sve2_rsubhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_RSUBHN) +DO_BINOPNT(sve2_rsubhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_RSUBHN) +DO_BINOPNT(sve2_rsubhnt_d, uint64_t, uint32_t, 32, , H1_4, DO_RSUBHN) + +#undef DO_RSUBHN #undef DO_SUBHN #undef DO_RADDHN #undef DO_ADDHN diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index f584e06d87..c8c4822d9e 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7040,6 +7040,8 @@ DO_SVE2_ZZZ_NARROW(RADDHNT, raddhnt) DO_SVE2_ZZZ_NARROW(SUBHNB, subhnb) DO_SVE2_ZZZ_NARROW(SUBHNT, subhnt) +DO_SVE2_ZZZ_NARROW(RSUBHNB, rsubhnb) +DO_SVE2_ZZZ_NARROW(RSUBHNT, rsubhnt) static bool do_sve2_ppzz_flags(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_flags_4 *fn) -- 2.25.1
[PATCH v2 068/100] target/arm: Pass separate addend to FCMLA helpers
For SVE, we potentially have a 4th argument coming from the movprfx instruction. Currently we do not optimize movprfx, so the problem is not visible. Signed-off-by: Richard Henderson --- target/arm/helper.h | 20 ++--- target/arm/translate-a64.c | 27 ++ target/arm/translate-neon.inc.c | 10 --- target/arm/translate-sve.c | 5 ++-- target/arm/vec_helper.c | 50 + 5 files changed, 61 insertions(+), 51 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index dd32a11b9d..331f77c908 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -595,16 +595,16 @@ DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_fcaddd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_fcmlah, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_fcmlah_idx, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_fcmlas, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_fcmlas_idx, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_fcmlah, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_fcmlah_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_fcmlas, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_fcmlas_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(gvec_fcmlad, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 7366553f8d..341b11f98d 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -616,6 +616,22 @@ static void gen_gvec_op4_ool(DisasContext *s, bool is_q, int rd, int rn, 
is_q ? 16 : 8, vec_full_reg_size(s), data, fn); } +/* Expand a 4-operand + fpstatus pointer + simd data value operation using + * an out-of-line helper. + */ +static void gen_gvec_op4_fpst(DisasContext *s, bool is_q, int rd, int rn, + int rm, int ra, bool is_fp16, int data, + gen_helper_gvec_4_ptr *fn) +{ +TCGv_ptr fpst = get_fpstatus_ptr(is_fp16); +tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + vec_full_reg_offset(s, ra), fpst, + is_q ? 16 : 8, vec_full_reg_size(s), data, fn); +tcg_temp_free_ptr(fpst); +} + /* Set ZF and NF based on a 64 bit result. This is alas fiddlier * than the 32 bit equivalent. */ @@ -11732,15 +11748,15 @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) rot = extract32(opcode, 0, 2); switch (size) { case 1: -gen_gvec_op3_fpst(s, is_q, rd, rn, rm, true, rot, +gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, true, rot, gen_helper_gvec_fcmlah); break; case 2: -gen_gvec_op3_fpst(s, is_q, rd, rn, rm, false, rot, +gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, false, rot, gen_helper_gvec_fcmlas); break; case 3: -gen_gvec_op3_fpst(s, is_q, rd, rn, rm, false, rot, +gen_gvec_op4_fpst(s, is_q, rd, rn, rm, rd, false, rot, gen_helper_gvec_fcmlad); break; default: @@ -12994,9 +13010,10 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) { int rot = extract32(insn, 13, 2); int data = (index << 2) | rot; -tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd), +tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn), - vec_full_reg_offset(s, rm), fpst, + vec_full_reg_offset(s, rm), + vec_full_reg_offset(s, rd), fpst, is_q ? 16 : 8, vec_full_reg_size(s), data, size == MO_64 ? gen_helper_gvec_fcmlas_idx diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c index d4556dfb4e..f79995cf50 100644 --- a/target/arm/translate-neon.inc.c +++ b/target/arm/translate-neon.inc.c @@ -58,7 +58,7 @@ static bool
[PATCH v2 056/100] target/arm: Implement SVE2 SUBHNB, SUBHNT
From: Stephen Long Signed-off-by: Stephen Long Message-Id: <20200417162231.10374-4-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 8 target/arm/sve.decode | 2 ++ target/arm/sve_helper.c| 10 ++ target/arm/translate-sve.c | 3 +++ 4 files changed, 23 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 7627e0cd5f..82e23d6470 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2037,6 +2037,14 @@ DEF_HELPER_FLAGS_4(sve2_raddhnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_raddhnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_raddhnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_subhnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_subhnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_subhnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_subhnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_subhnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_subhnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve2_match_ppzz_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_match_ppzz_h, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index a33825066c..8ad2698bcf 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1326,6 +1326,8 @@ ADDHNB 01000101 .. 1 . 011 000 . . @rd_rn_rm ADDHNT 01000101 .. 1 . 011 001 . . @rd_rn_rm RADDHNB 01000101 .. 1 . 011 010 . . @rd_rn_rm RADDHNT 01000101 .. 1 . 011 011 . . @rd_rn_rm +SUBHNB 01000101 .. 1 . 011 100 . . @rd_rn_rm +SUBHNT 01000101 .. 1 . 011 101 . . 
@rd_rn_rm ### SVE2 Character Match diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 281a680134..0b490e8de6 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2115,6 +2115,7 @@ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ #define DO_ADDHN(N, M, SH) ((N + M) >> SH) #define DO_RADDHN(N, M, SH) ((N + M + ((__typeof(N))1 << (SH - 1))) >> SH) +#define DO_SUBHN(N, M, SH) ((N - M) >> SH) DO_BINOPNB(sve2_addhnb_h, uint16_t, uint8_t, 8, DO_ADDHN) DO_BINOPNB(sve2_addhnb_s, uint32_t, uint16_t, 16, DO_ADDHN) @@ -2132,6 +2133,15 @@ DO_BINOPNT(sve2_raddhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_RADDHN) DO_BINOPNT(sve2_raddhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_RADDHN) DO_BINOPNT(sve2_raddhnt_d, uint64_t, uint32_t, 32, , H1_4, DO_RADDHN) +DO_BINOPNB(sve2_subhnb_h, uint16_t, uint8_t, 8, DO_SUBHN) +DO_BINOPNB(sve2_subhnb_s, uint32_t, uint16_t, 16, DO_SUBHN) +DO_BINOPNB(sve2_subhnb_d, uint64_t, uint32_t, 32, DO_SUBHN) + +DO_BINOPNT(sve2_subhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_SUBHN) +DO_BINOPNT(sve2_subhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_SUBHN) +DO_BINOPNT(sve2_subhnt_d, uint64_t, uint32_t, 32, , H1_4, DO_SUBHN) + +#undef DO_SUBHN #undef DO_RADDHN #undef DO_ADDHN diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 7e3ba2e4f7..f584e06d87 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7038,6 +7038,9 @@ DO_SVE2_ZZZ_NARROW(ADDHNT, addhnt) DO_SVE2_ZZZ_NARROW(RADDHNB, raddhnb) DO_SVE2_ZZZ_NARROW(RADDHNT, raddhnt) +DO_SVE2_ZZZ_NARROW(SUBHNB, subhnb) +DO_SVE2_ZZZ_NARROW(SUBHNT, subhnt) + static bool do_sve2_ppzz_flags(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_flags_4 *fn) { -- 2.25.1
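For reference, DO_SUBHN keeps the high half of the wide difference, wrapping rather than saturating. A minimal standalone sketch of the halfword-to-byte case (helper name made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* SUBHN, 16-bit -> 8-bit: subtract in the wide type, keep the high
 * half.  Matches DO_SUBHN(N, M, SH) == ((N - M) >> SH) with SH = 8;
 * unsigned wrap-around, no saturation. */
static uint8_t subhn_h(uint16_t n, uint16_t m)
{
    return (uint8_t)((uint16_t)(n - m) >> 8);
}
```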
[PATCH v2 054/100] target/arm: Implement SVE2 ADDHNB, ADDHNT
From: Stephen Long Signed-off-by: Stephen Long Message-Id: <20200417162231.10374-2-stepl...@quicinc.com> Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 8 target/arm/sve.decode | 5 + target/arm/sve_helper.c| 36 target/arm/translate-sve.c | 13 + 4 files changed, 62 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 4029093564..84281e3f9d 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2021,6 +2021,14 @@ DEF_HELPER_FLAGS_3(sve2_uqrshrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_uqrshrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_uqrshrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_addhnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_addhnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_addhnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_addhnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_addhnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_addhnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve2_match_ppzz_b, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_match_ppzz_h, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index a03d6107da..af9e87e88d 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1320,6 +1320,11 @@ UQSHRNT 01000101 .. 1 . 00 1101 . . @rd_rn_tszimm_shr UQRSHRNB01000101 .. 1 . 00 1110 . . @rd_rn_tszimm_shr UQRSHRNT01000101 .. 1 . 00 . . @rd_rn_tszimm_shr +## SVE2 integer add/subtract narrow high part + +ADDHNB 01000101 .. 1 . 011 000 . . @rd_rn_rm +ADDHNT 01000101 .. 1 . 011 001 . . @rd_rn_rm + ### SVE2 Character Match MATCH 01000101 .. 1 . 100 ... . 
0 @pd_pg_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index b4613d90dc..7f3dd2dfdb 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -2091,6 +2091,42 @@ DO_SHRNT(sve2_uqrshrnt_d, uint64_t, uint32_t, , H1_4, DO_UQRSHRN_D) #undef DO_SHRNB #undef DO_SHRNT +#define DO_BINOPNB(NAME, TYPEW, TYPEN, SHIFT, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEW *)(vn + i); \ +TYPEW mm = *(TYPEW *)(vm + i); \ +*(TYPEW *)(vd + i) = (TYPEN)OP(nn, mm, SHIFT); \ +} \ +} + +#define DO_BINOPNT(NAME, TYPEW, TYPEN, SHIFT, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEW *)(vn + HW(i)); \ +TYPEW mm = *(TYPEW *)(vm + HW(i)); \ +*(TYPEN *)(vd + HN(i + sizeof(TYPEN))) = OP(nn, mm, SHIFT); \ +} \ +} + +#define DO_ADDHN(N, M, SH) ((N + M) >> SH) + +DO_BINOPNB(sve2_addhnb_h, uint16_t, uint8_t, 8, DO_ADDHN) +DO_BINOPNB(sve2_addhnb_s, uint32_t, uint16_t, 16, DO_ADDHN) +DO_BINOPNB(sve2_addhnb_d, uint64_t, uint32_t, 32, DO_ADDHN) + +DO_BINOPNT(sve2_addhnt_h, uint16_t, uint8_t, 8, H1_2, H1, DO_ADDHN) +DO_BINOPNT(sve2_addhnt_s, uint32_t, uint16_t, 16, H1_4, H1_2, DO_ADDHN) +DO_BINOPNT(sve2_addhnt_d, uint64_t, uint32_t, 32, , H1_4, DO_ADDHN) + +#undef DO_ADDHN + +#undef DO_BINOPNB + /* Fully general four-operand expander, controlled by a predicate. 
*/ #define DO_ZPZZZ(NAME, TYPE, H, OP) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 0ad55ad243..7dc30ed1bd 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7023,6 +7023,19 @@ static bool trans_UQRSHRNT(DisasContext *s, arg_rri_esz *a) return do_sve2_shr_narrow(s, a, ops); } +#define DO_SVE2_ZZZ_NARROW(NAME, name)\ +static bool trans_##NAME(DisasContext *s, arg_rrr_esz *a) \ +{ \ +
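The DO_BINOPNB/DO_BINOPNT pair above encode the bottom/top lane placement: the B form stores the narrow result through a wide pointer, so the adjacent odd narrow lane is zeroed, while the T form stores only the odd narrow lane and leaves the even lanes of the destination alone. A little-endian sketch of both layouts (the H* macros in the real code handle big-endian hosts; function names hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ADDHNB, 16 -> 8 bit: narrow results land in the even byte lanes,
 * and the odd lanes are zeroed because the macro stores through a
 * wide (uint16_t) pointer. */
static void addhnb_h(uint8_t *d, const uint16_t *n, const uint16_t *m,
                     size_t elems)
{
    for (size_t i = 0; i < elems; ++i) {
        d[2 * i] = (uint8_t)((uint16_t)(n[i] + m[i]) >> 8);
        d[2 * i + 1] = 0;
    }
}

/* ADDHNT: narrow results land in the odd byte lanes; even lanes of
 * the destination keep their previous contents. */
static void addhnt_h(uint8_t *d, const uint16_t *n, const uint16_t *m,
                     size_t elems)
{
    for (size_t i = 0; i < elems; ++i) {
        d[2 * i + 1] = (uint8_t)((uint16_t)(n[i] + m[i]) >> 8);
    }
}
```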
[PATCH v2 053/100] target/arm: Implement SVE2 complex integer multiply-add
Signed-off-by: Richard Henderson --- v2: Fix do_sqrdmlah_d (laurent desnogues) --- target/arm/helper-sve.h| 18 target/arm/vec_internal.h | 5 + target/arm/sve.decode | 5 + target/arm/sve_helper.c| 42 ++ target/arm/translate-sve.c | 32 + target/arm/vec_helper.c| 15 +++--- 6 files changed, 109 insertions(+), 8 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 8fc8b856e7..4029093564 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2113,3 +2113,21 @@ DEF_HELPER_FLAGS_5(sve2_umlsl_zzzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_umlsl_zzzw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_cmla__b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_cmla__h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_cmla__s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_cmla__d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h index 372fe76523..38ce31b4ca 100644 --- a/target/arm/vec_internal.h +++ b/target/arm/vec_internal.h @@ -168,4 +168,9 @@ static inline int64_t do_suqrshl_d(int64_t src, int64_t shift, return do_uqrshl_d(src, shift, round, sat); } +int8_t do_sqrdmlah_b(int8_t, int8_t, int8_t, bool, bool); +int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *); +int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *); +int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool); + #endif /* TARGET_ARM_VEC_INTERNALS_H */ diff --git 
a/target/arm/sve.decode b/target/arm/sve.decode index 19c5013ddd..a03d6107da 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1362,3 +1362,8 @@ SMLSLB_zzzw 01000100 .. 0 . 010 100 . . @rda_rn_rm SMLSLT_zzzw 01000100 .. 0 . 010 101 . . @rda_rn_rm UMLSLB_zzzw 01000100 .. 0 . 010 110 . . @rda_rn_rm UMLSLT_zzzw 01000100 .. 0 . 010 111 . . @rda_rn_rm + +## SVE2 complex integer multiply-add + +CMLA_ 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx +SQRDCMLAH_ 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index dbf378d214..b4613d90dc 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1448,6 +1448,48 @@ DO_SQDMLAL(sve2_sqdmlsl_zzzw_d, int64_t, int32_t, , H1_4, #undef DO_SQDMLAL +#define DO_CMLA(NAME, TYPE, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc) / sizeof(TYPE); \ +int rot = simd_data(desc); \ +int sel_a = rot & 1, sel_b = sel_a ^ 1; \ +bool sub_r = rot == 1 || rot == 2; \ +bool sub_i = rot >= 2; \ +TYPE *d = vd, *n = vn, *m = vm, *a = va;\ +for (i = 0; i < opr_sz; i += 2) { \ +TYPE elt1_a = n[H(i + sel_a)]; \ +TYPE elt2_a = m[H(i + sel_a)]; \ +TYPE elt2_b = m[H(i + sel_b)]; \ +d[H(i)] = OP(elt1_a, elt2_a, a[H(i)], sub_r); \ +d[H(i + 1)] = OP(elt1_a, elt2_b, a[H(i + 1)], sub_i); \ +} \ +} + +#define do_cmla(N, M, A, S) (A + (N * M) * (S ? 
-1 : 1)) + +DO_CMLA(sve2_cmla__b, uint8_t, H1, do_cmla) +DO_CMLA(sve2_cmla__h, uint16_t, H2, do_cmla) +DO_CMLA(sve2_cmla__s, uint32_t, H4, do_cmla) +DO_CMLA(sve2_cmla__d, uint64_t, , do_cmla) + +#define DO_SQRDMLAH_B(N, M, A, S) \ +do_sqrdmlah_b(N, M, A, S, true) +#define DO_SQRDMLAH_H(N, M, A, S) \ +({ uint32_t discard; do_sqrdmlah_h(N, M, A, S, true, ); }) +#define DO_SQRDMLAH_S(N, M, A, S) \ +({ uint32_t discard; do_sqrdmlah_s(N, M, A, S, true, ); }) +#define DO_SQRDMLAH_D(N, M, A, S) \ +do_sqrdmlah_d(N, M, A, S, true) + +DO_CMLA(sve2_sqrdcmlah__b, int8_t, H1, DO_SQRDMLAH_B)
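Unlike the floating-point FCMLA, the integer CMLA wraps modulo the element size. A byte-sized scalar model of the DO_CMLA selection logic, with sel_a/sub_r/sub_i derived from rot exactly as in the macro (function name hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* One complex byte pair of CMLA.  rot & 1 selects which half of n/m
 * feeds the products; rot == 1 or 2 negates the real-part product and
 * rot >= 2 negates the imaginary-part product, as in DO_CMLA.
 * Results wrap modulo 256 -- no saturation. */
static void cmla_b(uint8_t *d_r, uint8_t *d_i,
                   uint8_t n_r, uint8_t n_i,
                   uint8_t m_r, uint8_t m_i,
                   uint8_t a_r, uint8_t a_i, int rot)
{
    int sel_a = rot & 1;
    int sub_r = (rot == 1 || rot == 2);
    int sub_i = (rot >= 2);
    int na = sel_a ? n_i : n_r;   /* n[sel_a] */
    int ma = sel_a ? m_i : m_r;   /* m[sel_a] */
    int mb = sel_a ? m_r : m_i;   /* m[sel_b] */

    *d_r = (uint8_t)(a_r + (sub_r ? -(na * ma) : na * ma));
    *d_i = (uint8_t)(a_i + (sub_i ? -(na * mb) : na * mb));
}
```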
[PATCH v2 062/100] target/arm: Implement SVE2 FMMLA
From: Stephen Long Signed-off-by: Stephen Long Fixed the errors Richard pointed out. Message-Id: <20200422165503.13511-1-stepl...@quicinc.com> [rth: Fix indexing in helpers, expand macro to straight functions.] Signed-off-by: Richard Henderson --- target/arm/cpu.h | 10 ++ target/arm/helper-sve.h| 3 ++ target/arm/sve.decode | 4 +++ target/arm/sve_helper.c| 74 ++ target/arm/translate-sve.c | 33 + 5 files changed, 124 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 25ca3aed67..331c5cdd4b 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3877,6 +3877,16 @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id) return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0; } +static inline bool isar_feature_aa64_sve2_f32mm(const ARMISARegisters *id) +{ +return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, F32MM) != 0; +} + +static inline bool isar_feature_aa64_sve2_f64mm(const ARMISARegisters *id) +{ +return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, F64MM) != 0; +} + /* * Feature tests for "does this exist in either 32-bit or 64-bit?" */ diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 9f6095c884..2929ad48a7 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2174,3 +2174,6 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(fmmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(fmmla_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 1b5bd2d193..11e724d3a2 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1389,6 +1389,10 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . . 
@rda_rn_rm CMLA_ 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx SQRDCMLAH_ 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx +### SVE2 floating point matrix multiply accumulate + +FMMLA 01100100 .. 1 . 111001 . . @rda_rn_rm + ### SVE2 Memory Gather Load Group # SVE2 64-bit gather non-temporal load diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index a6c5ff8f79..b1bb2300f8 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -6823,3 +6823,77 @@ void HELPER(sve2_xar_s)(void *vd, void *vn, void *vm, uint32_t desc) d[i] = ror32(n[i] ^ m[i], shr); } } + +void HELPER(fmmla_s)(void *vd, void *vn, void *vm, void *va, + void *status, uint32_t desc) +{ +intptr_t s, opr_sz = simd_oprsz(desc) / (sizeof(float32) * 4); + +for (s = 0; s < opr_sz; ++s) { +float32 *n = vn + s * sizeof(float32) * 4; +float32 *m = vm + s * sizeof(float32) * 4; +float32 *a = va + s * sizeof(float32) * 4; +float32 *d = vd + s * sizeof(float32) * 4; +float32 n00 = n[H4(0)], n01 = n[H4(1)]; +float32 n10 = n[H4(2)], n11 = n[H4(3)]; +float32 m00 = m[H4(0)], m01 = m[H4(1)]; +float32 m10 = m[H4(2)], m11 = m[H4(3)]; +float32 p0, p1; + +/* i = 0, j = 0 */ +p0 = float32_mul(n00, m00, status); +p1 = float32_mul(n01, m01, status); +d[H4(0)] = float32_add(a[H4(0)], float32_add(p0, p1, status), status); + +/* i = 0, j = 1 */ +p0 = float32_mul(n00, m10, status); +p1 = float32_mul(n01, m11, status); +d[H4(1)] = float32_add(a[H4(1)], float32_add(p0, p1, status), status); + +/* i = 1, j = 0 */ +p0 = float32_mul(n10, m00, status); +p1 = float32_mul(n11, m01, status); +d[H4(2)] = float32_add(a[H4(2)], float32_add(p0, p1, status), status); + +/* i = 1, j = 1 */ +p0 = float32_mul(n10, m10, status); +p1 = float32_mul(n11, m11, status); +d[H4(3)] = float32_add(a[H4(3)], float32_add(p0, p1, status), status); +} +} + +void HELPER(fmmla_d)(void *vd, void *vn, void *vm, void *va, + void *status, uint32_t desc) +{ +intptr_t s, opr_sz = simd_oprsz(desc) / (sizeof(float64) * 4); 
+ +for (s = 0; s < opr_sz; ++s) { +float64 *n = vn + s * sizeof(float64) * 4; +float64 *m = vm + s * sizeof(float64) * 4; +float64 *a = va + s * sizeof(float64) * 4; +float64 *d = vd + s * sizeof(float64) * 4; +float64 n00 = n[0], n01 = n[1], n10 = n[2], n11 = n[3]; +float64 m00 = m[0], m01 = m[1], m10 = m[2], m11 = m[3]; +float64 p0, p1; + +/* i = 0, j = 0 */ +p0 = float64_mul(n00, m00, status); +p1 = float64_mul(n01, m01, status); +d[0] = float64_add(a[0], float64_add(p0, p1, status), status); + +/* i = 0, j = 1 */ +p0 = float64_mul(n00, m10, status);
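Each 128-bit segment here computes a 2x2 tile D = A + N * M^T, which the unrolled helper spells out product by product. The same tile, sketched with plain doubles instead of the softfloat calls and status pointer used by the real helper (function name hypothetical):

```c
#include <assert.h>

/* One 2x2 FMMLA tile over row-major arrays: D = A + N * M^T,
 * mirroring the per-segment loop body of HELPER(fmmla_d) but using
 * host arithmetic rather than float64_mul/float64_add. */
static void fmmla_2x2(double d[4], const double n[4],
                      const double m[4], const double a[4])
{
    d[0] = a[0] + n[0] * m[0] + n[1] * m[1];   /* i = 0, j = 0 */
    d[1] = a[1] + n[0] * m[2] + n[1] * m[3];   /* i = 0, j = 1 */
    d[2] = a[2] + n[2] * m[0] + n[3] * m[1];   /* i = 1, j = 0 */
    d[3] = a[3] + n[2] * m[2] + n[3] * m[3];   /* i = 1, j = 1 */
}
```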
[PATCH v2 052/100] target/arm: Implement SVE2 integer multiply-add long
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 28 ++ target/arm/sve.decode | 11 ++ target/arm/sve_helper.c| 18 + target/arm/translate-sve.c | 76 ++ 4 files changed, 133 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index f85b7be12e..8fc8b856e7 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2085,3 +2085,31 @@ DEF_HELPER_FLAGS_5(sve2_sqdmlsl_zzzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_sqdmlsl_zzzw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_smlal_zzzw_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smlal_zzzw_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smlal_zzzw_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_umlal_zzzw_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umlal_zzzw_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umlal_zzzw_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_smlsl_zzzw_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smlsl_zzzw_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smlsl_zzzw_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_umlsl_zzzw_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umlsl_zzzw_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umlsl_zzzw_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 177b3cc803..19c5013ddd 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1351,3 +1351,14 @@ SQDMLSLBT 01000100 .. 0 . 1 1 . . @rda_rn_rm SQRDMLAH_ 01000100 .. 0 . 01110 0 . . @rda_rn_rm SQRDMLSH_ 01000100 .. 0 . 01110 1 . . 
@rda_rn_rm + +## SVE2 integer multiply-add long + +SMLALB_zzzw 01000100 .. 0 . 010 000 . . @rda_rn_rm +SMLALT_zzzw 01000100 .. 0 . 010 001 . . @rda_rn_rm +UMLALB_zzzw 01000100 .. 0 . 010 010 . . @rda_rn_rm +UMLALT_zzzw 01000100 .. 0 . 010 011 . . @rda_rn_rm +SMLSLB_zzzw 01000100 .. 0 . 010 100 . . @rda_rn_rm +SMLSLT_zzzw 01000100 .. 0 . 010 101 . . @rda_rn_rm +UMLSLB_zzzw 01000100 .. 0 . 010 110 . . @rda_rn_rm +UMLSLT_zzzw 01000100 .. 0 . 010 111 . . @rda_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 4c8b0fe9f1..dbf378d214 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1308,6 +1308,24 @@ DO_ZZZW_ACC(sve2_uabal_h, uint16_t, uint8_t, H1_2, H1, DO_ABD) DO_ZZZW_ACC(sve2_uabal_s, uint32_t, uint16_t, H1_4, H1_2, DO_ABD) DO_ZZZW_ACC(sve2_uabal_d, uint64_t, uint32_t, , H1_4, DO_ABD) +DO_ZZZW_ACC(sve2_smlal_zzzw_h, int16_t, int8_t, H1_2, H1, DO_MUL) +DO_ZZZW_ACC(sve2_smlal_zzzw_s, int32_t, int16_t, H1_4, H1_2, DO_MUL) +DO_ZZZW_ACC(sve2_smlal_zzzw_d, int64_t, int32_t, , H1_4, DO_MUL) + +DO_ZZZW_ACC(sve2_umlal_zzzw_h, uint16_t, uint8_t, H1_2, H1, DO_MUL) +DO_ZZZW_ACC(sve2_umlal_zzzw_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL) +DO_ZZZW_ACC(sve2_umlal_zzzw_d, uint64_t, uint32_t, , H1_4, DO_MUL) + +#define DO_NMUL(N, M) -(N * M) + +DO_ZZZW_ACC(sve2_smlsl_zzzw_h, int16_t, int8_t, H1_2, H1, DO_NMUL) +DO_ZZZW_ACC(sve2_smlsl_zzzw_s, int32_t, int16_t, H1_4, H1_2, DO_NMUL) +DO_ZZZW_ACC(sve2_smlsl_zzzw_d, int64_t, int32_t, , H1_4, DO_NMUL) + +DO_ZZZW_ACC(sve2_umlsl_zzzw_h, uint16_t, uint8_t, H1_2, H1, DO_NMUL) +DO_ZZZW_ACC(sve2_umlsl_zzzw_s, uint32_t, uint16_t, H1_4, H1_2, DO_NMUL) +DO_ZZZW_ACC(sve2_umlsl_zzzw_d, uint64_t, uint32_t, , H1_4, DO_NMUL) + #undef DO_ZZZW_ACC #define DO_XTNB(NAME, TYPE, OP) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 28dc89c3a4..054c9d4799 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7141,3 +7141,79 @@ static bool trans_SQRDMLSH_(DisasContext 
*s, arg__esz *a) }; return do_sve2__ool(s, a, fns[a->esz], 0); } + +static bool do_smlal_zzzw(DisasContext *s, arg__esz *a, bool sel) +{ +static gen_helper_gvec_4 * const fns[] = { +NULL, gen_helper_sve2_smlal_zzzw_h, +gen_helper_sve2_smlal_zzzw_s, gen_helper_sve2_smlal_zzzw_d, +}; +return do_sve2__ool(s, a, fns[a->esz], sel); +} + +static bool trans_SMLALB_zzzw(DisasContext *s, arg__esz *a) +{ +
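These all reuse the DO_ZZZW_ACC widening-accumulate pattern, plugging in DO_MUL or DO_NMUL as the per-element op. One SMLSL element, modeled standalone for the byte-to-halfword case (function name made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* One SMLSL element: widen two signed bytes, multiply, and subtract
 * from the 16-bit accumulator -- DO_NMUL(N, M) == -(N * M) fed into
 * the widening-accumulate pattern. */
static int16_t smlsl_h(int16_t acc, int8_t n, int8_t m)
{
    return (int16_t)(acc - (int16_t)n * m);
}
```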
[PATCH v2 047/100] target/arm: Implement SVE2 bitwise ternary operations
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 6 ++ target/arm/sve.decode | 12 +++ target/arm/sve_helper.c| 50 + target/arm/translate-sve.c | 213 + 4 files changed, 281 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 4c4a8b27b1..7e159fd0ef 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2055,3 +2055,9 @@ DEF_HELPER_FLAGS_6(sve2_fminp_zpzz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sve2_fminp_zpzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_eor3, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_bcax, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_bsl1n, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_bsl2n, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_nbsl, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 19d503e2f4..a50afd40c2 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -124,6 +124,10 @@ @rda_rn_rm esz:2 . rm:5 ... ... rn:5 rd:5 \ _esz ra=%reg_movprfx +# Four operand with unused vector element size +@rdn_ra_rm_e0 ... rm:5 ... ... ra:5 rd:5 \ +_esz esz=0 rn=%reg_movprfx + # Three operand with "memory" size, aka immediate left shift @rd_rn_msz_rm ... rm:5 imm:2 rn:5 rd:5 @@ -379,6 +383,14 @@ ORR_zzz 0100 01 1 . 001 100 . . @rd_rn_rm_e0 EOR_zzz 0100 10 1 . 001 100 . . @rd_rn_rm_e0 BIC_zzz 0100 11 1 . 001 100 . . @rd_rn_rm_e0 +# SVE2 bitwise ternary operations +EOR30100 00 1 . 001 110 . . @rdn_ra_rm_e0 +BSL 0100 00 1 . 001 111 . . @rdn_ra_rm_e0 +BCAX0100 01 1 . 001 110 . . @rdn_ra_rm_e0 +BSL1N 0100 01 1 . 001 111 . . @rdn_ra_rm_e0 +BSL2N 0100 10 1 . 001 111 . . @rdn_ra_rm_e0 +NBSL0100 11 1 . 001 111 . . 
@rdn_ra_rm_e0 + ### SVE Index Generation Group # SVE index generation (immediate start, immediate increment) diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 5d80dc8c58..c690e86d1d 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -6390,3 +6390,53 @@ DO_ST1_ZPZ_D(dd_be, zd, MO_64) #undef DO_ST1_ZPZ_S #undef DO_ST1_ZPZ_D + +void HELPER(sve2_eor3)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc) / 8; +uint64_t *d = vd, *n = vn, *m = vm, *k = vk; + +for (i = 0; i < opr_sz; ++i) { +d[i] = n[i] ^ m[i] ^ k[i]; +} +} + +void HELPER(sve2_bcax)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc) / 8; +uint64_t *d = vd, *n = vn, *m = vm, *k = vk; + +for (i = 0; i < opr_sz; ++i) { +d[i] = n[i] ^ (m[i] & ~k[i]); +} +} + +void HELPER(sve2_bsl1n)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc) / 8; +uint64_t *d = vd, *n = vn, *m = vm, *k = vk; + +for (i = 0; i < opr_sz; ++i) { +d[i] = (~n[i] & k[i]) | (m[i] & ~k[i]); +} +} + +void HELPER(sve2_bsl2n)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc) / 8; +uint64_t *d = vd, *n = vn, *m = vm, *k = vk; + +for (i = 0; i < opr_sz; ++i) { +d[i] = (n[i] & k[i]) | (~m[i] & ~k[i]); +} +} + +void HELPER(sve2_nbsl)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc) / 8; +uint64_t *d = vd, *n = vn, *m = vm, *k = vk; + +for (i = 0; i < opr_sz; ++i) { +d[i] = ~((n[i] & k[i]) | (m[i] & ~k[i])); +} +} diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 97e26c8ff5..6fada01d22 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -217,6 +217,17 @@ static void gen_gvec_fn_zzz(DisasContext *s, GVecGen3Fn *gvec_fn, vec_full_reg_offset(s, rm), vsz, vsz); } +/* Invoke a vector expander on four Zregs. 
*/ +static void gen_gvec_fn_(DisasContext *s, GVecGen4Fn *gvec_fn, + int esz, int rd, int rn, int rm, int ra) +{ +unsigned vsz = vec_full_reg_size(s); +gvec_fn(esz, vec_full_reg_offset(s, rd), +vec_full_reg_offset(s, rn), +vec_full_reg_offset(s, rm), +vec_full_reg_offset(s, ra), vsz, vsz); +} + /* Invoke a vector move on two Zregs. */ static bool do_mov_z(DisasContext *s, int rd, int rn) { @@ -329,6 +340,208 @@ static bool
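The five helpers are plain bitwise expressions over 64-bit chunks, with k acting as the select mask in the BSL variants. Restated as one-liners matching the loop bodies above, for quick reference:

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise reference for the SVE2 ternary ops; k is the mask/select
 * operand.  Each matches the corresponding sve2_* helper body. */
static uint64_t eor3(uint64_t n, uint64_t m, uint64_t k)  { return n ^ m ^ k; }
static uint64_t bcax(uint64_t n, uint64_t m, uint64_t k)  { return n ^ (m & ~k); }
static uint64_t bsl1n(uint64_t n, uint64_t m, uint64_t k) { return (~n & k) | (m & ~k); }
static uint64_t bsl2n(uint64_t n, uint64_t m, uint64_t k) { return (n & k) | (~m & ~k); }
static uint64_t nbsl(uint64_t n, uint64_t m, uint64_t k)  { return ~((n & k) | (m & ~k)); }
```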
[PATCH v2 059/100] target/arm: Implement SVE2 XAR
In addition, use the same vector generator interface for AdvSIMD. This fixes a bug in which the AdvSIMD insn failed to clear the high bits of the SVE register. Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 4 ++ target/arm/helper.h| 2 + target/arm/translate-a64.h | 3 ++ target/arm/sve.decode | 4 ++ target/arm/sve_helper.c| 39 ++ target/arm/translate-a64.c | 25 ++--- target/arm/translate-sve.c | 104 + target/arm/vec_helper.c| 12 + 8 files changed, 172 insertions(+), 21 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 1d5d272c5c..9f6095c884 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2070,6 +2070,10 @@ DEF_HELPER_FLAGS_5(sve2_histcnt_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_4(sve2_histseg, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_xar_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_xar_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_xar_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG, diff --git a/target/arm/helper.h b/target/arm/helper.h index 643fc3a017..7a29194052 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -783,6 +783,8 @@ DEF_HELPER_FLAGS_4(gvec_uaba_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_uaba_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_uaba_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_xar_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h index da0f59a2ce..e54e297c90 100644 --- a/target/arm/translate-a64.h +++ b/target/arm/translate-a64.h @@ -117,5 +117,8 @@ bool disas_sve(DisasContext *, uint32_t); void gen_gvec_rax1(unsigned vece, uint32_t 
rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_xar(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t rm_ofs, int64_t shift, + uint32_t opr_sz, uint32_t max_sz); #endif /* TARGET_ARM_TRANSLATE_A64_H */ diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 0edb72d4fb..a375ce31f1 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -65,6 +65,7 @@ _dbm rd rn dbm rd rn rm imm _eszrd rn imm esz +_esz rd rn rm imm esz _eszrd rn rm esz _eszrd pg rn esz _s rd pg rn s @@ -384,6 +385,9 @@ ORR_zzz 0100 01 1 . 001 100 . . @rd_rn_rm_e0 EOR_zzz 0100 10 1 . 001 100 . . @rd_rn_rm_e0 BIC_zzz 0100 11 1 . 001 100 . . @rd_rn_rm_e0 +XAR 0100 .. 1 . 001 101 rm:5 rd:5 _esz \ +rn=%reg_movprfx esz=%tszimm16_esz imm=%tszimm16_shr + # SVE2 bitwise ternary operations EOR30100 00 1 . 001 110 . . @rdn_ra_rm_e0 BSL 0100 00 1 . 001 111 . . @rdn_ra_rm_e0 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index bc1c3ce1f0..a6c5ff8f79 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -6784,3 +6784,42 @@ void HELPER(sve2_histseg)(void *vd, void *vn, void *vm, uint32_t desc) *(uint64_t *)(vd + i + 8) = out1; } } + +void HELPER(sve2_xar_b)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc) / 8; +int shr = simd_data(desc); +int shl = 8 - shr; +uint64_t mask = dup_const(MO_8, 0xff >> shr); +uint64_t *d = vd, *n = vn, *m = vm; + +for (i = 0; i < opr_sz; ++i) { +uint64_t t = n[i] ^ m[i]; +d[i] = ((t >> shr) & mask) | ((t << shl) & ~mask); +} +} + +void HELPER(sve2_xar_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc) / 8; +int shr = simd_data(desc); +int shl = 16 - shr; +uint64_t mask = dup_const(MO_16, 0x >> shr); +uint64_t *d = vd, *n = vn, *m = vm; + +for (i = 0; i < opr_sz; ++i) { +uint64_t t = n[i] ^ m[i]; +d[i] = ((t >> shr) & mask) | ((t << shl) & ~mask); +} +} + +void HELPER(sve2_xar_s)(void *vd, void *vn, void *vm, 
uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc) / 4; +int shr = simd_data(desc); +uint32_t *d = vd, *n = vn, *m = vm; + +for (i = 0; i < opr_sz; ++i) { +d[i] = ror32(n[i] ^ m[i], shr); +} +} diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index b80ee9f734..4f5c433b47 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -13829,8
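XAR is just an exclusive-or followed by a rotate right by an immediate; the 32-bit helper uses ror32 directly, while the 8- and 16-bit helpers do the rotate with the mask-and-shift pair since there is no sub-word rotate on the host. A 32-bit reference (function name hypothetical; assumes 1 <= shr <= 31):

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit XAR element: d = ror32(n ^ m, shr). */
static uint32_t xar32(uint32_t n, uint32_t m, unsigned shr)
{
    uint32_t t = n ^ m;
    return (t >> shr) | (t << (32 - shr));   /* valid for shr in 1..31 */
}
```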
[PATCH v2 051/100] target/arm: Implement SVE2 saturating multiply-add high
SVE2 has two additional sizes of the operation and unlike NEON, there is no saturation flag. Create new entry points for SVE2 that do not set QC. Signed-off-by: Richard Henderson --- target/arm/helper.h| 17 target/arm/sve.decode | 5 ++ target/arm/translate-sve.c | 18 target/arm/vec_helper.c| 163 +++-- 4 files changed, 196 insertions(+), 7 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index 236fa438c6..643fc3a017 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -557,6 +557,23 @@ DEF_HELPER_FLAGS_5(gvec_qrdmlah_s32, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlah_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlah_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlah_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlah_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index d0d24978bb..177b3cc803 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1346,3 +1346,8 @@ SQDMLSLT_zzzw 01000100 .. 0 . 0110 11 . . @rda_rn_rm SQDMLALBT 01000100 .. 0 . 1 0 . . @rda_rn_rm SQDMLSLBT 01000100 .. 0 . 1 1 . . @rda_rn_rm + +## SVE2 saturating multiply-add high + +SQRDMLAH_ 01000100 .. 0 . 01110 0 . . 
@rda_rn_rm +SQRDMLSH_ 01000100 .. 0 . 01110 1 . . @rda_rn_rm diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 00488915aa..28dc89c3a4 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7123,3 +7123,21 @@ static bool trans_SQDMLSLBT(DisasContext *s, arg__esz *a) { return do_sqdmlsl_zzzw(s, a, false, true); } + +static bool trans_SQRDMLAH_(DisasContext *s, arg__esz *a) +{ +static gen_helper_gvec_4 * const fns[] = { +gen_helper_sve2_sqrdmlah_b, gen_helper_sve2_sqrdmlah_h, +gen_helper_sve2_sqrdmlah_s, gen_helper_sve2_sqrdmlah_d, +}; +return do_sve2__ool(s, a, fns[a->esz], 0); +} + +static bool trans_SQRDMLSH_(DisasContext *s, arg__esz *a) +{ +static gen_helper_gvec_4 * const fns[] = { +gen_helper_sve2_sqrdmlsh_b, gen_helper_sve2_sqrdmlsh_h, +gen_helper_sve2_sqrdmlsh_s, gen_helper_sve2_sqrdmlsh_d, +}; +return do_sve2__ool(s, a, fns[a->esz], 0); +} diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index b0ae51f95f..4b7afd7be5 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -22,6 +22,7 @@ #include "exec/helper-proto.h" #include "tcg/tcg-gvec-desc.h" #include "fpu/softfloat.h" +#include "qemu/int128.h" #include "vec_internal.h" /* Note that vector data is stored in host-endian 64-bit chunks, @@ -36,15 +37,55 @@ #define H4(x) (x) #endif +/* Signed saturating rounding doubling multiply-accumulate high half, 8-bit */ +static int8_t do_sqrdmlah_b(int8_t src1, int8_t src2, int8_t src3, +bool neg, bool round) +{ +/* + * Simplify: + * = ((a3 << 8) + ((e1 * e2) << 1) + (round << 7)) >> 8 + * = ((a3 << 7) + (e1 * e2) + (round << 6)) >> 7 + */ +int32_t ret = (int32_t)src1 * src2; +if (neg) { +ret = -ret; +} +ret += ((int32_t)src3 << 7) + (round << 6); +ret >>= 7; + +if (ret != (int8_t)ret) { +ret = (ret < 0 ? 
INT8_MIN : INT8_MAX); +} +return ret; +} + +void HELPER(sve2_sqrdmlah_b)(void *vd, void *vn, void *vm, + void *va, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int8_t *d = vd, *n = vn, *m = vm, *a = va; + +for (i = 0; i < opr_sz; ++i) { +d[i] = do_sqrdmlah_b(n[i], m[i], a[i], false, true); +} +} + +void HELPER(sve2_sqrdmlsh_b)(void *vd, void *vn, void *vm, + void *va, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int8_t *d = vd, *n = vn, *m = vm, *a = va; + +for (i = 0; i < opr_sz; ++i) { +d[i] = do_sqrdmlah_b(n[i], m[i], a[i], true,
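The fixed-point identity in the comment is easy to sanity-check outside QEMU. Below is a minimal standalone model of the 8-bit helper as it appears in the diff (the SVE2 variant: no QC flag, saturation is silent); the test values assume Q7 fixed point. This is a sketch for verification, not the QEMU source itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Signed saturating (rounding) doubling multiply-accumulate high half,
 * 8-bit. Conceptually ((a << 8) + 2*n*m + (round << 7)) >> 8, folded one
 * bit lower so the doubling never overflows the 32-bit intermediate.
 */
static int8_t do_sqrdmlah_b(int8_t src1, int8_t src2, int8_t src3,
                            bool neg, bool round)
{
    int32_t ret = (int32_t)src1 * src2;
    if (neg) {
        ret = -ret;
    }
    ret += ((int32_t)src3 << 7) + (round << 6);
    ret >>= 7;
    if (ret != (int8_t)ret) {           /* overflowed int8_t: saturate */
        ret = (ret < 0 ? INT8_MIN : INT8_MAX);
    }
    return (int8_t)ret;
}
```

In Q7 terms, 0x40 is 0.5, so 2 * 0.5 * 0.5 rounds to 0.25 (0x20 = 32), and large accumulands saturate instead of setting QC.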
[PATCH v2 049/100] target/arm: Implement SVE2 saturating multiply-add long
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 14 ++ target/arm/sve.decode | 14 ++ target/arm/sve_helper.c| 30 + target/arm/translate-sve.c | 54 ++ 4 files changed, 112 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 78172cb281..f85b7be12e 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2071,3 +2071,17 @@ DEF_HELPER_FLAGS_5(sve2_bcax, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_bsl1n, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_bsl2n, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_nbsl, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqdmlal_zzzw_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqdmlal_zzzw_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqdmlal_zzzw_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqdmlsl_zzzw_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqdmlsl_zzzw_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqdmlsl_zzzw_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 2207693d28..d0d24978bb 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1332,3 +1332,17 @@ FMAXNMP 01100100 .. 010 10 0 100 ... . . @rdn_pg_rm FMINNMP 01100100 .. 010 10 1 100 ... . . @rdn_pg_rm FMAXP 01100100 .. 010 11 0 100 ... . . @rdn_pg_rm FMINP 01100100 .. 010 11 1 100 ... . . @rdn_pg_rm + + SVE Integer Multiply-Add (unpredicated) + +## SVE2 saturating multiply-add long + +SQDMLALB_zzzw 01000100 .. 0 . 0110 00 . . @rda_rn_rm +SQDMLALT_zzzw 01000100 .. 0 . 0110 01 . . @rda_rn_rm +SQDMLSLB_zzzw 01000100 .. 0 . 0110 10 . . @rda_rn_rm +SQDMLSLT_zzzw 01000100 .. 0 . 0110 11 . . 
@rda_rn_rm + +## SVE2 saturating multiply-add interleaved long + +SQDMLALBT 01000100 .. 0 . 1 0 . . @rda_rn_rm +SQDMLSLBT 01000100 .. 0 . 1 1 . . @rda_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 31538e4720..4c8b0fe9f1 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1400,6 +1400,36 @@ void HELPER(sve2_adcl_d)(void *vd, void *vn, void *vm, void *va, uint32_t desc) } } +#define DO_SQDMLAL(NAME, TYPEW, TYPEN, HW, HN, DMUL_OP, SUM_OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +int sel1 = extract32(desc, SIMD_DATA_SHIFT, 1) * sizeof(TYPEN); \ +int sel2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(TYPEN); \ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEN *)(vn + HN(i + sel1)); \ +TYPEW mm = *(TYPEN *)(vm + HN(i + sel2)); \ +TYPEW aa = *(TYPEW *)(va + HW(i)); \ +*(TYPEW *)(vd + HW(i)) = SUM_OP(aa, DMUL_OP(nn, mm)); \ +} \ +} + +DO_SQDMLAL(sve2_sqdmlal_zzzw_h, int16_t, int8_t, H1_2, H1, + do_sqdmull_h, DO_SQADD_H) +DO_SQDMLAL(sve2_sqdmlal_zzzw_s, int32_t, int16_t, H1_4, H1_2, + do_sqdmull_s, DO_SQADD_S) +DO_SQDMLAL(sve2_sqdmlal_zzzw_d, int64_t, int32_t, , H1_4, + do_sqdmull_d, do_sqadd_d) + +DO_SQDMLAL(sve2_sqdmlsl_zzzw_h, int16_t, int8_t, H1_2, H1, + do_sqdmull_h, DO_SQSUB_H) +DO_SQDMLAL(sve2_sqdmlsl_zzzw_s, int32_t, int16_t, H1_4, H1_2, + do_sqdmull_s, DO_SQSUB_S) +DO_SQDMLAL(sve2_sqdmlsl_zzzw_d, int64_t, int32_t, , H1_4, + do_sqdmull_d, do_sqsub_d) + +#undef DO_SQDMLAL + #define DO_BITPERM(NAME, TYPE, OP) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 15546d9ad2..00488915aa 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7069,3 +7069,57 @@ DO_SVE2_ZPZZ_FP(FMAXNMP, fmaxnmp) DO_SVE2_ZPZZ_FP(FMINNMP, fminnmp) DO_SVE2_ZPZZ_FP(FMAXP, fmaxp) DO_SVE2_ZPZZ_FP(FMINP, fminp) + +/* + * SVE 
Integer Multiply-Add (unpredicated) + */ + +static bool do_sqdmlal_zzzw(DisasContext *s, arg__esz *a, +bool sel1, bool sel2) +{ +static gen_helper_gvec_4 * const fns[] = { +NULL,
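For the h case, the DO_SQDMLAL macro above expands to a per-element operation that can be sketched as follows. do_sqdmull_h and DO_SQADD_H are defined elsewhere in sve_helper.c, so their bodies here are assumptions based on the architectural saturating doubling multiply and saturating add:

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate into int16_t range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) {
        return INT16_MAX;
    }
    if (x < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)x;
}

/* Widening saturating doubling multiply: 8-bit inputs, 16-bit result.
 * Only -128 * -128 actually needs the saturation. */
static int16_t sqdmull_h(int8_t n, int8_t m)
{
    return sat16(2 * (int32_t)n * m);
}

/* One element of SQDMLALB/T: saturating add of the doubled product. */
static int16_t sqdmlal_h(int16_t acc, int8_t n, int8_t m)
{
    return sat16((int32_t)acc + sqdmull_h(n, m));
}
```

The sel1/sel2 offsets in the macro pick the bottom (B) or top (T) narrow element of each wide lane; the arithmetic per element is just the above.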
[PATCH v2 041/100] target/arm: Implement SVE2 SHRN, RSHRN
Signed-off-by: Richard Henderson --- v2: Fix typo in gen_shrnb_vec (laurent desnogues) --- target/arm/helper-sve.h| 16 target/arm/sve.decode | 8 ++ target/arm/sve_helper.c| 45 ++- target/arm/translate-sve.c | 160 + 4 files changed, 227 insertions(+), 2 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index cad45b0f16..3a7d7ff66d 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1956,6 +1956,22 @@ DEF_HELPER_FLAGS_3(sve2_sqxtunt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqxtunt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqxtunt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_shrnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_shrnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_shrnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_shrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_shrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_shrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_rshrnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_rshrnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_rshrnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_rshrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_rshrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_rshrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 54657d996a..7cc4b6cc43 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1285,6 +1285,14 @@ UQXTNT 01000101 .. 1 . 010 011 . . @rd_rn_tszimm_shl SQXTUNB 01000101 .. 1 . 010 100 . . @rd_rn_tszimm_shl SQXTUNT 01000101 .. 1 . 
010 101 . . @rd_rn_tszimm_shl +## SVE2 bitwise shift right narrow + +# Bit 23 == 0 is handled by esz > 0 in the translator. +SHRNB 01000101 .. 1 . 00 0100 . . @rd_rn_tszimm_shr +SHRNT 01000101 .. 1 . 00 0101 . . @rd_rn_tszimm_shr +RSHRNB 01000101 .. 1 . 00 0110 . . @rd_rn_tszimm_shr +RSHRNT 01000101 .. 1 . 00 0111 . . @rd_rn_tszimm_shr + ## SVE2 floating-point pairwise operations FADDP 01100100 .. 010 00 0 100 ... . . @rdn_pg_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 27ba4e81fb..9b3d0d2ddd 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1882,12 +1882,53 @@ DO_ZPZI(sve_asrd_h, int16_t, H1_2, DO_ASRD) DO_ZPZI(sve_asrd_s, int32_t, H1_4, DO_ASRD) DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD) -#undef DO_SHR -#undef DO_SHL #undef DO_ASRD #undef DO_ZPZI #undef DO_ZPZI_D +#define DO_SHRNB(NAME, TYPEW, TYPEN, OP) \ +void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ +{\ +intptr_t i, opr_sz = simd_oprsz(desc); \ +int shift = simd_data(desc); \ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) {\ +TYPEW nn = *(TYPEW *)(vn + i); \ +*(TYPEW *)(vd + i) = (TYPEN)OP(nn, shift); \ +}\ +} + +#define DO_SHRNT(NAME, TYPEW, TYPEN, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc);\ +int shift = simd_data(desc); \ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEW *)(vn + HW(i));\ +*(TYPEN *)(vd + HN(i + sizeof(TYPEN))) = OP(nn, shift); \ +} \ +} + +DO_SHRNB(sve2_shrnb_h, uint16_t, uint8_t, DO_SHR) +DO_SHRNB(sve2_shrnb_s, uint32_t, uint16_t, DO_SHR) +DO_SHRNB(sve2_shrnb_d, uint64_t, uint32_t, DO_SHR) + +DO_SHRNT(sve2_shrnt_h, uint16_t, uint8_t, H1_2, H1, DO_SHR) +DO_SHRNT(sve2_shrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_SHR) +DO_SHRNT(sve2_shrnt_d, uint64_t, uint32_t, , H1_4, DO_SHR) + +#define DO_RSHR(x, sh) ((x >> sh) + ((x >> (sh - 1)) & 1)) + +DO_SHRNB(sve2_rshrnb_h, uint16_t, uint8_t, DO_RSHR) +DO_SHRNB(sve2_rshrnb_s, uint32_t, uint16_t, DO_RSHR) 
+DO_SHRNB(sve2_rshrnb_d, uint64_t, uint32_t, DO_RSHR) + +DO_SHRNT(sve2_rshrnt_h, uint16_t, uint8_t, H1_2, H1, DO_RSHR) +DO_SHRNT(sve2_rshrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_RSHR)
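DO_RSHR implements round-to-nearest by adding back the last bit shifted out, which only works because the narrowing decode guarantees a shift count of at least 1. A standalone sketch, with the macro arguments parenthesized for safety (the diff relies on them being simple expressions):

```c
#include <stdint.h>

/* Rounding right shift: add half an ULP (the last bit shifted out)
 * before truncating. Requires sh >= 1. */
#define DO_RSHR(x, sh) (((x) >> (sh)) + (((x) >> ((sh) - 1)) & 1))

/* One element of RSHRNB: round-shift the wide lane, keep the low half. */
static uint8_t rshrnb_h(uint16_t n, int shift)
{
    return (uint8_t)DO_RSHR(n, shift);
}
```

So 7 >> 1 rounds up to 4 while 6 >> 1 stays 3, and 0x1ff >> 4 (31.9375) rounds to 0x20.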
[PATCH v2 050/100] target/arm: Generalize inl_qrdmlah_* helper functions
Unify add/sub helpers and add a parameter for rounding. This will allow saturating non-rounding to reuse this code. Signed-off-by: Richard Henderson --- target/arm/vec_helper.c | 80 +++-- 1 file changed, 29 insertions(+), 51 deletions(-) diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 32b1aace3d..b0ae51f95f 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -37,19 +37,24 @@ #endif /* Signed saturating rounding doubling multiply-accumulate high half, 16-bit */ -static int16_t inl_qrdmlah_s16(int16_t src1, int16_t src2, - int16_t src3, uint32_t *sat) +static int16_t do_sqrdmlah_h(int16_t src1, int16_t src2, int16_t src3, + bool neg, bool round, uint32_t *sat) { -/* Simplify: +/* + * Simplify: * = ((a3 << 16) + ((e1 * e2) << 1) + (1 << 15)) >> 16 * = ((a3 << 15) + (e1 * e2) + (1 << 14)) >> 15 */ int32_t ret = (int32_t)src1 * src2; -ret = ((int32_t)src3 << 15) + ret + (1 << 14); +if (neg) { +ret = -ret; +} +ret += ((int32_t)src3 << 15) + (round << 14); ret >>= 15; + if (ret != (int16_t)ret) { *sat = 1; -ret = (ret < 0 ? -0x8000 : 0x7fff); +ret = (ret < 0 ? 
INT16_MIN : INT16_MAX); } return ret; } @@ -58,8 +63,9 @@ uint32_t HELPER(neon_qrdmlah_s16)(CPUARMState *env, uint32_t src1, uint32_t src2, uint32_t src3) { uint32_t *sat = >vfp.qc[0]; -uint16_t e1 = inl_qrdmlah_s16(src1, src2, src3, sat); -uint16_t e2 = inl_qrdmlah_s16(src1 >> 16, src2 >> 16, src3 >> 16, sat); +uint16_t e1 = do_sqrdmlah_h(src1, src2, src3, false, true, sat); +uint16_t e2 = do_sqrdmlah_h(src1 >> 16, src2 >> 16, src3 >> 16, +false, true, sat); return deposit32(e1, 16, 16, e2); } @@ -73,35 +79,18 @@ void HELPER(gvec_qrdmlah_s16)(void *vd, void *vn, void *vm, uintptr_t i; for (i = 0; i < opr_sz / 2; ++i) { -d[i] = inl_qrdmlah_s16(n[i], m[i], d[i], vq); +d[i] = do_sqrdmlah_h(n[i], m[i], d[i], false, true, vq); } clear_tail(d, opr_sz, simd_maxsz(desc)); } -/* Signed saturating rounding doubling multiply-subtract high half, 16-bit */ -static int16_t inl_qrdmlsh_s16(int16_t src1, int16_t src2, - int16_t src3, uint32_t *sat) -{ -/* Similarly, using subtraction: - * = ((a3 << 16) - ((e1 * e2) << 1) + (1 << 15)) >> 16 - * = ((a3 << 15) - (e1 * e2) + (1 << 14)) >> 15 - */ -int32_t ret = (int32_t)src1 * src2; -ret = ((int32_t)src3 << 15) - ret + (1 << 14); -ret >>= 15; -if (ret != (int16_t)ret) { -*sat = 1; -ret = (ret < 0 ? 
-0x8000 : 0x7fff); -} -return ret; -} - uint32_t HELPER(neon_qrdmlsh_s16)(CPUARMState *env, uint32_t src1, uint32_t src2, uint32_t src3) { uint32_t *sat = >vfp.qc[0]; -uint16_t e1 = inl_qrdmlsh_s16(src1, src2, src3, sat); -uint16_t e2 = inl_qrdmlsh_s16(src1 >> 16, src2 >> 16, src3 >> 16, sat); +uint16_t e1 = do_sqrdmlah_h(src1, src2, src3, true, true, sat); +uint16_t e2 = do_sqrdmlah_h(src1 >> 16, src2 >> 16, src3 >> 16, +true, true, sat); return deposit32(e1, 16, 16, e2); } @@ -115,19 +104,23 @@ void HELPER(gvec_qrdmlsh_s16)(void *vd, void *vn, void *vm, uintptr_t i; for (i = 0; i < opr_sz / 2; ++i) { -d[i] = inl_qrdmlsh_s16(n[i], m[i], d[i], vq); +d[i] = do_sqrdmlah_h(n[i], m[i], d[i], true, true, vq); } clear_tail(d, opr_sz, simd_maxsz(desc)); } /* Signed saturating rounding doubling multiply-accumulate high half, 32-bit */ -static int32_t inl_qrdmlah_s32(int32_t src1, int32_t src2, - int32_t src3, uint32_t *sat) +static int32_t do_sqrdmlah_s(int32_t src1, int32_t src2, int32_t src3, + bool neg, bool round, uint32_t *sat) { /* Simplify similarly to int_qrdmlah_s16 above. */ int64_t ret = (int64_t)src1 * src2; -ret = ((int64_t)src3 << 31) + ret + (1 << 30); +if (neg) { +ret = -ret; +} +ret = ((int64_t)src3 << 31) + (round << 30); ret >>= 31; + if (ret != (int32_t)ret) { *sat = 1; ret = (ret < 0 ? INT32_MIN : INT32_MAX); @@ -139,7 +132,7 @@ uint32_t HELPER(neon_qrdmlah_s32)(CPUARMState *env, int32_t src1, int32_t src2, int32_t src3) { uint32_t *sat = >vfp.qc[0]; -return inl_qrdmlah_s32(src1, src2, src3, sat); +return do_sqrdmlah_s(src1, src2, src3, false, true, sat); } void HELPER(gvec_qrdmlah_s32)(void *vd, void *vn, void *vm, @@ -152,31 +145,16 @@ void HELPER(gvec_qrdmlah_s32)(void *vd, void *vn, void *vm, uintptr_t i; for (i = 0; i < opr_sz / 4; ++i) { -d[i] = inl_qrdmlah_s32(n[i], m[i], d[i], vq); +d[i] =
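The unified helper can be modelled in isolation to check that neg=true reproduces the old multiply-subtract form and that *sat behaves as a sticky QC flag. A sketch of the 16-bit routine after this patch, not the exact QEMU source:

```c
#include <stdbool.h>
#include <stdint.h>

/* One routine now covers SQRDMLAH (neg=false) and SQRDMLSH (neg=true);
 * `round` selects the rounding constant, *sat is the sticky QC flag. */
static int16_t do_sqrdmlah_h(int16_t src1, int16_t src2, int16_t src3,
                             bool neg, bool round, uint32_t *sat)
{
    int32_t ret = (int32_t)src1 * src2;
    if (neg) {
        ret = -ret;
    }
    ret += ((int32_t)src3 << 15) + (round << 14);
    ret >>= 15;
    if (ret != (int16_t)ret) {
        *sat = 1;
        ret = (ret < 0 ? INT16_MIN : INT16_MAX);
    }
    return (int16_t)ret;
}
```

Note that the product must be added back into the accumulator term (`ret +=`); in Q15 terms, 2 * 0.5 * 0.5 gives 0.25 (8192), the negated form gives -8192, and INT16_MIN squared saturates and sets QC.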
[PATCH v2 046/100] target/arm: Implement SVE2 WHILERW, WHILEWR
Signed-off-by: Richard Henderson --- v2: Fix decodetree typo --- target/arm/sve.decode | 3 ++ target/arm/translate-sve.c | 62 ++ 2 files changed, 65 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index b7038f9f57..19d503e2f4 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -702,6 +702,9 @@ CTERM 00100101 1 sf:1 1 rm:5 001000 rn:5 ne:1 # SVE integer compare scalar count and limit WHILE 00100101 esz:2 1 rm:5 000 sf:1 u:1 lt:1 rn:5 eq:1 rd:4 +# SVE2 pointer conflict compare +WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index dc6f39b5bb..97e26c8ff5 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3227,6 +3227,68 @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) return true; } +static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) +{ +TCGv_i64 op0, op1, diff, t1, tmax; +TCGv_i32 t2, t3; +TCGv_ptr ptr; +unsigned desc, vsz = vec_full_reg_size(s); + +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (!sve_access_check(s)) { +return true; +} + +op0 = read_cpu_reg(s, a->rn, 1); +op1 = read_cpu_reg(s, a->rm, 1); + +tmax = tcg_const_i64(vsz); +diff = tcg_temp_new_i64(); + +if (a->rw) { +/* WHILERW */ +/* diff = abs(op1 - op0), noting that op0/1 are unsigned. */ +t1 = tcg_temp_new_i64(); +tcg_gen_sub_i64(diff, op0, op1); +tcg_gen_sub_i64(t1, op1, op0); +tcg_gen_movcond_i64(TCG_COND_LTU, diff, op0, op1, diff, t1); +tcg_temp_free_i64(t1); +/* If op1 == op0, diff == 0, and the condition is always true. */ +tcg_gen_movcond_i64(TCG_COND_EQ, diff, op0, op1, tmax, diff); +} else { +/* WHILEWR */ +tcg_gen_sub_i64(diff, op1, op0); +/* If op0 >= op1, diff <= 0, the condition is always true. */ +tcg_gen_movcond_i64(TCG_COND_GEU, diff, op0, op1, tmax, diff); +} + +/* Bound to the maximum. 
*/ +tcg_gen_umin_i64(diff, diff, tmax); +tcg_temp_free_i64(tmax); + +/* Since we're bounded, pass as a 32-bit type. */ +t2 = tcg_temp_new_i32(); +tcg_gen_extrl_i64_i32(t2, diff); +tcg_temp_free_i64(diff); + +desc = (vsz / 8) - 2; +desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz); +t3 = tcg_const_i32(desc); + +ptr = tcg_temp_new_ptr(); +tcg_gen_addi_ptr(ptr, cpu_env, pred_full_reg_offset(s, a->rd)); + +gen_helper_sve_whilel(t2, ptr, t2, t3); +do_pred_flags(t2); + +tcg_temp_free_ptr(ptr); +tcg_temp_free_i32(t2); +tcg_temp_free_i32(t3); +return true; +} + /* *** SVE Integer Wide Immediate - Unpredicated Group */ -- 2.25.1
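The bounded-diff computation in the translator implies a simple per-element condition. A scalar reference model (function names are ours, and wraparound of the 64-bit pointers is ignored in this sketch):

```c
#include <stdbool.h>
#include <stdint.h>

/* WHILEWR: all elements active when rn >= rm (unsigned); otherwise an
 * element is active while it lies below the rm - rn byte boundary. */
static bool whilewr_elt(uint64_t rn, uint64_t rm, unsigned esize, unsigned i)
{
    return rn >= rm || (uint64_t)i * esize < rm - rn;
}

/* WHILERW: all elements active when rn == rm; otherwise bounded by the
 * absolute distance between the two pointers. */
static bool whilerw_elt(uint64_t rn, uint64_t rm, unsigned esize, unsigned i)
{
    uint64_t diff = rn < rm ? rm - rn : rn - rm;
    return rn == rm || (uint64_t)i * esize < diff;
}
```

The translator computes the same thing branchlessly: diff in bytes, movcond for the always-true cases, umin against the vector length, then the existing whilel helper materializes the predicate.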
[PATCH v2 048/100] target/arm: Implement SVE2 MATCH, NMATCH
From: Stephen Long Reviewed-by: Richard Henderson Signed-off-by: Stephen Long Message-Id: <20200415145915.2859-1-stepl...@quicinc.com> [rth: Expanded comment for do_match2] Signed-off-by: Richard Henderson --- v2: Apply esz_mask to input pg to fix output flags. --- target/arm/helper-sve.h| 10 ++ target/arm/sve.decode | 5 +++ target/arm/sve_helper.c| 64 ++ target/arm/translate-sve.c | 22 + 4 files changed, 101 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 7e159fd0ef..78172cb281 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -2021,6 +2021,16 @@ DEF_HELPER_FLAGS_3(sve2_uqrshrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_uqrshrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_uqrshrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_match_ppzz_b, TCG_CALL_NO_RWG, + i32, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_match_ppzz_h, TCG_CALL_NO_RWG, + i32, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_nmatch_ppzz_b, TCG_CALL_NO_RWG, + i32, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_nmatch_ppzz_h, TCG_CALL_NO_RWG, + i32, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index a50afd40c2..2207693d28 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1320,6 +1320,11 @@ UQSHRNT 01000101 .. 1 . 00 1101 . . @rd_rn_tszimm_shr UQRSHRNB01000101 .. 1 . 00 1110 . . @rd_rn_tszimm_shr UQRSHRNT01000101 .. 1 . 00 . . @rd_rn_tszimm_shr +### SVE2 Character Match + +MATCH 01000101 .. 1 . 100 ... . 0 @pd_pg_rn_rm +NMATCH 01000101 .. 1 . 100 ... . 1 @pd_pg_rn_rm + ## SVE2 floating-point pairwise operations FADDP 01100100 .. 010 00 0 100 ... . . 
@rdn_pg_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index c690e86d1d..31538e4720 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -6440,3 +6440,67 @@ void HELPER(sve2_nbsl)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) d[i] = ~((n[i] & k[i]) | (m[i] & ~k[i])); } } + +/* + * Returns true if m0 or m1 contains the low uint8_t/uint16_t in n. + * See hasless(v,1) from + * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord + */ +static inline bool do_match2(uint64_t n, uint64_t m0, uint64_t m1, int esz) +{ +int bits = 8 << esz; +uint64_t ones = dup_const(esz, 1); +uint64_t signs = ones << (bits - 1); +uint64_t cmp0, cmp1; + +cmp1 = dup_const(esz, n); +cmp0 = cmp1 ^ m0; +cmp1 = cmp1 ^ m1; +cmp0 = (cmp0 - ones) & ~cmp0; +cmp1 = (cmp1 - ones) & ~cmp1; +return (cmp0 | cmp1) & signs; +} + +static inline uint32_t do_match(void *vd, void *vn, void *vm, void *vg, +uint32_t desc, int esz, bool nmatch) +{ +uint16_t esz_mask = pred_esz_masks[esz]; +intptr_t opr_sz = simd_oprsz(desc); +uint32_t flags = PREDTEST_INIT; +intptr_t i, j, k; + +for (i = 0; i < opr_sz; i += 16) { +uint64_t m0 = *(uint64_t *)(vm + i); +uint64_t m1 = *(uint64_t *)(vm + i + 8); +uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)) & esz_mask; +uint16_t out = 0; + +for (j = 0; j < 16; j += 8) { +uint64_t n = *(uint64_t *)(vn + i + j); + +for (k = 0; k < 8; k += 1 << esz) { +if (pg & (1 << (j + k))) { +bool o = do_match2(n >> (k * 8), m0, m1, esz); +out |= (o ^ nmatch) << (j + k); +} +} +} +*(uint16_t *)(vd + H1_2(i >> 3)) = out; +flags = iter_predtest_fwd(out, pg, flags); +} +return flags; +} + +#define DO_PPZZ_MATCH(NAME, ESZ, INV) \ +uint32_t HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \ +{ \ +return do_match(vd, vn, vm, vg, desc, ESZ, INV); \ +} + +DO_PPZZ_MATCH(sve2_match_ppzz_b, MO_8, false) +DO_PPZZ_MATCH(sve2_match_ppzz_h, MO_16, false) + +DO_PPZZ_MATCH(sve2_nmatch_ppzz_b, MO_8, true) +DO_PPZZ_MATCH(sve2_nmatch_ppzz_h, 
MO_16, true) + +#undef DO_PPZZ_MATCH diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 6fada01d22..15546d9ad2 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -7023,6 +7023,28 @@ static bool trans_UQRSHRNT(DisasContext *s, arg_rri_esz *a) return do_sve2_shr_narrow(s, a, ops); } +static bool
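The do_match2 trick is worth unpacking: XOR turns matching bytes into zero bytes, and the classic haszero word trick from the cited bithacks page detects any zero byte without a loop. A standalone byte-sized (esz = MO_8) version:

```c
#include <stdbool.h>
#include <stdint.h>

/* dup_const(MO_8, x): replicate one byte across a uint64_t. */
static uint64_t dup8(uint8_t x)
{
    return x * 0x0101010101010101ull;
}

/* True if any byte of m0 or m1 equals the low byte of n. XOR maps
 * matching bytes to zero; (v - 0x01..01) & ~v & 0x80..80 is nonzero
 * iff v contains a zero byte. */
static bool do_match2_b(uint64_t n, uint64_t m0, uint64_t m1)
{
    uint64_t ones = dup8(1);
    uint64_t signs = ones << 7;         /* 0x8080...80 */
    uint64_t cmp0, cmp1;

    cmp1 = dup8((uint8_t)n);
    cmp0 = cmp1 ^ m0;
    cmp1 = cmp1 ^ m1;
    cmp0 = (cmp0 - ones) & ~cmp0;
    cmp1 = (cmp1 - ones) & ~cmp1;
    return ((cmp0 | cmp1) & signs) != 0;
}
```

The helper then walks 16 bytes of the first operand at a time, applying this word-at-a-time test per active element and inverting the result for NMATCH.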
[PATCH v2 045/100] target/arm: Implement SVE2 WHILEGT, WHILEGE, WHILEHI, WHILEHS
Rename the existing sve_while (less-than) helper to sve_whilel to make room for a new sve_whileg helper for greater-than. Signed-off-by: Richard Henderson --- v2: Use a new helper function to implement this. --- target/arm/helper-sve.h| 3 +- target/arm/sve.decode | 2 +- target/arm/sve_helper.c| 38 +- target/arm/translate-sve.c | 56 -- 4 files changed, 82 insertions(+), 17 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index acb967506d..4c4a8b27b1 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -913,7 +913,8 @@ DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32) -DEF_HELPER_FLAGS_3(sve_while, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_whilel, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_4(sve_subri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 5f76a95139..b7038f9f57 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -700,7 +700,7 @@ SINCDECP_z 00100101 .. 
1010 d:1 u:1 1 00 .@incdec2_pred CTERM 00100101 1 sf:1 1 rm:5 001000 rn:5 ne:1 # SVE integer compare scalar count and limit -WHILE 00100101 esz:2 1 rm:5 000 sf:1 u:1 1 rn:5 eq:1 rd:4 +WHILE 00100101 esz:2 1 rm:5 000 sf:1 u:1 lt:1 rn:5 eq:1 rd:4 ### SVE Integer Wide Immediate - Unpredicated Group diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index a79039ad52..5d80dc8c58 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -3711,7 +3711,7 @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } -uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc) +uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) { uintptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2; intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2); @@ -3737,6 +3737,42 @@ uint32_t HELPER(sve_while)(void *vd, uint32_t count, uint32_t pred_desc) return predtest_ones(d, oprsz, esz_mask); } +uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) +{ +uintptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2; +intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2); +uint64_t esz_mask = pred_esz_masks[esz]; +ARMPredicateReg *d = vd; +intptr_t i, invcount, oprbits; +uint64_t bits; + +if (count == 0) { +return do_zero(d, oprsz); +} + +oprbits = oprsz * 8; +tcg_debug_assert(count <= oprbits); + +bits = esz_mask; +if (oprbits & 63) { +bits &= MAKE_64BIT_MASK(0, oprbits & 63); +} + +invcount = oprbits - count; +for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { +d->p[i] = bits; +bits = esz_mask; +} + +d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); + +while (--i >= 0) { +d->p[i] = 0; +} + +return predtest_ones(d, oprsz, esz_mask); +} + /* Recursive reduction on a function; * C.f. the ARM ARM function ReducePredicated. 
* diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 262194f163..dc6f39b5bb 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -3121,7 +3121,14 @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) TCGv_ptr ptr; unsigned desc, vsz = vec_full_reg_size(s); TCGCond cond; +uint64_t maxval; +/* Note that GE/HS has a->eq == 0 and GT/HI has a->eq == 1. */ +bool eq = a->eq == a->lt; +/* The greater-than conditions are all SVE2. */ +if (!a->lt && !dc_isar_feature(aa64_sve2, s)) { +return false; +} if (!sve_access_check(s)) { return true; } @@ -3144,22 +3151,42 @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) */ t0 = tcg_temp_new_i64(); t1 = tcg_temp_new_i64(); -tcg_gen_sub_i64(t0, op1, op0); + +if (a->lt) { +tcg_gen_sub_i64(t0, op1, op0); +if (a->u) { +maxval = a->sf ? UINT64_MAX : UINT32_MAX; +cond = eq ? TCG_COND_LEU : TCG_COND_LTU; +} else { +maxval = a->sf ? INT64_MAX : INT32_MAX; +cond = eq ? TCG_COND_LE : TCG_COND_LT; +} +} else { +tcg_gen_sub_i64(t0, op0, op1); +if (a->u) { +maxval = 0; +cond = eq ? TCG_COND_GEU : TCG_COND_GTU; +} else { +maxval = a->sf ? INT64_MIN : INT32_MIN; +cond = eq ? TCG_COND_GE : TCG_COND_GT; +} +} tmax = tcg_const_i64(vsz >> a->esz); -if (a->eq) { +if (eq) { /* Equality means one more iteration. */
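The lt/u/eq condition selection in the translator follows from the architectural definition: WHILE{LT,LE} counts upward from Xn, WHILE{GT,GE} counts downward. A naive scalar reference model for the signed forms, ignoring wraparound (the real code clamps against maxval instead of iterating):

```c
#include <stdbool.h>
#include <stdint.h>

/* Count active elements for signed WHILELT/LE (lt=true) or
 * WHILEGT/GE (lt=false); eq selects the inclusive compare.
 * elts is the vector's element capacity. */
static unsigned while_count(int64_t rn, int64_t rm, unsigned elts,
                            bool lt, bool eq)
{
    unsigned c = 0;
    for (unsigned i = 0; i < elts; i++) {
        int64_t v = lt ? rn + (int64_t)i : rn - (int64_t)i;
        bool cond = lt ? (eq ? v <= rm : v < rm)
                       : (eq ? v >= rm : v > rm);
        if (!cond) {
            break;
        }
        c++;
    }
    return c;
}
```

This also shows why the patch remaps `eq`: the decode bit means different things for the less-than and greater-than encodings, hence `bool eq = a->eq == a->lt` in the translator.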
[PATCH v2 035/100] target/arm: Implement SVE2 integer add/subtract long with carry
Signed-off-by: Richard Henderson --- v2: Fix sel indexing and argument order (laurent desnogues). --- target/arm/helper-sve.h| 3 +++ target/arm/sve.decode | 6 ++ target/arm/sve_helper.c| 34 ++ target/arm/translate-sve.c | 23 +++ 4 files changed, 66 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 7fe2f2c714..cfd90f83eb 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1928,3 +1928,6 @@ DEF_HELPER_FLAGS_5(sve2_uabal_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_uabal_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_adcl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_adcl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 6cf09847a0..f4f0c2ade6 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1247,3 +1247,9 @@ SABALB 01000101 .. 0 . 1100 00 . . @rda_rn_rm SABALT 01000101 .. 0 . 1100 01 . . @rda_rn_rm UABALB 01000101 .. 0 . 1100 10 . . @rda_rn_rm UABALT 01000101 .. 0 . 1100 11 . . @rda_rn_rm + +## SVE2 integer add/subtract long with carry + +# ADC and SBC decoded via size in helper dispatch. +ADCLB 01000101 .. 0 . 11010 0 . . @rda_rn_rm +ADCLT 01000101 .. 0 . 11010 1 . . 
@rda_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 401fc55218..184b946a5b 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1264,6 +1264,40 @@ DO_ZZZW_ACC(sve2_uabal_d, uint64_t, uint32_t, , H1_4, DO_ABD) #undef DO_ZZZW_ACC +void HELPER(sve2_adcl_s)(void *vd, void *vn, void *vm, void *va, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int sel = H4(extract32(desc, SIMD_DATA_SHIFT, 1)); +uint32_t inv = -extract32(desc, SIMD_DATA_SHIFT + 1, 1); +uint32_t *a = va, *n = vn; +uint64_t *d = vd, *m = vm; + +for (i = 0; i < opr_sz / 8; ++i) { +uint32_t e1 = a[2 * i + H4(0)]; +uint32_t e2 = n[2 * i + sel] ^ inv; +uint64_t c = extract64(m[i], 32, 1); +/* Compute and store the entire 33-bit result at once. */ +d[i] = c + e1 + e2; +} +} + +void HELPER(sve2_adcl_d)(void *vd, void *vn, void *vm, void *va, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int sel = extract32(desc, SIMD_DATA_SHIFT, 1); +uint64_t inv = -(uint64_t)extract32(desc, SIMD_DATA_SHIFT + 1, 1); +uint64_t *d = vd, *a = va, *n = vn, *m = vm; + +for (i = 0; i < opr_sz / 8; i += 2) { +Int128 e1 = int128_make64(a[i]); +Int128 e2 = int128_make64(n[i + sel] ^ inv); +Int128 c = int128_make64(m[i + 1] & 1); +Int128 r = int128_add(int128_add(e1, e2), c); +d[i + 0] = int128_getlo(r); +d[i + 1] = int128_gethi(r); +} +} + #define DO_BITPERM(NAME, TYPE, OP) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 5e6ace1da6..9131b6d546 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5937,3 +5937,26 @@ static bool trans_UABALT(DisasContext *s, arg__esz *a) { return do_abal(s, a, true, true); } + +static bool do_adcl(DisasContext *s, arg__esz *a, bool sel) +{ +static gen_helper_gvec_4 * const fns[2] = { +gen_helper_sve2_adcl_s, +gen_helper_sve2_adcl_d, +}; +/* + * Note that in this case the ESZ field encodes both size and sign. 
+ * Split out 'subtract' into bit 1 of the data field for the helper. + */ +return do_sve2__ool(s, a, fns[a->esz & 1], (a->esz & 2) | sel); +} + +static bool trans_ADCLB(DisasContext *s, arg__esz *a) +{ +return do_adcl(s, a, false); +} + +static bool trans_ADCLT(DisasContext *s, arg__esz *a) +{ +return do_adcl(s, a, true); +} -- 2.25.1
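The "entire 33-bit result at once" comment in sve2_adcl_s describes a carry-chain trick: widening the 32-bit addends and the incoming carry to 64 bits makes the carry-out land in bit 32 of the sum, exactly where the next iteration's extract64(m[i], 32, 1) looks for it. In isolation:

```c
#include <stdint.h>

/* One 32-bit ADCLB/ADCLT step: the low 32 bits of the return value are
 * the result element, bit 32 is the carry-out. */
static uint64_t adcl_32(uint32_t e1, uint32_t e2, uint64_t carry_in)
{
    return (carry_in & 1) + (uint64_t)e1 + e2;
}
```

The 64-bit variant has no wider integer type to lean on, which is why that helper switches to Int128 arithmetic.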
[PATCH v2 043/100] target/arm: Implement SVE2 UQSHRN, UQRSHRN
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 16 +++ target/arm/sve.decode | 4 ++ target/arm/sve_helper.c| 24 ++ target/arm/translate-sve.c | 93 ++ 4 files changed, 137 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 371a1b02e0..3fbee352d8 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1988,6 +1988,22 @@ DEF_HELPER_FLAGS_3(sve2_sqrshrunt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqrshrunt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqrshrunt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqshrnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqshrnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqshrnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_uqshrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqshrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqshrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_uqrshrnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqrshrnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqrshrnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_uqrshrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqrshrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqrshrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index cade628cfd..69915398e7 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1296,6 +1296,10 @@ SHRNB 01000101 .. 1 . 00 0100 . . @rd_rn_tszimm_shr SHRNT 01000101 .. 1 . 00 0101 . . @rd_rn_tszimm_shr RSHRNB 01000101 .. 1 . 00 0110 . . 
@rd_rn_tszimm_shr RSHRNT 01000101 .. 1 . 00 0111 . . @rd_rn_tszimm_shr +UQSHRNB 01000101 .. 1 . 00 1100 . . @rd_rn_tszimm_shr +UQSHRNT 01000101 .. 1 . 00 1101 . . @rd_rn_tszimm_shr +UQRSHRNB01000101 .. 1 . 00 1110 . . @rd_rn_tszimm_shr +UQRSHRNT01000101 .. 1 . 00 . . @rd_rn_tszimm_shr ## SVE2 floating-point pairwise operations diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 01c717e27e..bc2130f77e 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1950,6 +1950,30 @@ DO_SHRNT(sve2_sqrshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRUN_H) DO_SHRNT(sve2_sqrshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRUN_S) DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t, , H1_4, DO_SQRSHRUN_D) +#define DO_UQSHRN_H(x, sh) MIN(x >> sh, UINT8_MAX) +#define DO_UQSHRN_S(x, sh) MIN(x >> sh, UINT16_MAX) +#define DO_UQSHRN_D(x, sh) MIN(x >> sh, UINT32_MAX) + +DO_SHRNB(sve2_uqshrnb_h, uint16_t, uint8_t, DO_UQSHRN_H) +DO_SHRNB(sve2_uqshrnb_s, uint32_t, uint16_t, DO_UQSHRN_S) +DO_SHRNB(sve2_uqshrnb_d, uint64_t, uint32_t, DO_UQSHRN_D) + +DO_SHRNT(sve2_uqshrnt_h, uint16_t, uint8_t, H1_2, H1, DO_UQSHRN_H) +DO_SHRNT(sve2_uqshrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_UQSHRN_S) +DO_SHRNT(sve2_uqshrnt_d, uint64_t, uint32_t, , H1_4, DO_UQSHRN_D) + +#define DO_UQRSHRN_H(x, sh) MIN(DO_RSHR(x, sh), UINT8_MAX) +#define DO_UQRSHRN_S(x, sh) MIN(DO_RSHR(x, sh), UINT16_MAX) +#define DO_UQRSHRN_D(x, sh) MIN(DO_RSHR(x, sh), UINT32_MAX) + +DO_SHRNB(sve2_uqrshrnb_h, uint16_t, uint8_t, DO_UQRSHRN_H) +DO_SHRNB(sve2_uqrshrnb_s, uint32_t, uint16_t, DO_UQRSHRN_S) +DO_SHRNB(sve2_uqrshrnb_d, uint64_t, uint32_t, DO_UQRSHRN_D) + +DO_SHRNT(sve2_uqrshrnt_h, uint16_t, uint8_t, H1_2, H1, DO_UQRSHRN_H) +DO_SHRNT(sve2_uqrshrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_UQRSHRN_S) +DO_SHRNT(sve2_uqrshrnt_d, uint64_t, uint32_t, , H1_4, DO_UQRSHRN_D) + #undef DO_SHRNB #undef DO_SHRNT diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 7e89d7b9a8..5234f25eef 100644 --- 
a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -6522,6 +6522,99 @@ static bool trans_SQRSHRUNT(DisasContext *s, arg_rri_esz *a) return do_sve2_shr_narrow(s, a, ops); } +static void gen_uqshrnb_vec(unsigned vece, TCGv_vec d, +TCGv_vec n, int64_t shr) +{ +TCGv_vec t = tcg_temp_new_vec_matching(d); +int halfbits = 4 << vece; + +tcg_gen_shri_vec(vece, n, n, shr); +tcg_gen_dupi_vec(vece, t, MAKE_64BIT_MASK(0, halfbits)); +tcg_gen_umin_vec(vece, d, n, t); +tcg_temp_free_vec(t); +} + +static bool trans_UQSHRNB(DisasContext *s, arg_rri_esz *a) +{ +static const TCGOpcode vec_list[] = { +INDEX_op_shri_vec, INDEX_op_umin_vec, 0 +}; +
[PATCH v2 038/100] target/arm: Implement SVE2 integer absolute difference and accumulate
Signed-off-by: Richard Henderson --- target/arm/sve.decode | 6 ++ target/arm/translate-sve.c | 21 + 2 files changed, 27 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 90a9d6552a..b5450b1d4d 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1266,3 +1266,9 @@ URSRA 01000101 .. 0 . 1110 11 . . @rd_rn_tszimm_shr SRI 01000101 .. 0 . 0 0 . . @rd_rn_tszimm_shr SLI 01000101 .. 0 . 0 1 . . @rd_rn_tszimm_shl + +## SVE2 integer absolute difference and accumulate + +# TODO: Use @rda and %reg_movprfx here. +SABA01000101 .. 0 . 1 0 . . @rd_rn_rm +UABA01000101 .. 0 . 1 1 . . @rd_rn_rm diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 2bc20503e7..45b24826ac 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -6004,3 +6004,24 @@ static bool trans_SLI(DisasContext *s, arg_rri_esz *a) { return do_sve2_fn2i(s, a, gen_gvec_sli); } + +static bool do_sve2_fn_zzz(DisasContext *s, arg_rrr_esz *a, GVecGen3Fn *fn) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (sve_access_check(s)) { +gen_gvec_fn_zzz(s, fn, a->esz, a->rd, a->rn, a->rm); +} +return true; +} + +static bool trans_SABA(DisasContext *s, arg_rrr_esz *a) +{ +return do_sve2_fn_zzz(s, a, gen_gvec_saba); +} + +static bool trans_UABA(DisasContext *s, arg_rrr_esz *a) +{ +return do_sve2_fn_zzz(s, a, gen_gvec_uaba); +} -- 2.25.1
[PATCH v2 042/100] target/arm: Implement SVE2 SQSHRUN, SQRSHRUN
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 16 +++ target/arm/sve.decode | 4 ++ target/arm/sve_helper.c| 24 ++ target/arm/translate-sve.c | 98 ++ 4 files changed, 142 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 3a7d7ff66d..371a1b02e0 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1972,6 +1972,22 @@ DEF_HELPER_FLAGS_3(sve2_rshrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_rshrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_rshrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrunb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrunb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrunb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqshrunt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrunt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrunt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqrshrunb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqrshrunb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqrshrunb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqrshrunt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqrshrunt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqrshrunt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 7cc4b6cc43..cade628cfd 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1288,6 +1288,10 @@ SQXTUNT 01000101 .. 1 . 010 101 . . @rd_rn_tszimm_shl ## SVE2 bitwise shift right narrow # Bit 23 == 0 is handled by esz > 0 in the translator. +SQSHRUNB01000101 .. 1 . 
00 . . @rd_rn_tszimm_shr +SQSHRUNT01000101 .. 1 . 00 0001 . . @rd_rn_tszimm_shr +SQRSHRUNB 01000101 .. 1 . 00 0010 . . @rd_rn_tszimm_shr +SQRSHRUNT 01000101 .. 1 . 00 0011 . . @rd_rn_tszimm_shr SHRNB 01000101 .. 1 . 00 0100 . . @rd_rn_tszimm_shr SHRNT 01000101 .. 1 . 00 0101 . . @rd_rn_tszimm_shr RSHRNB 01000101 .. 1 . 00 0110 . . @rd_rn_tszimm_shr diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 9b3d0d2ddd..01c717e27e 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1926,6 +1926,30 @@ DO_SHRNT(sve2_rshrnt_h, uint16_t, uint8_t, H1_2, H1, DO_RSHR) DO_SHRNT(sve2_rshrnt_s, uint32_t, uint16_t, H1_4, H1_2, DO_RSHR) DO_SHRNT(sve2_rshrnt_d, uint64_t, uint32_t, , H1_4, DO_RSHR) +#define DO_SQSHRUN_H(x, sh) MIN(MAX(x >> sh, 0), UINT8_MAX) +#define DO_SQSHRUN_S(x, sh) MIN(MAX(x >> sh, 0), UINT16_MAX) +#define DO_SQSHRUN_D(x, sh) MIN(MAX(x >> sh, 0), UINT32_MAX) + +DO_SHRNB(sve2_sqshrunb_h, int16_t, uint8_t, DO_SQSHRUN_H) +DO_SHRNB(sve2_sqshrunb_s, int32_t, uint16_t, DO_SQSHRUN_S) +DO_SHRNB(sve2_sqshrunb_d, int64_t, uint32_t, DO_SQSHRUN_D) + +DO_SHRNT(sve2_sqshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRUN_H) +DO_SHRNT(sve2_sqshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRUN_S) +DO_SHRNT(sve2_sqshrunt_d, int64_t, uint32_t, , H1_4, DO_SQSHRUN_D) + +#define DO_SQRSHRUN_H(x, sh) MIN(MAX(DO_RSHR(x, sh), 0), UINT8_MAX) +#define DO_SQRSHRUN_S(x, sh) MIN(MAX(DO_RSHR(x, sh), 0), UINT16_MAX) +#define DO_SQRSHRUN_D(x, sh) MIN(MAX(DO_RSHR(x, sh), 0), UINT32_MAX) + +DO_SHRNB(sve2_sqrshrunb_h, int16_t, uint8_t, DO_SQRSHRUN_H) +DO_SHRNB(sve2_sqrshrunb_s, int32_t, uint16_t, DO_SQRSHRUN_S) +DO_SHRNB(sve2_sqrshrunb_d, int64_t, uint32_t, DO_SQRSHRUN_D) + +DO_SHRNT(sve2_sqrshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRUN_H) +DO_SHRNT(sve2_sqrshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRUN_S) +DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t, , H1_4, DO_SQRSHRUN_D) + #undef DO_SHRNB #undef DO_SHRNT diff --git a/target/arm/translate-sve.c 
b/target/arm/translate-sve.c index 81e44bb818..7e89d7b9a8 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -6424,6 +6424,104 @@ static bool trans_RSHRNT(DisasContext *s, arg_rri_esz *a) return do_sve2_shr_narrow(s, a, ops); } +static void gen_sqshrunb_vec(unsigned vece, TCGv_vec d, + TCGv_vec n, int64_t shr) +{ +TCGv_vec t = tcg_temp_new_vec_matching(d); +int halfbits = 4 << vece; + +tcg_gen_sari_vec(vece, n, n, shr); +tcg_gen_dupi_vec(vece, t, 0); +tcg_gen_smax_vec(vece, n, n, t); +tcg_gen_dupi_vec(vece, t, MAKE_64BIT_MASK(0, halfbits)); +tcg_gen_umin_vec(vece, d, n, t); +tcg_temp_free_vec(t); +} + +static
[PATCH v2 034/100] target/arm: Implement SVE2 integer absolute difference and accumulate long
Signed-off-by: Richard Henderson --- v2: Fix select offsetting and argument order (laurent desnogues). --- target/arm/helper-sve.h| 14 ++ target/arm/sve.decode | 12 + target/arm/sve_helper.c| 23 target/arm/translate-sve.c | 55 ++ 4 files changed, 104 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 4e5ee9a75c..7fe2f2c714 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1914,3 +1914,17 @@ DEF_HELPER_FLAGS_4(sve2_sqcadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqcadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqcadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqcadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sabal_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sabal_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sabal_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_uabal_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uabal_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uabal_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 655cb5c12f..6cf09847a0 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -70,6 +70,7 @@ _s rd pg rn s _s rd pg rn rm s _esz rd pg rn rm esz +_esz rd ra rn rm esz _esz rd pg rn rm ra esz _esz rd pg rn imm esz rd esz pat s @@ -119,6 +120,10 @@ @rdn_i8s esz:2 .. ... imm:s8 rd:5 \ _esz rn=%reg_movprfx +# Four operand, vector element size +@rda_rn_rm esz:2 . rm:5 ... ... rn:5 rd:5 \ +_esz ra=%reg_movprfx + # Three operand with "memory" size, aka immediate left shift @rd_rn_msz_rm ... rm:5 imm:2 rn:5 rd:5 @@ -1235,3 +1240,10 @@ CADD_rot90 01000101 .. 0 0 11011 0 . . @rdn_rm CADD_rot270 01000101 .. 0 0 11011 1 . . @rdn_rm SQCADD_rot9001000101 .. 
0 1 11011 0 . . @rdn_rm SQCADD_rot270 01000101 .. 0 1 11011 1 . . @rdn_rm + +## SVE2 integer absolute difference and accumulate long + +SABALB 01000101 .. 0 . 1100 00 . . @rda_rn_rm +SABALT 01000101 .. 0 . 1100 01 . . @rda_rn_rm +UABALB 01000101 .. 0 . 1100 10 . . @rda_rn_rm +UABALT 01000101 .. 0 . 1100 11 . . @rda_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 2043084c0a..401fc55218 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1241,6 +1241,29 @@ DO_ZZZ_NTB(sve2_eoril_d, uint64_t, , DO_EOR) #undef DO_ZZZ_NTB +#define DO_ZZZW_ACC(NAME, TYPEW, TYPEN, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +intptr_t sel1 = simd_data(desc) * sizeof(TYPEN);\ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEN *)(vn + HN(i + sel1)); \ +TYPEW mm = *(TYPEN *)(vm + HN(i + sel1)); \ +TYPEW aa = *(TYPEW *)(va + HW(i)); \ +*(TYPEW *)(vd + HW(i)) = OP(nn, mm) + aa; \ +} \ +} + +DO_ZZZW_ACC(sve2_sabal_h, int16_t, int8_t, H1_2, H1, DO_ABD) +DO_ZZZW_ACC(sve2_sabal_s, int32_t, int16_t, H1_4, H1_2, DO_ABD) +DO_ZZZW_ACC(sve2_sabal_d, int64_t, int32_t, , H1_4, DO_ABD) + +DO_ZZZW_ACC(sve2_uabal_h, uint16_t, uint8_t, H1_2, H1, DO_ABD) +DO_ZZZW_ACC(sve2_uabal_s, uint32_t, uint16_t, H1_4, H1_2, DO_ABD) +DO_ZZZW_ACC(sve2_uabal_d, uint64_t, uint32_t, , H1_4, DO_ABD) + +#undef DO_ZZZW_ACC + #define DO_BITPERM(NAME, TYPE, OP) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 07948f5e63..5e6ace1da6 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -163,6 +163,18 @@ static void gen_gvec_ool_zzz(DisasContext *s, gen_helper_gvec_3 *fn, vsz, vsz, data, fn); } +/* Invoke an out-of-line helper on 4 Zregs. 
*/ +static void gen_gvec_ool_zzzz(DisasContext *s, gen_helper_gvec_4 *fn, + int rd, int rn, int rm, int ra, int data) +{ +unsigned vsz = vec_full_reg_size(s); +
[PATCH v2 037/100] target/arm: Implement SVE2 bitwise shift and insert
Signed-off-by: Richard Henderson --- target/arm/sve.decode | 5 + target/arm/translate-sve.c | 10 ++ 2 files changed, 15 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 7783e9f0d3..90a9d6552a 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1261,3 +1261,8 @@ SSRA01000101 .. 0 . 1110 00 . . @rd_rn_tszimm_shr USRA01000101 .. 0 . 1110 01 . . @rd_rn_tszimm_shr SRSRA 01000101 .. 0 . 1110 10 . . @rd_rn_tszimm_shr URSRA 01000101 .. 0 . 1110 11 . . @rd_rn_tszimm_shr + +## SVE2 bitwise shift and insert + +SRI 01000101 .. 0 . 0 0 . . @rd_rn_tszimm_shr +SLI 01000101 .. 0 . 0 1 . . @rd_rn_tszimm_shl diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 3dcc67740f..2bc20503e7 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5994,3 +5994,13 @@ static bool trans_URSRA(DisasContext *s, arg_rri_esz *a) { return do_sve2_fn2i(s, a, gen_gvec_ursra); } + +static bool trans_SRI(DisasContext *s, arg_rri_esz *a) +{ +return do_sve2_fn2i(s, a, gen_gvec_sri); +} + +static bool trans_SLI(DisasContext *s, arg_rri_esz *a) +{ +return do_sve2_fn2i(s, a, gen_gvec_sli); +} -- 2.25.1
[PATCH v2 033/100] target/arm: Implement SVE2 complex integer add
Signed-off-by: Richard Henderson --- v2: Fix subtraction ordering (laurent desnogues). --- target/arm/helper-sve.h| 10 + target/arm/sve.decode | 9 target/arm/sve_helper.c| 42 ++ target/arm/translate-sve.c | 31 4 files changed, 92 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 1af6454228..4e5ee9a75c 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1904,3 +1904,13 @@ DEF_HELPER_FLAGS_4(sve2_bgrp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_bgrp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_bgrp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_bgrp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_cadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_cadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_cadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_cadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_sqcadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqcadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqcadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqcadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index b316610bbb..655cb5c12f 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1226,3 +1226,12 @@ EORTB 01000101 .. 0 . 10010 1 . . @rd_rn_rm BEXT01000101 .. 0 . 1011 00 . . @rd_rn_rm BDEP01000101 .. 0 . 1011 01 . . @rd_rn_rm BGRP01000101 .. 0 . 1011 10 . . @rd_rn_rm + + SVE2 Accumulate + +## SVE2 complex integer add + +CADD_rot90 01000101 .. 0 0 11011 0 . . @rdn_rm +CADD_rot270 01000101 .. 0 0 11011 1 . . @rdn_rm +SQCADD_rot9001000101 .. 0 1 11011 0 . . @rdn_rm +SQCADD_rot270 01000101 .. 0 1 11011 1 . . 
@rdn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index bccadce451..2043084c0a 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1314,6 +1314,48 @@ DO_BITPERM(sve2_bgrp_d, uint64_t, bitgroup) #undef DO_BITPERM +#define DO_CADD(NAME, TYPE, H, ADD_OP, SUB_OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +int sub_r = simd_data(desc);\ +if (sub_r) {\ +for (i = 0; i < opr_sz; i += 2 * sizeof(TYPE)) {\ +TYPE acc_r = *(TYPE *)(vn + H(i)); \ +TYPE acc_i = *(TYPE *)(vn + H(i + sizeof(TYPE))); \ +TYPE el2_r = *(TYPE *)(vm + H(i)); \ +TYPE el2_i = *(TYPE *)(vm + H(i + sizeof(TYPE))); \ +acc_r = ADD_OP(acc_r, el2_i); \ +acc_i = SUB_OP(acc_i, el2_r); \ +*(TYPE *)(vd + H(i)) = acc_r; \ +*(TYPE *)(vd + H(i + sizeof(TYPE))) = acc_i;\ +} \ +} else {\ +for (i = 0; i < opr_sz; i += 2 * sizeof(TYPE)) {\ +TYPE acc_r = *(TYPE *)(vn + H(i)); \ +TYPE acc_i = *(TYPE *)(vn + H(i + sizeof(TYPE))); \ +TYPE el2_r = *(TYPE *)(vm + H(i)); \ +TYPE el2_i = *(TYPE *)(vm + H(i + sizeof(TYPE))); \ +acc_r = SUB_OP(acc_r, el2_i); \ +acc_i = ADD_OP(acc_i, el2_r); \ +*(TYPE *)(vd + H(i)) = acc_r; \ +*(TYPE *)(vd + H(i + sizeof(TYPE))) = acc_i;\ +} \ +} \ +} + +DO_CADD(sve2_cadd_b, int8_t, H1, DO_ADD, DO_SUB) +DO_CADD(sve2_cadd_h, int16_t, H1_2, DO_ADD, DO_SUB) +DO_CADD(sve2_cadd_s, int32_t, H1_4, DO_ADD, DO_SUB) +DO_CADD(sve2_cadd_d, int64_t, , DO_ADD, DO_SUB) + +DO_CADD(sve2_sqcadd_b, int8_t, H1, DO_SQADD_B, DO_SQSUB_B) +DO_CADD(sve2_sqcadd_h, int16_t, H1_2, DO_SQADD_H, DO_SQSUB_H) +DO_CADD(sve2_sqcadd_s, int32_t, H1_4, DO_SQADD_S, DO_SQSUB_S) +DO_CADD(sve2_sqcadd_d, int64_t, , do_sqadd_d, do_sqsub_d) + +#undef DO_CADD + #define DO_ZZI_SHLL(NAME, TYPEW, TYPEN, HW, HN) \ void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ {
[PATCH v2 044/100] target/arm: Implement SVE2 SQSHRN, SQRSHRN
This completes the section "SVE2 bitwise shift right narrow". Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 16 ++ target/arm/sve.decode | 4 ++ target/arm/sve_helper.c| 24 + target/arm/translate-sve.c | 105 + 4 files changed, 149 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 3fbee352d8..acb967506d 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1988,6 +1988,22 @@ DEF_HELPER_FLAGS_3(sve2_sqrshrunt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqrshrunt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqrshrunt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqshrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqshrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqrshrnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqrshrnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqrshrnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqrshrnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqrshrnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqrshrnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sve2_uqshrnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_uqshrnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_uqshrnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 69915398e7..5f76a95139 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1296,6 +1296,10 @@ SHRNB 01000101 .. 1 . 00 0100 . . 
@rd_rn_tszimm_shr SHRNT 01000101 .. 1 . 00 0101 . . @rd_rn_tszimm_shr RSHRNB 01000101 .. 1 . 00 0110 . . @rd_rn_tszimm_shr RSHRNT 01000101 .. 1 . 00 0111 . . @rd_rn_tszimm_shr +SQSHRNB 01000101 .. 1 . 00 1000 . . @rd_rn_tszimm_shr +SQSHRNT 01000101 .. 1 . 00 1001 . . @rd_rn_tszimm_shr +SQRSHRNB01000101 .. 1 . 00 1010 . . @rd_rn_tszimm_shr +SQRSHRNT01000101 .. 1 . 00 1011 . . @rd_rn_tszimm_shr UQSHRNB 01000101 .. 1 . 00 1100 . . @rd_rn_tszimm_shr UQSHRNT 01000101 .. 1 . 00 1101 . . @rd_rn_tszimm_shr UQRSHRNB01000101 .. 1 . 00 1110 . . @rd_rn_tszimm_shr diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index bc2130f77e..a79039ad52 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1950,6 +1950,30 @@ DO_SHRNT(sve2_sqrshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRUN_H) DO_SHRNT(sve2_sqrshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRUN_S) DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t, , H1_4, DO_SQRSHRUN_D) +#define DO_SQSHRN_H(x, sh) MIN(MAX(x >> sh, INT8_MIN), INT8_MAX) +#define DO_SQSHRN_S(x, sh) MIN(MAX(x >> sh, INT16_MIN), INT16_MAX) +#define DO_SQSHRN_D(x, sh) MIN(MAX(x >> sh, INT32_MIN), INT32_MAX) + +DO_SHRNB(sve2_sqshrnb_h, int16_t, uint8_t, DO_SQSHRN_H) +DO_SHRNB(sve2_sqshrnb_s, int32_t, uint16_t, DO_SQSHRN_S) +DO_SHRNB(sve2_sqshrnb_d, int64_t, uint32_t, DO_SQSHRN_D) + +DO_SHRNT(sve2_sqshrnt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRN_H) +DO_SHRNT(sve2_sqshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRN_S) +DO_SHRNT(sve2_sqshrnt_d, int64_t, uint32_t, , H1_4, DO_SQSHRN_D) + +#define DO_SQRSHRN_H(x, sh) MIN(MAX(DO_RSHR(x, sh), INT8_MIN), INT8_MAX) +#define DO_SQRSHRN_S(x, sh) MIN(MAX(DO_RSHR(x, sh), INT16_MIN), INT16_MAX) +#define DO_SQRSHRN_D(x, sh) MIN(MAX(DO_RSHR(x, sh), INT32_MIN), INT32_MAX) + +DO_SHRNB(sve2_sqrshrnb_h, int16_t, uint8_t, DO_SQRSHRN_H) +DO_SHRNB(sve2_sqrshrnb_s, int32_t, uint16_t, DO_SQRSHRN_S) +DO_SHRNB(sve2_sqrshrnb_d, int64_t, uint32_t, DO_SQRSHRN_D) + +DO_SHRNT(sve2_sqrshrnt_h, int16_t, uint8_t, H1_2, H1, 
DO_SQRSHRN_H) +DO_SHRNT(sve2_sqrshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRN_S) +DO_SHRNT(sve2_sqrshrnt_d, int64_t, uint32_t, , H1_4, DO_SQRSHRN_D) + #define DO_UQSHRN_H(x, sh) MIN(x >> sh, UINT8_MAX) #define DO_UQSHRN_S(x, sh) MIN(x >> sh, UINT16_MAX) #define DO_UQSHRN_D(x, sh) MIN(x >> sh, UINT32_MAX) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 5234f25eef..262194f163 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -6522,6 +6522,111 @@ static bool trans_SQRSHRUNT(DisasContext *s, arg_rri_esz *a) return do_sve2_shr_narrow(s, a, ops); } +static void gen_sqshrnb_vec(unsigned
[PATCH v2 040/100] target/arm: Implement SVE2 floating-point pairwise
From: Stephen Long Signed-off-by: Stephen Long Reviewed-by: Richard Henderson Signed-off-by: Richard Henderson --- v2: Load all inputs before writing any output (laurent desnogues) --- target/arm/helper-sve.h| 35 + target/arm/sve.decode | 8 +++ target/arm/sve_helper.c| 46 ++ target/arm/translate-sve.c | 25 + 4 files changed, 114 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 3fff364778..cad45b0f16 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1955,3 +1955,38 @@ DEF_HELPER_FLAGS_3(sve2_uqxtnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqxtunt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqxtunt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_sqxtunt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_faddp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve2_fmaxnmp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fmaxnmp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fmaxnmp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve2_fminnmp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fminnmp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fminnmp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sve2_fmaxp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fmaxp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fmaxp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) + 
+DEF_HELPER_FLAGS_6(sve2_fminp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fminp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sve2_fminp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 733f9a3db4..54657d996a 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1284,3 +1284,11 @@ UQXTNB 01000101 .. 1 . 010 010 . . @rd_rn_tszimm_shl UQXTNT 01000101 .. 1 . 010 011 . . @rd_rn_tszimm_shl SQXTUNB 01000101 .. 1 . 010 100 . . @rd_rn_tszimm_shl SQXTUNT 01000101 .. 1 . 010 101 . . @rd_rn_tszimm_shl + +## SVE2 floating-point pairwise operations + +FADDP 01100100 .. 010 00 0 100 ... . . @rdn_pg_rm +FMAXNMP 01100100 .. 010 10 0 100 ... . . @rdn_pg_rm +FMINNMP 01100100 .. 010 10 1 100 ... . . @rdn_pg_rm +FMAXP 01100100 .. 010 11 0 100 ... . . @rdn_pg_rm +FMINP 01100100 .. 010 11 1 100 ... . . @rdn_pg_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 0861fd0277..27ba4e81fb 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -890,6 +890,52 @@ DO_ZPZZ_PAIR_D(sve2_sminp_zpzz_d, int64_t, DO_MIN) #undef DO_ZPZZ_PAIR #undef DO_ZPZZ_PAIR_D +#define DO_ZPZZ_PAIR_FP(NAME, TYPE, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, \ + void *status, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +for (i = 0; i < opr_sz; ) { \ +uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ +do {\ +TYPE n0 = *(TYPE *)(vn + H(i)); \ +TYPE m0 = *(TYPE *)(vm + H(i)); \ +TYPE n1 = *(TYPE *)(vn + H(i + sizeof(TYPE))); \ +TYPE m1 = *(TYPE *)(vm + H(i + sizeof(TYPE))); \ +if (pg & 1) { \ +*(TYPE *)(vd + H(i)) = OP(n0, n1, status); \ +} \ +i += sizeof(TYPE), pg >>= sizeof(TYPE); \ +
Re: [PATCH 0/2] use helper when using abstract QOM parent functions
Hi,

On 10/14/19 5:12 PM, Auger Eric wrote:
> Hi,
>
> On 10/12/19 11:43 AM, Mao Zhongyi wrote:
>> Philippe introduced a series of helpers to make a device's class_init()
>> easier to understand when a device class changes the parent hooks. Some
>> devices in the source tree missed the helper, so convert them.
>>
>> Cc: eric.au...@redhat.com
>> Cc: peter.mayd...@linaro.org
>> Cc: hpous...@reactos.org
>> Cc: f4...@amsat.org
>>
>> Mao Zhongyi (2):
>>   arm/smmuv3: use helpers to make it easier to understand when using
>>     abstract QOM parent functions.
>>   isa/pc87312: use helpers to make it easier to understand when using
>>     abstract QOM parent functions.
>>
>>  hw/arm/smmuv3.c  | 3 +--
>>  hw/isa/pc87312.c | 3 +--
>>  2 files changed, 2 insertions(+), 4 deletions(-)
>
> For the series:
> Reviewed-by: Eric Auger

ping...

Eric
[PATCH v2 026/100] target/arm: Implement SVE2 integer add/subtract wide
Signed-off-by: Richard Henderson --- v2: Fix select offsets (laurent desnogues). --- target/arm/helper-sve.h| 16 target/arm/sve.decode | 12 target/arm/sve_helper.c| 30 ++ target/arm/translate-sve.c | 20 4 files changed, 78 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index d16d85d2d7..e662191767 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1391,6 +1391,22 @@ DEF_HELPER_FLAGS_4(sve2_uabdl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_uabdl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_uabdl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_saddw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_saddw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_saddw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_ssubw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_ssubw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_ssubw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_uaddw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uaddw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uaddw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_usubw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_usubw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_usubw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 91e45f2d32..71babd2fad 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1187,3 +1187,15 @@ UABDLT 01000101 .. 0 . 00 . 
. @rd_rn_rm SADDLBT 01000101 .. 0 . 1000 00 . . @rd_rn_rm SSUBLBT 01000101 .. 0 . 1000 10 . . @rd_rn_rm SSUBLTB 01000101 .. 0 . 1000 11 . . @rd_rn_rm + +## SVE2 integer add/subtract wide + +SADDWB 01000101 .. 0 . 010 000 . . @rd_rn_rm +SADDWT 01000101 .. 0 . 010 001 . . @rd_rn_rm +UADDWB 01000101 .. 0 . 010 010 . . @rd_rn_rm +UADDWT 01000101 .. 0 . 010 011 . . @rd_rn_rm + +SSUBWB 01000101 .. 0 . 010 100 . . @rd_rn_rm +SSUBWT 01000101 .. 0 . 010 101 . . @rd_rn_rm +USUBWB 01000101 .. 0 . 010 110 . . @rd_rn_rm +USUBWT 01000101 .. 0 . 010 111 . . @rd_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 8653e1ed05..87b637179b 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1156,6 +1156,36 @@ DO_ZZZ_TB(sve2_uabdl_d, uint64_t, uint32_t, , H1_4, DO_ABD) #undef DO_ZZZ_TB +#define DO_ZZZ_WTB(NAME, TYPEW, TYPEN, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +int sel2 = extract32(desc, SIMD_DATA_SHIFT, 1) * sizeof(TYPEN); \ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEW *)(vn + HW(i)); \ +TYPEW mm = *(TYPEN *)(vm + HN(i + sel2)); \ +*(TYPEW *)(vd + HW(i)) = OP(nn, mm); \ +} \ +} + +DO_ZZZ_WTB(sve2_saddw_h, int16_t, int8_t, H1_2, H1, DO_ADD) +DO_ZZZ_WTB(sve2_saddw_s, int32_t, int16_t, H1_4, H1_2, DO_ADD) +DO_ZZZ_WTB(sve2_saddw_d, int64_t, int32_t, , H1_4, DO_ADD) + +DO_ZZZ_WTB(sve2_ssubw_h, int16_t, int8_t, H1_2, H1, DO_SUB) +DO_ZZZ_WTB(sve2_ssubw_s, int32_t, int16_t, H1_4, H1_2, DO_SUB) +DO_ZZZ_WTB(sve2_ssubw_d, int64_t, int32_t, , H1_4, DO_SUB) + +DO_ZZZ_WTB(sve2_uaddw_h, uint16_t, uint8_t, H1_2, H1, DO_ADD) +DO_ZZZ_WTB(sve2_uaddw_s, uint32_t, uint16_t, H1_4, H1_2, DO_ADD) +DO_ZZZ_WTB(sve2_uaddw_d, uint64_t, uint32_t, , H1_4, DO_ADD) + +DO_ZZZ_WTB(sve2_usubw_h, uint16_t, uint8_t, H1_2, H1, DO_SUB) +DO_ZZZ_WTB(sve2_usubw_s, uint32_t, uint16_t, H1_4, H1_2, DO_SUB) +DO_ZZZ_WTB(sve2_usubw_d, uint64_t, uint32_t, , H1_4, DO_SUB) + 
+#undef DO_ZZZ_WTB + /* Two-operand reduction expander, controlled by a predicate. * The difference between TYPERED and TYPERET has to do with * sign-extension. E.g. for SMAX, TYPERED must be signed, diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 14508cab07..fed7774c1e 100644 --- a/target/arm/translate-sve.c +++
[PATCH v2 039/100] target/arm: Implement SVE2 saturating extract narrow
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 24 target/arm/sve.decode | 12 ++ target/arm/sve_helper.c| 56 + target/arm/translate-sve.c | 248 - 4 files changed, 335 insertions(+), 5 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index cfd90f83eb..3fff364778 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1931,3 +1931,27 @@ DEF_HELPER_FLAGS_5(sve2_uabal_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_adcl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_adcl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqxtnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqxtnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqxtnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_uqxtnb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqxtnb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqxtnb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqxtunb_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqxtunb_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqxtunb_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqxtnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqxtnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqxtnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_uqxtnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqxtnt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_uqxtnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sqxtunt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqxtunt_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sqxtunt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 
b5450b1d4d..733f9a3db4 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1272,3 +1272,15 @@ SLI 01000101 .. 0 . 0 1 . . @rd_rn_tszimm_shl # TODO: Use @rda and %reg_movprfx here. SABA01000101 .. 0 . 1 0 . . @rd_rn_rm UABA01000101 .. 0 . 1 1 . . @rd_rn_rm + + SVE2 Narrowing + +## SVE2 saturating extract narrow + +# Bits 23, 18-16 are zero, limited in the translator via esz < 3 & imm == 0. +SQXTNB 01000101 .. 1 . 010 000 . . @rd_rn_tszimm_shl +SQXTNT 01000101 .. 1 . 010 001 . . @rd_rn_tszimm_shl +UQXTNB 01000101 .. 1 . 010 010 . . @rd_rn_tszimm_shl +UQXTNT 01000101 .. 1 . 010 011 . . @rd_rn_tszimm_shl +SQXTUNB 01000101 .. 1 . 010 100 . . @rd_rn_tszimm_shl +SQXTUNT 01000101 .. 1 . 010 101 . . @rd_rn_tszimm_shl diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 184b946a5b..0861fd0277 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1264,6 +1264,62 @@ DO_ZZZW_ACC(sve2_uabal_d, uint64_t, uint32_t, , H1_4, DO_ABD) #undef DO_ZZZW_ACC +#define DO_XTNB(NAME, TYPE, OP) \ +void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ +{\ +intptr_t i, opr_sz = simd_oprsz(desc); \ +for (i = 0; i < opr_sz; i += sizeof(TYPE)) { \ +TYPE nn = *(TYPE *)(vn + i); \ +nn = OP(nn) & MAKE_64BIT_MASK(0, sizeof(TYPE) * 4); \ +*(TYPE *)(vd + i) = nn; \ +}\ +} + +#define DO_XTNT(NAME, TYPE, TYPEN, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, uint32_t desc)\ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc), odd = H(sizeof(TYPEN)); \ +for (i = 0; i < opr_sz; i += sizeof(TYPE)) {\ +TYPE nn = *(TYPE *)(vn + i);\ +*(TYPEN *)(vd + i + odd) = OP(nn); \ +} \ +} + +#define DO_SQXTN_H(n) do_sat_bhs(n, INT8_MIN, INT8_MAX) +#define DO_SQXTN_S(n) do_sat_bhs(n, INT16_MIN, INT16_MAX) +#define DO_SQXTN_D(n) do_sat_bhs(n, INT32_MIN, INT32_MAX) + +DO_XTNB(sve2_sqxtnb_h, int16_t, DO_SQXTN_H) +DO_XTNB(sve2_sqxtnb_s, int32_t, DO_SQXTN_S) +DO_XTNB(sve2_sqxtnb_d, int64_t, DO_SQXTN_D) + +DO_XTNT(sve2_sqxtnt_h, int16_t, int8_t, H1, DO_SQXTN_H) +DO_XTNT(sve2_sqxtnt_s, 
int32_t, int16_t, H1_2, DO_SQXTN_S) +DO_XTNT(sve2_sqxtnt_d, int64_t, int32_t, H1_4, DO_SQXTN_D) + +#define DO_UQXTN_H(n) do_sat_bhs(n,
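The DO_XTNB expander above pairs a saturating OP with a mask that clears the top half of each wide element. A minimal scalar sketch of the signed H-sized case (the helper name with the `_elem` suffix is hypothetical; the real helpers loop over whole vectors):

```c
#include <stdint.h>

/* Clamp a value to [min, max], as do_sat_bhs does in sve_helper.c. */
static int64_t sat_range(int64_t val, int64_t min, int64_t max)
{
    return val < min ? min : val > max ? max : val;
}

/* SQXTNB, H element size: saturate each 16-bit element to the int8
 * range and keep only the low byte, zeroing the top byte (the mask
 * built with MAKE_64BIT_MASK(0, sizeof(TYPE) * 4) in DO_XTNB). */
static uint16_t sqxtnb_h_elem(int16_t nn)
{
    return (uint16_t)sat_range(nn, INT8_MIN, INT8_MAX) & 0xff;
}
```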
[PATCH v2 032/100] target/arm: Implement SVE2 bitwise permute
Signed-off-by: Richard Henderson --- target/arm/cpu.h | 5 +++ target/arm/helper-sve.h| 15 target/arm/sve.decode | 6 target/arm/sve_helper.c| 73 ++ target/arm/translate-sve.c | 36 +++ 5 files changed, 135 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index f7574cb757..25ca3aed67 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3872,6 +3872,11 @@ static inline bool isar_feature_aa64_sve2_pmull128(const ARMISARegisters *id) return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, AES) >= 2; } +static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id) +{ +return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0; +} + /* * Feature tests for "does this exist in either 32-bit or 64-bit?" */ diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 6a0d7a3784..1af6454228 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1889,3 +1889,18 @@ DEF_HELPER_FLAGS_4(sve2_eoril_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_eoril_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_eoril_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_eoril_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_bext_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_bext_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_bext_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_bext_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_bdep_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_bdep_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_bdep_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_bdep_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_bgrp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_bgrp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) 
+DEF_HELPER_FLAGS_4(sve2_bgrp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_bgrp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 79d915cf5b..b316610bbb 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1220,3 +1220,9 @@ USHLLT 01000101 .. 0 . 1010 11 . . @rd_rn_tszimm_shl EORBT 01000101 .. 0 . 10010 0 . . @rd_rn_rm EORTB 01000101 .. 0 . 10010 1 . . @rd_rn_rm + +## SVE2 bitwise permute + +BEXT01000101 .. 0 . 1011 00 . . @rd_rn_rm +BDEP01000101 .. 0 . 1011 01 . . @rd_rn_rm +BGRP01000101 .. 0 . 1011 10 . . @rd_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index c7c90cd39f..bccadce451 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1241,6 +1241,79 @@ DO_ZZZ_NTB(sve2_eoril_d, uint64_t, , DO_EOR) #undef DO_ZZZ_NTB +#define DO_BITPERM(NAME, TYPE, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +for (i = 0; i < opr_sz; i += sizeof(TYPE)) { \ +TYPE nn = *(TYPE *)(vn + i); \ +TYPE mm = *(TYPE *)(vm + i); \ +*(TYPE *)(vd + i) = OP(nn, mm, sizeof(TYPE) * 8); \ +} \ +} + +static uint64_t bitextract(uint64_t data, uint64_t mask, int n) +{ +uint64_t res = 0; +int db, rb = 0; + +for (db = 0; db < n; ++db) { +if ((mask >> db) & 1) { +res |= ((data >> db) & 1) << rb; +++rb; +} +} +return res; +} + +DO_BITPERM(sve2_bext_b, uint8_t, bitextract) +DO_BITPERM(sve2_bext_h, uint16_t, bitextract) +DO_BITPERM(sve2_bext_s, uint32_t, bitextract) +DO_BITPERM(sve2_bext_d, uint64_t, bitextract) + +static uint64_t bitdeposit(uint64_t data, uint64_t mask, int n) +{ +uint64_t res = 0; +int rb, db = 0; + +for (rb = 0; rb < n; ++rb) { +if ((mask >> rb) & 1) { +res |= ((data >> db) & 1) << rb; +++db; +} +} +return res; +} + +DO_BITPERM(sve2_bdep_b, uint8_t, bitdeposit) +DO_BITPERM(sve2_bdep_h, uint16_t, bitdeposit) +DO_BITPERM(sve2_bdep_s, uint32_t, bitdeposit) 
+DO_BITPERM(sve2_bdep_d, uint64_t, bitdeposit) + +static uint64_t bitgroup(uint64_t data, uint64_t mask, int n) +{ +uint64_t resm = 0, resu = 0; +int db, rbm = 0, rbu = 0; + +for (db = 0; db < n; ++db) { +uint64_t val = (data >> db) & 1; +if ((mask >> db) & 1) { +resm |= val << rbm++; +} else { +resu |= val << rbu++; +} +
[PATCH v2 025/100] target/arm: Implement SVE2 integer add/subtract interleaved long
Signed-off-by: Richard Henderson --- target/arm/sve.decode | 6 ++ target/arm/translate-sve.c | 4 2 files changed, 10 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 84fc0ade2c..91e45f2d32 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1181,3 +1181,9 @@ SABDLB 01000101 .. 0 . 00 1100 . . @rd_rn_rm SABDLT 01000101 .. 0 . 00 1101 . . @rd_rn_rm UABDLB 01000101 .. 0 . 00 1110 . . @rd_rn_rm UABDLT 01000101 .. 0 . 00 . . @rd_rn_rm + +## SVE2 integer add/subtract interleaved long + +SADDLBT 01000101 .. 0 . 1000 00 . . @rd_rn_rm +SSUBLBT 01000101 .. 0 . 1000 10 . . @rd_rn_rm +SSUBLTB 01000101 .. 0 . 1000 11 . . @rd_rn_rm diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 13b3ef1a2c..14508cab07 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5582,3 +5582,7 @@ DO_SVE2_ZZZ_TB(SABDLT, sabdl, true, true) DO_SVE2_ZZZ_TB(UADDLT, uaddl, true, true) DO_SVE2_ZZZ_TB(USUBLT, usubl, true, true) DO_SVE2_ZZZ_TB(UABDLT, uabdl, true, true) + +DO_SVE2_ZZZ_TB(SADDLBT, saddl, false, true) +DO_SVE2_ZZZ_TB(SSUBLBT, ssubl, false, true) +DO_SVE2_ZZZ_TB(SSUBLTB, ssubl, true, false) -- 2.25.1
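The interleaved forms mix halves: SADDLBT takes the bottom elements of Zn and the top elements of Zm (hence the false/true selector pair passed to DO_SVE2_ZZZ_TB above). A scalar sketch of the H-sized form, with a hypothetical helper name:

```c
#include <stdint.h>
#include <stddef.h>

/* SADDLBT, H element size: widening add of the even (bottom) bytes of
 * n with the odd (top) bytes of m; elems counts 16-bit results. */
static void saddlbt_h(int16_t *d, const int8_t *n, const int8_t *m,
                      size_t elems)
{
    for (size_t i = 0; i < elems; ++i) {
        d[i] = (int16_t)n[2 * i] + m[2 * i + 1];
    }
}
```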
[PATCH v2 029/100] target/arm: Tidy SVE tszimm shift formats
Rather than require the user to fill in the immediate (shl or shr), create full formats that include the immediate. Signed-off-by: Richard Henderson --- target/arm/sve.decode | 35 --- 1 file changed, 16 insertions(+), 19 deletions(-) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 079ba0ec62..417b11fdd5 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -150,13 +150,17 @@ @rd_rn_i6 ... rn:5 . imm:s6 rd:5 # Two register operand, one immediate operand, with predicate, -# element size encoded as TSZHL. User must fill in imm. -@rdn_pg_tszimm .. ... ... ... pg:3 . rd:5 \ -_esz rn=%reg_movprfx esz=%tszimm_esz +# element size encoded as TSZHL. +@rdn_pg_tszimm_shl .. ... ... ... pg:3 . rd:5 \ +_esz rn=%reg_movprfx esz=%tszimm_esz imm=%tszimm_shl +@rdn_pg_tszimm_shr .. ... ... ... pg:3 . rd:5 \ +_esz rn=%reg_movprfx esz=%tszimm_esz imm=%tszimm_shr # Similarly without predicate. -@rd_rn_tszimm .. ... ... .. rn:5 rd:5 \ -_esz esz=%tszimm16_esz +@rd_rn_tszimm_shl .. ... ... .. rn:5 rd:5 \ +_esz esz=%tszimm16_esz imm=%tszimm16_shl +@rd_rn_tszimm_shr .. ... ... .. rn:5 rd:5 \ +_esz esz=%tszimm16_esz imm=%tszimm16_shr # Two register operand, one immediate operand, with 4-bit predicate. # User must fill in imm. @@ -289,14 +293,10 @@ UMINV 0100 .. 001 011 001 ... . . @rd_pg_rn ### SVE Shift by Immediate - Predicated Group # SVE bitwise shift by immediate (predicated) -ASR_zpzi0100 .. 000 000 100 ... .. ... . \ -@rdn_pg_tszimm imm=%tszimm_shr -LSR_zpzi0100 .. 000 001 100 ... .. ... . \ -@rdn_pg_tszimm imm=%tszimm_shr -LSL_zpzi0100 .. 000 011 100 ... .. ... . \ -@rdn_pg_tszimm imm=%tszimm_shl -ASRD0100 .. 000 100 100 ... .. ... . \ -@rdn_pg_tszimm imm=%tszimm_shr +ASR_zpzi0100 .. 000 000 100 ... .. ... . @rdn_pg_tszimm_shr +LSR_zpzi0100 .. 000 001 100 ... .. ... . @rdn_pg_tszimm_shr +LSL_zpzi0100 .. 000 011 100 ... .. ... . @rdn_pg_tszimm_shl +ASRD0100 .. 000 100 100 ... .. ... . 
@rdn_pg_tszimm_shr # SVE bitwise shift by vector (predicated) ASR_zpzz0100 .. 010 000 100 ... . . @rdn_pg_rm @@ -400,12 +400,9 @@ RDVL0100 101 1 01010 imm:s6 rd:5 ### SVE Bitwise Shift - Unpredicated Group # SVE bitwise shift by immediate (unpredicated) -ASR_zzi 0100 .. 1 . 1001 00 . . \ -@rd_rn_tszimm imm=%tszimm16_shr -LSR_zzi 0100 .. 1 . 1001 01 . . \ -@rd_rn_tszimm imm=%tszimm16_shr -LSL_zzi 0100 .. 1 . 1001 11 . . \ -@rd_rn_tszimm imm=%tszimm16_shl +ASR_zzi 0100 .. 1 . 1001 00 . . @rd_rn_tszimm_shr +LSR_zzi 0100 .. 1 . 1001 01 . . @rd_rn_tszimm_shr +LSL_zzi 0100 .. 1 . 1001 11 . . @rd_rn_tszimm_shl # SVE bitwise shift by wide elements (unpredicated) # Note esz != 3 -- 2.25.1
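For reference, the %tszimm_shl and %tszimm_shr fields feed through small translator helpers. The sketch below paraphrases the tszimm_* functions in translate-sve.c from memory, so treat the exact arithmetic as an assumption: the 7-bit tszh:tszl:imm3 value x encodes both the element size (position of the most significant set bit of tsz) and the shift amount:

```c
/* Assumed paraphrase of the tszimm_* helpers in translate-sve.c. */
static int tszimm_esz(int x)
{
    x >>= 3;  /* discard imm3; tsz must be non-zero for a valid encoding */
    return 31 - __builtin_clz((unsigned)x);
}

static int tszimm_shr(int x)  /* right shifts count down from esize */
{
    return (16 << tszimm_esz(x)) - x;
}

static int tszimm_shl(int x)  /* left shifts count up from zero */
{
    return x - (8 << tszimm_esz(x));
}
```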
[PATCH v2 030/100] target/arm: Implement SVE2 bitwise shift left long
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 8 ++ target/arm/sve.decode | 8 ++ target/arm/sve_helper.c| 26 ++ target/arm/translate-sve.c | 159 + 4 files changed, 201 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index e7b539df21..7cd75150e0 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1876,3 +1876,11 @@ DEF_HELPER_FLAGS_4(sve2_umull_zzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_pmull_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_sshll_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sshll_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_sshll_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sve2_ushll_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_ushll_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sve2_ushll_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 417b11fdd5..851b336c7b 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1207,3 +1207,11 @@ SMULLB_zzz 01000101 .. 0 . 011 100 . . @rd_rn_rm SMULLT_zzz 01000101 .. 0 . 011 101 . . @rd_rn_rm UMULLB_zzz 01000101 .. 0 . 011 110 . . @rd_rn_rm UMULLT_zzz 01000101 .. 0 . 011 111 . . @rd_rn_rm + +## SVE2 bitwise shift left long + +# Note bit23 == 0 is handled by esz > 0 in do_sve2_shll_tb. +SSHLLB 01000101 .. 0 . 1010 00 . . @rd_rn_tszimm_shl +SSHLLT 01000101 .. 0 . 1010 01 . . @rd_rn_tszimm_shl +USHLLB 01000101 .. 0 . 1010 10 . . @rd_rn_tszimm_shl +USHLLT 01000101 .. 0 . 1010 11 . . 
@rd_rn_tszimm_shl diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index cb2c425104..670fd4ed15 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -625,6 +625,8 @@ DO_ZPZZ(sve2_sqrshl_zpzz_h, int16_t, H1_2, do_sqrshl_h) DO_ZPZZ(sve2_sqrshl_zpzz_s, int32_t, H1_4, do_sqrshl_s) DO_ZPZZ_D(sve2_sqrshl_zpzz_d, int64_t, do_sqrshl_d) +#undef do_sqrshl_d + #define do_uqrshl_b(n, m) \ ({ uint32_t discard; do_uqrshl_bhs(n, (int8_t)m, 8, true, ); }) #define do_uqrshl_h(n, m) \ @@ -639,6 +641,8 @@ DO_ZPZZ(sve2_uqrshl_zpzz_h, uint16_t, H1_2, do_uqrshl_h) DO_ZPZZ(sve2_uqrshl_zpzz_s, uint32_t, H1_4, do_uqrshl_s) DO_ZPZZ_D(sve2_uqrshl_zpzz_d, uint64_t, do_uqrshl_d) +#undef do_uqrshl_d + #define DO_HADD_BHS(n, m) (((int64_t)n + m) >> 1) #define DO_HADD_D(n, m)((n >> 1) + (m >> 1) + (n & m & 1)) @@ -1217,6 +1221,28 @@ DO_ZZZ_WTB(sve2_usubw_d, uint64_t, uint32_t, , H1_4, DO_SUB) #undef DO_ZZZ_WTB +#define DO_ZZI_SHLL(NAME, TYPEW, TYPEN, HW, HN) \ +void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +intptr_t sel = (simd_data(desc) & 1) * sizeof(TYPEN); \ +int shift = simd_data(desc) >> 1; \ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEN *)(vn + HN(i + sel)); \ +*(TYPEW *)(vd + HW(i)) = nn << shift; \ +} \ +} + +DO_ZZI_SHLL(sve2_sshll_h, int16_t, int8_t, H1_2, H1) +DO_ZZI_SHLL(sve2_sshll_s, int32_t, int16_t, H1_4, H1_2) +DO_ZZI_SHLL(sve2_sshll_d, int64_t, int32_t, , H1_4) + +DO_ZZI_SHLL(sve2_ushll_h, uint16_t, uint8_t, H1_2, H1) +DO_ZZI_SHLL(sve2_ushll_s, uint32_t, uint16_t, H1_4, H1_2) +DO_ZZI_SHLL(sve2_ushll_d, uint64_t, uint32_t, , H1_4) + +#undef DO_ZZI_SHLL + /* Two-operand reduction expander, controlled by a predicate. * The difference between TYPERED and TYPERET has to do with * sign-extension. E.g. 
for SMAX, TYPERED must be signed, diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index db2081130d..ef212b01f1 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5637,3 +5637,162 @@ DO_SVE2_ZZZ_WTB(UADDWB, uaddw, false) DO_SVE2_ZZZ_WTB(UADDWT, uaddw, true) DO_SVE2_ZZZ_WTB(USUBWB, usubw, false) DO_SVE2_ZZZ_WTB(USUBWT, usubw, true) + +static void gen_sshll_vec(unsigned vece, TCGv_vec d, TCGv_vec n, int64_t imm) +{ +int top = imm & 1; +int shl = imm >> 1; +int halfbits = 4 << vece; + +if (top) { +if (shl == halfbits) { +TCGv_vec t = tcg_temp_new_vec_matching(d); +tcg_gen_dupi_vec(vece, t, MAKE_64BIT_MASK(halfbits, halfbits)); +tcg_gen_and_vec(vece, d, n, t); +tcg_temp_free_vec(t); +} else { +tcg_gen_sari_vec(vece, d, n, halfbits); +tcg_gen_shli_vec(vece, d, d,
[PATCH v2 036/100] target/arm: Implement SVE2 bitwise shift right and accumulate
Signed-off-by: Richard Henderson --- target/arm/sve.decode | 8 target/arm/translate-sve.c | 34 ++ 2 files changed, 42 insertions(+) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index f4f0c2ade6..7783e9f0d3 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1253,3 +1253,11 @@ UABALT 01000101 .. 0 . 1100 11 . . @rda_rn_rm # ADC and SBC decoded via size in helper dispatch. ADCLB 01000101 .. 0 . 11010 0 . . @rda_rn_rm ADCLT 01000101 .. 0 . 11010 1 . . @rda_rn_rm + +## SVE2 bitwise shift right and accumulate + +# TODO: Use @rda and %reg_movprfx here. +SSRA01000101 .. 0 . 1110 00 . . @rd_rn_tszimm_shr +USRA01000101 .. 0 . 1110 01 . . @rd_rn_tszimm_shr +SRSRA 01000101 .. 0 . 1110 10 . . @rd_rn_tszimm_shr +URSRA 01000101 .. 0 . 1110 11 . . @rd_rn_tszimm_shr diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 9131b6d546..3dcc67740f 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5960,3 +5960,37 @@ static bool trans_ADCLT(DisasContext *s, arg__esz *a) { return do_adcl(s, a, true); } + +static bool do_sve2_fn2i(DisasContext *s, arg_rri_esz *a, GVecGen2iFn *fn) +{ +if (a->esz < 0 || !dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (sve_access_check(s)) { +unsigned vsz = vec_full_reg_size(s); +unsigned rd_ofs = vec_full_reg_offset(s, a->rd); +unsigned rn_ofs = vec_full_reg_offset(s, a->rn); +fn(a->esz, rd_ofs, rn_ofs, a->imm, vsz, vsz); +} +return true; +} + +static bool trans_SSRA(DisasContext *s, arg_rri_esz *a) +{ +return do_sve2_fn2i(s, a, gen_gvec_ssra); +} + +static bool trans_USRA(DisasContext *s, arg_rri_esz *a) +{ +return do_sve2_fn2i(s, a, gen_gvec_usra); +} + +static bool trans_SRSRA(DisasContext *s, arg_rri_esz *a) +{ +return do_sve2_fn2i(s, a, gen_gvec_srsra); +} + +static bool trans_URSRA(DisasContext *s, arg_rri_esz *a) +{ +return do_sve2_fn2i(s, a, gen_gvec_ursra); +} -- 2.25.1
[PATCH v2 018/100] target/arm: Implement SVE2 integer unary operations (predicated)
Signed-off-by: Richard Henderson --- v2: Fix sqabs, sqneg (laurent desnogues) --- target/arm/helper-sve.h| 13 +++ target/arm/sve.decode | 7 ++ target/arm/sve_helper.c| 29 +++ target/arm/translate-sve.c | 47 ++ 4 files changed, 92 insertions(+), 4 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 83840168b9..97abd39af0 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -502,6 +502,19 @@ DEF_HELPER_FLAGS_4(sve_rbit_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_rbit_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_rbit_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqabs_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_sqneg_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_urecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_ursqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve_splice, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_cmpeq_ppzz_b, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 15762f836b..9788a2b472 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1108,3 +1108,10 @@ PMUL_zzz0100 00 1 . 0110 01 . . @rd_rn_rm_e0 SADALP_zpzz 01000100 .. 000 100 101 ... . . @rdm_pg_rn UADALP_zpzz 01000100 .. 000 101 101 ... . . @rdm_pg_rn + +### SVE2 integer unary operations (predicated) + +URECPE 01000100 .. 000 000 101 ... . . 
@rd_pg_rn +URSQRTE 01000100 .. 000 001 101 ... . . @rd_pg_rn +SQABS 01000100 .. 001 000 101 ... . . @rd_pg_rn +SQNEG 01000100 .. 001 001 101 ... . . @rd_pg_rn diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 7fa2e9f67c..250415f366 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -535,8 +535,8 @@ static inline uint64_t do_sadalp_d(uint64_t n, uint64_t m) return m + n1 + n2; } -DO_ZPZZ(sve2_sadalp_zpzz_h, int16_t, H1_2, do_sadalp_h) -DO_ZPZZ(sve2_sadalp_zpzz_s, int32_t, H1_4, do_sadalp_s) +DO_ZPZZ(sve2_sadalp_zpzz_h, uint16_t, H1_2, do_sadalp_h) +DO_ZPZZ(sve2_sadalp_zpzz_s, uint32_t, H1_4, do_sadalp_s) DO_ZPZZ_D(sve2_sadalp_zpzz_d, uint64_t, do_sadalp_d) static inline uint16_t do_uadalp_h(uint16_t n, uint16_t m) @@ -557,8 +557,8 @@ static inline uint64_t do_uadalp_d(uint64_t n, uint64_t m) return m + n1 + n2; } -DO_ZPZZ(sve2_uadalp_zpzz_h, int16_t, H1_2, do_uadalp_h) -DO_ZPZZ(sve2_uadalp_zpzz_s, int32_t, H1_4, do_uadalp_s) +DO_ZPZZ(sve2_uadalp_zpzz_h, uint16_t, H1_2, do_uadalp_h) +DO_ZPZZ(sve2_uadalp_zpzz_s, uint32_t, H1_4, do_uadalp_s) DO_ZPZZ_D(sve2_uadalp_zpzz_d, uint64_t, do_uadalp_d) #undef DO_ZPZZ @@ -728,6 +728,27 @@ DO_ZPZ(sve_rbit_h, uint16_t, H1_2, revbit16) DO_ZPZ(sve_rbit_s, uint32_t, H1_4, revbit32) DO_ZPZ_D(sve_rbit_d, uint64_t, revbit64) +#define DO_SQABS(X) \ +({ __typeof(X) x_ = (X), min_ = 1ull << (sizeof(X) * 8 - 1); \ + x_ >= 0 ? x_ : x_ == min_ ? -min_ - 1 : -x_; }) + +DO_ZPZ(sve2_sqabs_b, int8_t, H1, DO_SQABS) +DO_ZPZ(sve2_sqabs_h, int16_t, H1_2, DO_SQABS) +DO_ZPZ(sve2_sqabs_s, int32_t, H1_4, DO_SQABS) +DO_ZPZ_D(sve2_sqabs_d, int64_t, DO_SQABS) + +#define DO_SQNEG(X) \ +({ __typeof(X) x_ = (X), min_ = 1ull << (sizeof(X) * 8 - 1); \ + x_ == min_ ? 
-min_ - 1 : -x_; }) + +DO_ZPZ(sve2_sqneg_b, uint8_t, H1, DO_SQNEG) +DO_ZPZ(sve2_sqneg_h, uint16_t, H1_2, DO_SQNEG) +DO_ZPZ(sve2_sqneg_s, uint32_t, H1_4, DO_SQNEG) +DO_ZPZ_D(sve2_sqneg_d, uint64_t, DO_SQNEG) + +DO_ZPZ(sve2_urecpe_s, uint32_t, H1_4, helper_recpe_u32) +DO_ZPZ(sve2_ursqrte_s, uint32_t, H1_4, helper_rsqrte_u32) + /* Three-operand expander, unpredicated, in which the third operand is "wide". */ #define DO_ZZW(NAME, TYPE, TYPEW, H, OP) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 5eac71d849..b2845e2043 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5450,3 +5450,50 @@ static bool trans_UADALP_zpzz(DisasContext *s, arg_rprr_esz *a) } return do_sve2_zpzz_ool(s, a, fns[a->esz - 1]); } + +/* + * SVE2 integer unary operations (predicated) + */ + +static bool do_sve2_zpz_ool(DisasContext *s, arg_rpr_esz *a, +gen_helper_gvec_3 *fn) +{ +if
[PATCH v2 024/100] target/arm: Implement SVE2 integer add/subtract long
Signed-off-by: Richard Henderson --- v2: Fix select offsets (laurent desnogues). --- target/arm/helper-sve.h| 24 target/arm/sve.decode | 19 target/arm/sve_helper.c| 43 +++ target/arm/translate-sve.c | 46 ++ 4 files changed, 132 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index be5b0aec5b..d16d85d2d7 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1367,6 +1367,30 @@ DEF_HELPER_FLAGS_5(sve_ftmad_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_ftmad_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_ftmad_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_saddl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_saddl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_saddl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_ssubl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_ssubl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_ssubl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_sabdl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sabdl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sabdl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_uaddl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uaddl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uaddl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_usubl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_usubl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_usubl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_uabdl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_uabdl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) 
+DEF_HELPER_FLAGS_4(sve2_uabdl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_ld1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 7a287bd8a6..84fc0ade2c 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1162,3 +1162,22 @@ SUQADD 01000100 .. 011 100 100 ... . . @rdn_pg_rm USQADD 01000100 .. 011 101 100 ... . . @rdn_pg_rm SQSUB_zpzz 01000100 .. 011 110 100 ... . . @rdm_pg_rn # SQSUBR UQSUB_zpzz 01000100 .. 011 111 100 ... . . @rdm_pg_rn # UQSUBR + + SVE2 Widening Integer Arithmetic + +## SVE2 integer add/subtract long + +SADDLB 01000101 .. 0 . 00 . . @rd_rn_rm +SADDLT 01000101 .. 0 . 00 0001 . . @rd_rn_rm +UADDLB 01000101 .. 0 . 00 0010 . . @rd_rn_rm +UADDLT 01000101 .. 0 . 00 0011 . . @rd_rn_rm + +SSUBLB 01000101 .. 0 . 00 0100 . . @rd_rn_rm +SSUBLT 01000101 .. 0 . 00 0101 . . @rd_rn_rm +USUBLB 01000101 .. 0 . 00 0110 . . @rd_rn_rm +USUBLT 01000101 .. 0 . 00 0111 . . @rd_rn_rm + +SABDLB 01000101 .. 0 . 00 1100 . . @rd_rn_rm +SABDLT 01000101 .. 0 . 00 1101 . . @rd_rn_rm +UABDLB 01000101 .. 0 . 00 1110 . . @rd_rn_rm +UABDLT 01000101 .. 0 . 00 . . @rd_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index ba80d24b21..8653e1ed05 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1113,6 +1113,49 @@ DO_ZZW(sve_lsl_zzw_s, uint32_t, uint64_t, H1_4, DO_LSL) #undef DO_ZPZ #undef DO_ZPZ_D +/* + * Three-operand expander, unpredicated, in which the two inputs are + * selected from the top or bottom half of the wide column. 
+ */ +#define DO_ZZZ_TB(NAME, TYPEW, TYPEN, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +int sel1 = extract32(desc, SIMD_DATA_SHIFT, 1) * sizeof(TYPEN); \ +int sel2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(TYPEN); \ +for (i = 0; i < opr_sz; i += sizeof(TYPEW)) { \ +TYPEW nn = *(TYPEN *)(vn + HN(i + sel1)); \ +TYPEW mm = *(TYPEN *)(vm + HN(i + sel2)); \ +*(TYPEW *)(vd + HW(i)) = OP(nn, mm);\ +} \
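As a concrete instance of the DO_ZZZ_TB pattern, the absolute-difference-long bottom form reads the even narrow elements of both inputs and widens the result. A scalar sketch with a hypothetical helper name and separate buffers:

```c
#include <stdint.h>
#include <stddef.h>

/* UABDLB, H element size: absolute difference of the even (bottom)
 * bytes of n and m, widened to 16 bits; the top variant (UABDLT)
 * would read the odd bytes instead. */
static void uabdlb_h(uint16_t *d, const uint8_t *n, const uint8_t *m,
                     size_t elems)
{
    for (size_t i = 0; i < elems; ++i) {
        uint8_t a = n[2 * i], b = m[2 * i];
        d[i] = a > b ? a - b : b - a;
    }
}
```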
[PATCH v2 027/100] target/arm: Implement SVE2 integer multiply long
Exclude PMULL from this category for the moment. Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 15 +++ target/arm/sve.decode | 9 + target/arm/sve_helper.c| 31 +++ target/arm/translate-sve.c | 9 + 4 files changed, 64 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index e662191767..cb1d4f2443 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1859,4 +1859,19 @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG, DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmull_zzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmull_zzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_sqdmull_zzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_smull_zzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_smull_zzz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_smull_zzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_umull_zzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_umull_zzz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_umull_zzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve2_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 71babd2fad..32370d7b76 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1199,3 +1199,12 @@ SSUBWB 01000101 .. 0 . 010 100 . . @rd_rn_rm SSUBWT 01000101 .. 0 . 010 101 . . @rd_rn_rm USUBWB 01000101 .. 0 . 010 110 . . @rd_rn_rm USUBWT 01000101 .. 0 . 010 111 . . @rd_rn_rm + +## SVE2 integer multiply long + +SQDMULLB_zzz01000101 .. 0 . 011 000 . . @rd_rn_rm +SQDMULLT_zzz01000101 .. 0 . 011 001 . . @rd_rn_rm +SMULLB_zzz 01000101 .. 0 . 011 100 . . @rd_rn_rm +SMULLT_zzz 01000101 .. 0 . 011 101 . . 
@rd_rn_rm +UMULLB_zzz 01000101 .. 0 . 011 110 . . @rd_rn_rm +UMULLT_zzz 01000101 .. 0 . 011 111 . . @rd_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 87b637179b..cb2c425104 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1154,6 +1154,37 @@ DO_ZZZ_TB(sve2_uabdl_h, uint16_t, uint8_t, H1_2, H1, DO_ABD) DO_ZZZ_TB(sve2_uabdl_s, uint32_t, uint16_t, H1_4, H1_2, DO_ABD) DO_ZZZ_TB(sve2_uabdl_d, uint64_t, uint32_t, , H1_4, DO_ABD) +DO_ZZZ_TB(sve2_smull_zzz_h, int16_t, int8_t, H1_2, H1, DO_MUL) +DO_ZZZ_TB(sve2_smull_zzz_s, int32_t, int16_t, H1_4, H1_2, DO_MUL) +DO_ZZZ_TB(sve2_smull_zzz_d, int64_t, int32_t, , H1_4, DO_MUL) + +DO_ZZZ_TB(sve2_umull_zzz_h, uint16_t, uint8_t, H1_2, H1, DO_MUL) +DO_ZZZ_TB(sve2_umull_zzz_s, uint32_t, uint16_t, H1_4, H1_2, DO_MUL) +DO_ZZZ_TB(sve2_umull_zzz_d, uint64_t, uint32_t, , H1_4, DO_MUL) + +/* Note that the multiply cannot overflow, but the doubling can. */ +static inline int16_t do_sqdmull_h(int16_t n, int16_t m) +{ +int16_t val = n * m; +return DO_SQADD_H(val, val); +} + +static inline int32_t do_sqdmull_s(int32_t n, int32_t m) +{ +int32_t val = n * m; +return DO_SQADD_S(val, val); +} + +static inline int64_t do_sqdmull_d(int64_t n, int64_t m) +{ +int64_t val = n * m; +return do_sqadd_d(val, val); +} + +DO_ZZZ_TB(sve2_sqdmull_zzz_h, int16_t, int8_t, H1_2, H1, do_sqdmull_h) +DO_ZZZ_TB(sve2_sqdmull_zzz_s, int32_t, int16_t, H1_4, H1_2, do_sqdmull_s) +DO_ZZZ_TB(sve2_sqdmull_zzz_d, int64_t, int32_t, , H1_4, do_sqdmull_d) + #undef DO_ZZZ_TB #define DO_ZZZ_WTB(NAME, TYPEW, TYPEN, HW, HN, OP) \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index fed7774c1e..0712a25de7 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5587,6 +5587,15 @@ DO_SVE2_ZZZ_TB(SADDLBT, saddl, false, true) DO_SVE2_ZZZ_TB(SSUBLBT, ssubl, false, true) DO_SVE2_ZZZ_TB(SSUBLTB, ssubl, true, false) +DO_SVE2_ZZZ_TB(SQDMULLB_zzz, sqdmull_zzz, false, false) 
+DO_SVE2_ZZZ_TB(SQDMULLT_zzz, sqdmull_zzz, true, true) + +DO_SVE2_ZZZ_TB(SMULLB_zzz, smull_zzz, false, false) +DO_SVE2_ZZZ_TB(SMULLT_zzz, smull_zzz, true, true) + +DO_SVE2_ZZZ_TB(UMULLB_zzz, umull_zzz, false, false) +DO_SVE2_ZZZ_TB(UMULLT_zzz, umull_zzz, true, true) + #define DO_SVE2_ZZZ_WTB(NAME, name, SEL2) \ static bool trans_##NAME(DisasContext *s, arg_rrr_esz *a) \ { \ -- 2.25.1
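The "multiply cannot overflow, but the doubling can" comment is the key point of do_sqdmull_*. A scalar sketch of the H-sized form that makes the saturation explicit instead of reusing DO_SQADD_H:

```c
#include <stdint.h>

/* SQDMULL, H element size: the 8x8->16 product is exact; only the
 * doubling can overflow, so it is done with saturation. */
static int16_t sqdmull_h(int8_t n, int8_t m)
{
    int16_t val = (int16_t)n * m;    /* cannot overflow int16 */
    int32_t r = (int32_t)val + val;  /* doubling may overflow */
    if (r > INT16_MAX) {
        return INT16_MAX;
    }
    if (r < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)r;
}
```

The single saturating case is (-128) * (-128): the product 16384 is representable, but doubling it to 32768 is not.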
[PATCH v2 031/100] target/arm: Implement SVE2 bitwise exclusive-or interleaved
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 5 + target/arm/sve.decode | 5 + target/arm/sve_helper.c| 20 target/arm/translate-sve.c | 19 +++ 4 files changed, 49 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 7cd75150e0..6a0d7a3784 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1884,3 +1884,8 @@ DEF_HELPER_FLAGS_3(sve2_sshll_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_ushll_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_ushll_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve2_ushll_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2_eoril_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_eoril_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_eoril_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_eoril_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 851b336c7b..79d915cf5b 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1215,3 +1215,8 @@ SSHLLB 01000101 .. 0 . 1010 00 . . @rd_rn_tszimm_shl SSHLLT 01000101 .. 0 . 1010 01 . . @rd_rn_tszimm_shl USHLLB 01000101 .. 0 . 1010 10 . . @rd_rn_tszimm_shl USHLLT 01000101 .. 0 . 1010 11 . . @rd_rn_tszimm_shl + +## SVE2 bitwise exclusive-or interleaved + +EORBT 01000101 .. 0 . 10010 0 . . @rd_rn_rm +EORTB 01000101 .. 0 . 10010 1 . . 
@rd_rn_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 670fd4ed15..c7c90cd39f 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -1221,6 +1221,26 @@ DO_ZZZ_WTB(sve2_usubw_d, uint64_t, uint32_t, , H1_4, DO_SUB) #undef DO_ZZZ_WTB +#define DO_ZZZ_NTB(NAME, TYPE, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +intptr_t sel1 = extract32(desc, SIMD_DATA_SHIFT, 1) * sizeof(TYPE); \ +intptr_t sel2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(TYPE); \ +for (i = 0; i < opr_sz; i += 2 * sizeof(TYPE)) {\ +TYPE nn = *(TYPE *)(vn + H(i + sel1)); \ +TYPE mm = *(TYPE *)(vm + H(i + sel2)); \ +*(TYPE *)(vd + H(i + sel1)) = OP(nn, mm); \ +} \ +} + +DO_ZZZ_NTB(sve2_eoril_b, uint8_t, H1, DO_EOR) +DO_ZZZ_NTB(sve2_eoril_h, uint16_t, H1_2, DO_EOR) +DO_ZZZ_NTB(sve2_eoril_s, uint32_t, H1_4, DO_EOR) +DO_ZZZ_NTB(sve2_eoril_d, uint64_t, , DO_EOR) + +#undef DO_ZZZ_NTB + #define DO_ZZI_SHLL(NAME, TYPEW, TYPEN, HW, HN) \ void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ { \ diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index ef212b01f1..1982d43d81 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5596,6 +5596,25 @@ DO_SVE2_ZZZ_TB(SMULLT_zzz, smull_zzz, true, true) DO_SVE2_ZZZ_TB(UMULLB_zzz, umull_zzz, false, false) DO_SVE2_ZZZ_TB(UMULLT_zzz, umull_zzz, true, true) +static bool do_eor_tb(DisasContext *s, arg_rrr_esz *a, bool sel1) +{ +static gen_helper_gvec_3 * const fns[4] = { +gen_helper_sve2_eoril_b, gen_helper_sve2_eoril_h, +gen_helper_sve2_eoril_s, gen_helper_sve2_eoril_d, +}; +return do_sve2_zzw_ool(s, a, fns[a->esz], (!sel1 << 1) | sel1); +} + +static bool trans_EORBT(DisasContext *s, arg_rrr_esz *a) +{ +return do_eor_tb(s, a, false); +} + +static bool trans_EORTB(DisasContext *s, arg_rrr_esz *a) +{ +return do_eor_tb(s, a, true); +} + static bool do_trans_pmull(DisasContext *s, arg_rrr_esz *a, bool sel) { static 
gen_helper_gvec_3 * const fns[4] = { -- 2.25.1
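Given the sel1/sel2 handling in DO_ZZZ_NTB, EORBT writes only the bottom element of each pair, XORing it with the top element of the second operand. A scalar sketch of the byte-sized case (hypothetical helper name):

```c
#include <stdint.h>
#include <stddef.h>

/* EORBT, byte elements: each even (bottom) byte of d becomes
 * n.even ^ m.odd; the odd bytes of d are left untouched.  EORTB
 * swaps the roles: d.odd = n.odd ^ m.even. */
static void eorbt_b(uint8_t *d, const uint8_t *n, const uint8_t *m,
                    size_t oprsz)
{
    for (size_t i = 0; i < oprsz; i += 2) {
        d[i] = n[i] ^ m[i + 1];
    }
}
```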
[PATCH v2 016/100] target/arm: Implement SVE2 Integer Multiply - Unpredicated
For MUL, we can rely on generic support. For SMULH and UMULH, create some trivial helpers. For PMUL, back in a21bb78e5817, we organized helper_gvec_pmul_b in preparation for this use. Signed-off-by: Richard Henderson --- target/arm/helper.h| 10 target/arm/sve.decode | 10 target/arm/translate-sve.c | 50 target/arm/vec_helper.c| 96 ++ 4 files changed, 166 insertions(+) diff --git a/target/arm/helper.h b/target/arm/helper.h index 2a20c8174c..236fa438c6 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -686,6 +686,16 @@ DEF_HELPER_FLAGS_3(gvec_cgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_cge0_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_cge0_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_smulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_smulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_smulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_smulh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(gvec_umulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_umulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_umulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_umulh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 4f580a25e7..31f67e0955 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1093,3 +1093,13 @@ ST1_zprz1110010 .. 00 . 100 ... . . \ @rprr_scatter_store xs=0 esz=3 scale=0 ST1_zprz1110010 .. 00 . 110 ... . . 
\ @rprr_scatter_store xs=1 esz=3 scale=0 + + SVE2 Support + +### SVE2 Integer Multiply - Unpredicated + +# SVE2 integer multiply vectors (unpredicated) +MUL_zzz 0100 .. 1 . 0110 00 . . @rd_rn_rm +SMULH_zzz 0100 .. 1 . 0110 10 . . @rd_rn_rm +UMULH_zzz 0100 .. 1 . 0110 11 . . @rd_rn_rm +PMUL_zzz0100 00 1 . 0110 01 . . @rd_rn_rm_e0 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 92a4e3f030..850e2fda15 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5361,3 +5361,53 @@ static bool trans_MOVPRFX_z(DisasContext *s, arg_rpr_esz *a) { return do_movz_zpz(s, a->rd, a->rn, a->pg, a->esz, false); } + +/* + * SVE2 Integer Multiply - Unpredicated + */ + +static bool trans_MUL_zzz(DisasContext *s, arg_rrr_esz *a) +{ +if (!dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (sve_access_check(s)) { +gen_gvec_fn_zzz(s, tcg_gen_gvec_mul, a->esz, a->rd, a->rn, a->rm); +} +return true; +} + +static bool do_sve2_zzz_ool(DisasContext *s, arg_rrr_esz *a, +gen_helper_gvec_3 *fn) +{ +if (fn == NULL || !dc_isar_feature(aa64_sve2, s)) { +return false; +} +if (sve_access_check(s)) { +gen_gvec_ool_zzz(s, fn, a->rd, a->rn, a->rm, 0); +} +return true; +} + +static bool trans_SMULH_zzz(DisasContext *s, arg_rrr_esz *a) +{ +static gen_helper_gvec_3 * const fns[4] = { +gen_helper_gvec_smulh_b, gen_helper_gvec_smulh_h, +gen_helper_gvec_smulh_s, gen_helper_gvec_smulh_d, +}; +return do_sve2_zzz_ool(s, a, fns[a->esz]); +} + +static bool trans_UMULH_zzz(DisasContext *s, arg_rrr_esz *a) +{ +static gen_helper_gvec_3 * const fns[4] = { +gen_helper_gvec_umulh_b, gen_helper_gvec_umulh_h, +gen_helper_gvec_umulh_s, gen_helper_gvec_umulh_d, +}; +return do_sve2_zzz_ool(s, a, fns[a->esz]); +} + +static bool trans_PMUL_zzz(DisasContext *s, arg_rrr_esz *a) +{ +return do_sve2_zzz_ool(s, a, gen_helper_gvec_pmul_b); +} diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 7d76412ee0..cd58bfb84f 100644 --- a/target/arm/vec_helper.c +++ 
b/target/arm/vec_helper.c @@ -1452,3 +1452,99 @@ DO_ABA(gvec_uaba_s, uint32_t) DO_ABA(gvec_uaba_d, uint64_t) #undef DO_ABA + +/* + * NxN -> N highpart multiply + * + * TODO: expose this as a generic vector operation. + */ + +void HELPER(gvec_smulh_b)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int8_t *d = vd, *n = vn, *m = vm; + +for (i = 0; i < opr_sz; ++i) { +d[i] = ((int32_t)n[i] * m[i]) >> 8; +} +clear_tail(d, opr_sz, simd_maxsz(desc)); +} + +void HELPER(gvec_smulh_h)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t i, opr_sz = simd_oprsz(desc); +int16_t *d = vd, *n = vn, *m = vm; + +
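The per-element operation behind the new gvec_smulh/gvec_umulh helpers can be sketched in standalone C (a simplified sketch, not the QEMU helpers themselves — the real versions loop over the whole vector operand and clear the tail):

```c
#include <stdint.h>

/* Simplified sketch of the byte-sized high-part multiply:
 * widen both operands, multiply, keep only the upper half of the
 * double-width product.  Mirrors gvec_smulh_b / gvec_umulh_b above. */
static int8_t smulh_b(int8_t n, int8_t m)
{
    return ((int32_t)n * m) >> 8;
}

static uint8_t umulh_b(uint8_t n, uint8_t m)
{
    return ((uint32_t)n * m) >> 8;
}
```

The same widen-multiply-shift shape repeats for the _h and _s element sizes; only the _d helpers need a dedicated 128-bit-product routine.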
[PATCH v2 021/100] target/arm: Implement SVE2 integer halving add/subtract (predicated)
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 54 ++ target/arm/sve.decode | 11 target/arm/sve_helper.c| 39 +++ target/arm/translate-sve.c | 8 ++ 4 files changed, 112 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 53c5c1d3f9..e02c3661c3 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -226,6 +226,60 @@ DEF_HELPER_FLAGS_5(sve2_uqrshl_zpzz_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_uqrshl_zpzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_shadd_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_shadd_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_shadd_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_shadd_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_uhadd_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uhadd_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uhadd_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uhadd_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_srhadd_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_srhadd_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_srhadd_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_srhadd_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_urhadd_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_urhadd_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_urhadd_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_urhadd_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_shsub_zpzz_b, 
TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_shsub_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_shsub_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_shsub_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_uhsub_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uhsub_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uhsub_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uhsub_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_d, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 2bf3eff1cf..2d5104b84f 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1132,3 +1132,14 @@ SQRSHL 01000100 .. 001 010 100 ... . . @rdn_pg_rm UQRSHL 01000100 .. 001 011 100 ... . . @rdn_pg_rm SQRSHL 01000100 .. 001 110 100 ... . . @rdm_pg_rn # SQRSHLR UQRSHL 01000100 .. 001 111 100 ... . . @rdm_pg_rn # UQRSHLR + +### SVE2 integer halving add/subtract (predicated) + +SHADD 01000100 .. 010 000 100 ... . . @rdn_pg_rm +UHADD 01000100 .. 010 001 100 ... . . @rdn_pg_rm +SHSUB 01000100 .. 010 010 100 ... . . @rdn_pg_rm +UHSUB 01000100 .. 010 011 100 ... . . @rdn_pg_rm +SRHADD 01000100 .. 010 100 100 ... . . @rdn_pg_rm +URHADD 01000100 .. 010 101 100 ... . . @rdn_pg_rm +SHSUB 01000100 .. 010 110 100 ... . . @rdm_pg_rn # SHSUBR +UHSUB 01000100 .. 010 111 100 ... . . 
@rdm_pg_rn # UHSUBR diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 341ec23491..b578556a22 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -639,6 +639,45 @@ DO_ZPZZ(sve2_uqrshl_zpzz_h, uint16_t, H1_2, do_uqrshl_h) DO_ZPZZ(sve2_uqrshl_zpzz_s, uint32_t, H1_4, do_uqrshl_s) DO_ZPZZ_D(sve2_uqrshl_zpzz_d, uint64_t, do_uqrshl_d) +#define DO_HADD_BHS(n, m) (((int64_t)n + m) >> 1) +#define DO_HADD_D(n, m)((n >> 1) + (m >> 1) + (n & m & 1)) + +DO_ZPZZ(sve2_shadd_zpzz_b, int8_t, H1_2,
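The two halving-add strategies in the DO_HADD macros above can be sketched standalone (simplified, without the predicated vector loop):

```c
#include <stdint.h>

/* DO_HADD_BHS: for sub-64-bit elements, widen to 64 bits and shift. */
static int32_t hadd_s32(int32_t n, int32_t m)
{
    return ((int64_t)n + m) >> 1;
}

/* DO_HADD_D: for 64-bit elements there is no wider type, so average
 * the shifted halves and re-add the carry produced by the two low
 * bits -- no intermediate value can overflow. */
static uint64_t hadd_u64(uint64_t n, uint64_t m)
{
    return (n >> 1) + (m >> 1) + (n & m & 1);
}
```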
[PATCH v2 028/100] target/arm: Implement PMULLB and PMULLT
Signed-off-by: Richard Henderson --- target/arm/cpu.h | 10 ++ target/arm/helper-sve.h| 1 + target/arm/sve.decode | 2 ++ target/arm/translate-sve.c | 22 ++ target/arm/vec_helper.c| 24 5 files changed, 59 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index e9f56e67c7..f7574cb757 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3862,6 +3862,16 @@ static inline bool isar_feature_aa64_sve2(const ARMISARegisters *id) return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SVEVER) != 0; } +static inline bool isar_feature_aa64_sve2_aes(const ARMISARegisters *id) +{ +return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, AES) != 0; +} + +static inline bool isar_feature_aa64_sve2_pmull128(const ARMISARegisters *id) +{ +return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, AES) >= 2; +} + /* * Feature tests for "does this exist in either 32-bit or 64-bit?" */ diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index cb1d4f2443..e7b539df21 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -1875,3 +1875,4 @@ DEF_HELPER_FLAGS_4(sve2_umull_zzz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_umull_zzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2_pmull_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 32370d7b76..079ba0ec62 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1204,6 +1204,8 @@ USUBWT 01000101 .. 0 . 010 111 . . @rd_rn_rm SQDMULLB_zzz01000101 .. 0 . 011 000 . . @rd_rn_rm SQDMULLT_zzz01000101 .. 0 . 011 001 . . @rd_rn_rm +PMULLB 01000101 .. 0 . 011 010 . . @rd_rn_rm +PMULLT 01000101 .. 0 . 011 011 . . @rd_rn_rm SMULLB_zzz 01000101 .. 0 . 011 100 . . @rd_rn_rm SMULLT_zzz 01000101 .. 0 . 011 101 . . @rd_rn_rm UMULLB_zzz 01000101 .. 0 . 011 110 . . 
@rd_rn_rm diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 0712a25de7..db2081130d 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -5596,6 +5596,28 @@ DO_SVE2_ZZZ_TB(SMULLT_zzz, smull_zzz, true, true) DO_SVE2_ZZZ_TB(UMULLB_zzz, umull_zzz, false, false) DO_SVE2_ZZZ_TB(UMULLT_zzz, umull_zzz, true, true) +static bool do_trans_pmull(DisasContext *s, arg_rrr_esz *a, bool sel) +{ +static gen_helper_gvec_3 * const fns[4] = { +gen_helper_gvec_pmull_q, gen_helper_sve2_pmull_h, +NULL,gen_helper_sve2_pmull_d, +}; +if (a->esz == 0 && !dc_isar_feature(aa64_sve2_pmull128, s)) { +return false; +} +return do_sve2_zzw_ool(s, a, fns[a->esz], sel); +} + +static bool trans_PMULLB(DisasContext *s, arg_rrr_esz *a) +{ +return do_trans_pmull(s, a, false); +} + +static bool trans_PMULLT(DisasContext *s, arg_rrr_esz *a) +{ +return do_trans_pmull(s, a, true); +} + #define DO_SVE2_ZZZ_WTB(NAME, name, SEL2) \ static bool trans_##NAME(DisasContext *s, arg_rrr_esz *a) \ { \ diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index cd58bfb84f..32b1aace3d 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -1378,6 +1378,30 @@ void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc) d[i] = pmull_h(nn, mm); } } + +static uint64_t pmull_d(uint64_t op1, uint64_t op2) +{ +uint64_t result = 0; +int i; + +for (i = 0; i < 32; ++i) { +uint64_t mask = -((op1 >> i) & 1); +result ^= (op2 << i) & mask; +} +return result; +} + +void HELPER(sve2_pmull_d)(void *vd, void *vn, void *vm, uint32_t desc) +{ +intptr_t sel = H4(simd_data(desc)); +intptr_t i, opr_sz = simd_oprsz(desc); +uint32_t *n = vn, *m = vm; +uint64_t *d = vd; + +for (i = 0; i < opr_sz / 8; ++i) { +d[i] = pmull_d(n[2 * i + sel], m[2 * i + sel]); +} +} #endif #define DO_CMP0(NAME, TYPE, OP) \ -- 2.25.1
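The pmull_d routine in the patch above is a 32×32→64 carry-less (polynomial) multiply: for every set bit of one operand, a shifted copy of the other operand is XOR-accumulated, with no carries between bit positions. A standalone sketch:

```c
#include <stdint.h>

/* Carry-less multiply over GF(2), mirroring pmull_d above:
 * the all-ones/all-zeros mask selects whether bit i of op1
 * contributes a shifted copy of op2. */
static uint64_t clmul32(uint32_t op1, uint32_t op2)
{
    uint64_t result = 0;
    for (int i = 0; i < 32; ++i) {
        uint64_t mask = -(uint64_t)((op1 >> i) & 1);
        result ^= ((uint64_t)op2 << i) & mask;
    }
    return result;
}
```

For example, clmul32(3, 3) computes (x + 1)² = x² + 1 over GF(2), i.e. 5, whereas integer 3 * 3 = 9 — the cross terms cancel instead of carrying.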
[PATCH v2 023/100] target/arm: Implement SVE2 saturating add/subtract (predicated)
Signed-off-by: Richard Henderson --- target/arm/helper-sve.h| 54 +++ target/arm/sve.decode | 11 +++ target/arm/sve_helper.c| 194 ++--- target/arm/translate-sve.c | 7 ++ 4 files changed, 210 insertions(+), 56 deletions(-) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index c722bd39ee..be5b0aec5b 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -371,6 +371,60 @@ DEF_HELPER_FLAGS_5(sve2_uminp_zpzz_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_uminp_zpzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqadd_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqadd_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqadd_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqadd_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_uqadd_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqadd_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqadd_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqadd_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqsub_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqsub_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqsub_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqsub_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_uqsub_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqsub_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqsub_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqsub_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + 
+DEF_HELPER_FLAGS_5(sve2_suqadd_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_suqadd_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_suqadd_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_suqadd_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_usqadd_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_usqadd_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_usqadd_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_usqadd_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve_asr_zpzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_asr_zpzw_h, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 6f091897d1..7a287bd8a6 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1151,3 +1151,14 @@ SMAXP 01000100 .. 010 100 101 ... . . @rdn_pg_rm UMAXP 01000100 .. 010 101 101 ... . . @rdn_pg_rm SMINP 01000100 .. 010 110 101 ... . . @rdn_pg_rm UMINP 01000100 .. 010 111 101 ... . . @rdn_pg_rm + +### SVE2 saturating add/subtract (predicated) + +SQADD_zpzz 01000100 .. 011 000 100 ... . . @rdn_pg_rm +UQADD_zpzz 01000100 .. 011 001 100 ... . . @rdn_pg_rm +SQSUB_zpzz 01000100 .. 011 010 100 ... . . @rdn_pg_rm +UQSUB_zpzz 01000100 .. 011 011 100 ... . . @rdn_pg_rm +SUQADD 01000100 .. 011 100 100 ... . . @rdn_pg_rm +USQADD 01000100 .. 011 101 100 ... . . @rdn_pg_rm +SQSUB_zpzz 01000100 .. 011 110 100 ... . . @rdm_pg_rn # SQSUBR +UQSUB_zpzz 01000100 .. 011 111 100 ... . . 
@rdm_pg_rn # UQSUBR diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 5427327c11..ba80d24b21 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -678,6 +678,135 @@ DO_ZPZZ(sve2_uhsub_zpzz_h, uint16_t, H1_2, DO_HSUB_BHS) DO_ZPZZ(sve2_uhsub_zpzz_s, uint32_t, H1_4, DO_HSUB_BHS) DO_ZPZZ_D(sve2_uhsub_zpzz_d, uint64_t, DO_HSUB_D) +static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max) +{ +return val >= max ? max : val <= min ? min : val; +} + +#define DO_SQADD_B(n, m) do_sat_bhs((int64_t)n + m, INT8_MIN,
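All of the B/H/S saturating helpers in this patch share the same clamp: compute in a type wide enough that the exact result fits, then clamp to the element's range. A simplified sketch of the pattern (per-element only, without the predicated vector loop):

```c
#include <stdint.h>

/* do_sat_bhs from the patch above: clamp an exact 64-bit result. */
static int64_t sat_bhs(int64_t val, int64_t min, int64_t max)
{
    return val >= max ? max : val <= min ? min : val;
}

/* Per-element SQADD.B: the sum of two int8_t always fits in int64_t. */
static int8_t sqadd_b(int8_t n, int8_t m)
{
    return sat_bhs((int64_t)n + m, INT8_MIN, INT8_MAX);
}

/* Per-element UQSUB.B: unsigned saturating subtract clamps at zero. */
static uint8_t uqsub_b(uint8_t n, uint8_t m)
{
    return sat_bhs((int64_t)n - m, 0, UINT8_MAX);
}
```

Only the _d helpers need special handling, since there is no wider type to compute the exact result in.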
[PATCH v2 015/100] target/arm: Enable SVE2 and some extensions
Sort to the end of the patch series for final commit.

Signed-off-by: Richard Henderson
---
 target/arm/cpu64.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 778cecc2e6..7389b6e5ab 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -670,6 +670,17 @@ static void aarch64_max_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR2, CNP, 1); /* TTCNP */
     cpu->isar.id_aa64mmfr2 = t;
 
+    t = cpu->isar.id_aa64zfr0;
+    t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
+    t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2);  /* PMULL */
+    t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);
+    t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1);
+    t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1);
+    t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);
+    t = FIELD_DP64(t, ID_AA64ZFR0, F32MM, 1);
+    t = FIELD_DP64(t, ID_AA64ZFR0, F64MM, 1);
+    cpu->isar.id_aa64zfr0 = t;
+
     /* Replicate the same data to the 32-bit id registers. */
     u = cpu->isar.id_isar5;
     u = FIELD_DP32(u, ID_ISAR5, AES, 2); /* AES + PMULL */
-- 
2.25.1

[PATCH v2 020/100] target/arm: Implement SVE2 saturating/rounding bitwise shift left (predicated)
Signed-off-by: Richard Henderson --- v2: Shift values are always signed (laurent desnogues). --- target/arm/helper-sve.h| 54 ++ target/arm/sve.decode | 17 + target/arm/sve_helper.c| 78 ++ target/arm/translate-sve.c | 18 + 4 files changed, 167 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index 97abd39af0..53c5c1d3f9 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -172,6 +172,60 @@ DEF_HELPER_FLAGS_5(sve2_uadalp_zpzz_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_uadalp_zpzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_srshl_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_srshl_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_srshl_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_srshl_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_urshl_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_urshl_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_urshl_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_urshl_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqshl_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqshl_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqshl_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqshl_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_uqshl_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqshl_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqshl_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqshl_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, 
ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sqrshl_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrshl_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrshl_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sqrshl_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_uqrshl_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqrshl_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqrshl_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uqrshl_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_d, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 9788a2b472..2bf3eff1cf 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1115,3 +1115,20 @@ URECPE 01000100 .. 000 000 101 ... . . @rd_pg_rn URSQRTE 01000100 .. 000 001 101 ... . . @rd_pg_rn SQABS 01000100 .. 001 000 101 ... . . @rd_pg_rn SQNEG 01000100 .. 001 001 101 ... . . @rd_pg_rn + +### SVE2 saturating/rounding bitwise shift left (predicated) + +SRSHL 01000100 .. 000 010 100 ... . . @rdn_pg_rm +URSHL 01000100 .. 000 011 100 ... . . @rdn_pg_rm +SRSHL 01000100 .. 000 110 100 ... . . @rdm_pg_rn # SRSHLR +URSHL 01000100 .. 000 111 100 ... . . @rdm_pg_rn # URSHLR + +SQSHL 01000100 .. 001 000 100 ... . . @rdn_pg_rm +UQSHL 01000100 .. 001 001 100 ... . . @rdn_pg_rm +SQSHL 01000100 .. 001 100 100 ... . . @rdm_pg_rn # SQSHLR +UQSHL 01000100 .. 001 101 100 ... . . @rdm_pg_rn # UQSHLR + +SQRSHL 01000100 .. 001 010 100 ... . . @rdn_pg_rm +UQRSHL 01000100 .. 001 011 100 ... . . @rdn_pg_rm +SQRSHL 01000100 .. 001 110 100 ... . . @rdm_pg_rn # SQRSHLR +UQRSHL 01000100 .. 001 111 100 ... . . 
@rdm_pg_rn # UQRSHLR diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index 250415f366..341ec23491 100644 --- a/target/arm/sve_helper.c +++
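The saturation test these SQSHL/UQSHL helpers rely on (in do_sqrshl_bhs, from the shared-header patch in this series) is the shift-back comparison: shift in a wider type, and saturate when the result no longer round-trips through the element width. A simplified positive-shift-only sketch, with an assumed global flag standing in for the "sat" pointer:

```c
#include <stdint.h>

/* Stands in for the *sat out-parameter of do_sqrshl_bhs. */
static uint32_t sat_flag;

/* Saturating left shift of a 32-bit element; assumes 0 <= shift < 32.
 * If the 64-bit result does not survive truncation to 32 bits, the
 * shift overflowed: set the flag and return the clamped extreme. */
static int32_t sqshl_s(int32_t src, int shift)
{
    int64_t val = (int64_t)src << shift;
    if (val != (int32_t)val) {
        sat_flag = 1;
        return src < 0 ? INT32_MIN : INT32_MAX;
    }
    return (int32_t)val;
}
```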
[PATCH v2 022/100] target/arm: Implement SVE2 integer pairwise arithmetic
Signed-off-by: Richard Henderson --- v2: Load all inputs before writing any output (laurent desnogues) --- target/arm/helper-sve.h| 45 ++ target/arm/sve.decode | 8 target/arm/sve_helper.c| 76 ++ target/arm/translate-sve.c | 6 +++ 4 files changed, 135 insertions(+) diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index e02c3661c3..c722bd39ee 100644 --- a/target/arm/helper-sve.h +++ b/target/arm/helper-sve.h @@ -326,6 +326,51 @@ DEF_HELPER_FLAGS_5(sve_sel_zpzz_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve_sel_zpzz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_addp_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_addp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_addp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_addp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_smaxp_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smaxp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smaxp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_smaxp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_umaxp_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umaxp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umaxp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_umaxp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_sminp_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sminp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sminp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_sminp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, 
ptr, i32) + +DEF_HELPER_FLAGS_5(sve2_uminp_zpzz_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uminp_zpzz_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uminp_zpzz_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sve2_uminp_zpzz_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sve_asr_zpzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve_asr_zpzw_h, TCG_CALL_NO_RWG, diff --git a/target/arm/sve.decode b/target/arm/sve.decode index 2d5104b84f..6f091897d1 100644 --- a/target/arm/sve.decode +++ b/target/arm/sve.decode @@ -1143,3 +1143,11 @@ SRHADD 01000100 .. 010 100 100 ... . . @rdn_pg_rm URHADD 01000100 .. 010 101 100 ... . . @rdn_pg_rm SHSUB 01000100 .. 010 110 100 ... . . @rdm_pg_rn # SHSUBR UHSUB 01000100 .. 010 111 100 ... . . @rdm_pg_rn # UHSUBR + +### SVE2 integer pairwise arithmetic + +ADDP01000100 .. 010 001 101 ... . . @rdn_pg_rm +SMAXP 01000100 .. 010 100 101 ... . . @rdn_pg_rm +UMAXP 01000100 .. 010 101 101 ... . . @rdn_pg_rm +SMINP 01000100 .. 010 110 101 ... . . @rdn_pg_rm +UMINP 01000100 .. 010 111 101 ... . . @rdn_pg_rm diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index b578556a22..5427327c11 100644 --- a/target/arm/sve_helper.c +++ b/target/arm/sve_helper.c @@ -681,6 +681,82 @@ DO_ZPZZ_D(sve2_uhsub_zpzz_d, uint64_t, DO_HSUB_D) #undef DO_ZPZZ #undef DO_ZPZZ_D +/* + * Three operand expander, operating on element pairs. + * If the slot I is even, the elements come from VN {I, I+1}. + * If the slot I is odd, the elements come from VM {I-1, I}. + * Load all of the input elements in each pair before overwriting output.
+ */ +#define DO_ZPZZ_PAIR(NAME, TYPE, H, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \ +{ \ +intptr_t i, opr_sz = simd_oprsz(desc); \ +for (i = 0; i < opr_sz; ) { \ +uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \ +do {\ +TYPE n0 = *(TYPE *)(vn + H(i)); \ +TYPE m0 = *(TYPE *)(vm + H(i)); \ +TYPE n1 = *(TYPE *)(vn
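The pairing convention implemented by DO_ZPZZ_PAIR can be sketched without predication (a simplified sketch, not the QEMU macro): even result slots combine a pair of adjacent elements from the first operand, odd slots the matching pair from the second, and both inputs of a pair are loaded before any output is stored, so the result may overlap an input register.

```c
#include <stdint.h>
#include <stddef.h>

/* Unpredicated pairwise add over 32-bit elements (elems must be even). */
static void addp_s(int32_t *d, const int32_t *n, const int32_t *m, size_t elems)
{
    for (size_t i = 0; i < elems; i += 2) {
        /* Load the whole pair before writing either output slot. */
        int32_t n0 = n[i], n1 = n[i + 1];
        int32_t m0 = m[i], m1 = m[i + 1];
        d[i] = n0 + n1;
        d[i + 1] = m0 + m1;
    }
}

/* In-place check: the destination aliases the first source, as Zd == Zn may. */
static int addp_s_selftest(void)
{
    int32_t n[4] = {1, 2, 3, 4};
    int32_t m[4] = {10, 20, 30, 40};
    addp_s(n, n, m, 4);
    return n[0] == 3 && n[1] == 30 && n[2] == 7 && n[3] == 70;
}
```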
[PATCH v2 019/100] target/arm: Split out saturating/rounding shifts from neon
Split these operations out into a header that can be shared between neon and sve. The "sat" pointer acts both as a boolean for control of saturating behavior and controls the difference in behavior between neon and sve -- QC bit or no QC bit. Widen the shift operand in the new helpers, as the SVE2 insns treat the whole input element as significant. For the neon uses, truncate the shift to int8_t while passing the parameter. Implement right-shift rounding as tmp = src >> (shift - 1); dst = (tmp >> 1) + (tmp & 1); This is the same number of instructions as the current tmp = 1 << (shift - 1); dst = (src + tmp) >> shift; without any possibility of intermediate overflow. Signed-off-by: Richard Henderson --- v2: Widen the shift operand (laurent desnouges) --- target/arm/vec_internal.h | 138 +++ target/arm/neon_helper.c | 507 +++--- 2 files changed, 221 insertions(+), 424 deletions(-) diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h index 00a8277765..372fe76523 100644 --- a/target/arm/vec_internal.h +++ b/target/arm/vec_internal.h @@ -30,4 +30,142 @@ static inline void clear_tail(void *vd, uintptr_t opr_sz, uintptr_t max_sz) } } +static inline int32_t do_sqrshl_bhs(int32_t src, int32_t shift, int bits, +bool round, uint32_t *sat) +{ +if (shift <= -bits) { +/* Rounding the sign bit always produces 0. 
*/ +if (round) { +return 0; +} +return src >> 31; +} else if (shift < 0) { +if (round) { +src >>= -shift - 1; +return (src >> 1) + (src & 1); +} +return src >> -shift; +} else if (shift < bits) { +int32_t val = src << shift; +if (bits == 32) { +if (!sat || val >> shift == src) { +return val; +} +} else { +int32_t extval = sextract32(val, 0, bits); +if (!sat || val == extval) { +return extval; +} +} +} else if (!sat || src == 0) { +return 0; +} + +*sat = 1; +return (1u << (bits - 1)) - (src >= 0); +} + +static inline uint32_t do_uqrshl_bhs(uint32_t src, int32_t shift, int bits, + bool round, uint32_t *sat) +{ +if (shift <= -(bits + round)) { +return 0; +} else if (shift < 0) { +if (round) { +src >>= -shift - 1; +return (src >> 1) + (src & 1); +} +return src >> -shift; +} else if (shift < bits) { +uint32_t val = src << shift; +if (bits == 32) { +if (!sat || val >> shift == src) { +return val; +} +} else { +uint32_t extval = extract32(val, 0, bits); +if (!sat || val == extval) { +return extval; +} +} +} else if (!sat || src == 0) { +return 0; +} + +*sat = 1; +return MAKE_64BIT_MASK(0, bits); +} + +static inline int32_t do_suqrshl_bhs(int32_t src, int32_t shift, int bits, + bool round, uint32_t *sat) +{ +if (src < 0) { +*sat = 1; +return 0; +} +return do_uqrshl_bhs(src, shift, bits, round, sat); +} + +static inline int64_t do_sqrshl_d(int64_t src, int64_t shift, + bool round, uint32_t *sat) +{ +if (shift <= -64) { +/* Rounding the sign bit always produces 0. */ +if (round) { +return 0; +} +return src >> 63; +} else if (shift < 0) { +if (round) { +src >>= -shift - 1; +return (src >> 1) + (src & 1); +} +return src >> -shift; +} else if (shift < 64) { +int64_t val = src << shift; +if (!sat || val >> shift == src) { +return val; +} +} else if (!sat || src == 0) { +return 0; +} + +*sat = 1; +return src < 0 ? 
INT64_MIN : INT64_MAX; +} + +static inline uint64_t do_uqrshl_d(uint64_t src, int64_t shift, + bool round, uint32_t *sat) +{ +if (shift <= -(64 + round)) { +return 0; +} else if (shift < 0) { +if (round) { +src >>= -shift - 1; +return (src >> 1) + (src & 1); +} +return src >> -shift; +} else if (shift < 64) { +uint64_t val = src << shift; +if (!sat || val >> shift == src) { +return val; +} +} else if (!sat || src == 0) { +return 0; +} + +*sat = 1; +return UINT64_MAX; +} + +static inline int64_t do_suqrshl_d(int64_t src, int64_t shift, + bool round, uint32_t *sat) +{ +if (src < 0) { +*sat = 1; +return 0; +} +return do_uqrshl_d(src, shift, round, sat); +} + #endif /* TARGET_ARM_VEC_INTERNALS_H */ diff --git
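The rounding identity described in the commit message for this patch can be checked in isolation. The point is that the naive form, (src + (1 << (shift - 1))) >> shift, can overflow in the addition, while the split form never produces an intermediate wider than the input:

```c
#include <stdint.h>

/* Rounding (round-half-up) arithmetic right shift; assumes shift >= 1.
 * tmp = src >> (shift - 1); dst = (tmp >> 1) + (tmp & 1);
 * The final +1 re-applies the rounding bit after the last shift step,
 * with no possibility of intermediate overflow. */
static int32_t rshr_round(int32_t src, int shift)
{
    int32_t tmp = src >> (shift - 1);
    return (tmp >> 1) + (tmp & 1);
}
```

Note that rshr_round(INT32_MAX, 1) is well defined and yields 1 << 30, whereas the naive INT32_MAX + 1 would overflow.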
[PATCH v2 007/100] target/arm: Clean up 4-operand predicate expansion
Move the check for !S into do__flags, which allows to merge in do_vecop4_p. Split out gen_gvec_fn_ppp without sve_access_check, to mirror gen_gvec_fn_zzz. Signed-off-by: Richard Henderson --- target/arm/translate-sve.c | 111 ++--- 1 file changed, 43 insertions(+), 68 deletions(-) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index b649b9d0b5..6d1a69c365 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -179,31 +179,13 @@ static void do_dupi_z(DisasContext *s, int rd, uint64_t word) } /* Invoke a vector expander on three Pregs. */ -static bool do_vector3_p(DisasContext *s, GVecGen3Fn *gvec_fn, - int esz, int rd, int rn, int rm) +static void gen_gvec_fn_ppp(DisasContext *s, GVecGen3Fn *gvec_fn, +int rd, int rn, int rm) { -if (sve_access_check(s)) { -unsigned psz = pred_gvec_reg_size(s); -gvec_fn(esz, pred_full_reg_offset(s, rd), -pred_full_reg_offset(s, rn), -pred_full_reg_offset(s, rm), psz, psz); -} -return true; -} - -/* Invoke a vector operation on four Pregs. */ -static bool do_vecop4_p(DisasContext *s, const GVecGen4 *gvec_op, -int rd, int rn, int rm, int rg) -{ -if (sve_access_check(s)) { -unsigned psz = pred_gvec_reg_size(s); -tcg_gen_gvec_4(pred_full_reg_offset(s, rd), - pred_full_reg_offset(s, rn), - pred_full_reg_offset(s, rm), - pred_full_reg_offset(s, rg), - psz, psz, gvec_op); -} -return true; +unsigned psz = pred_gvec_reg_size(s); +gvec_fn(MO_64, pred_full_reg_offset(s, rd), +pred_full_reg_offset(s, rn), +pred_full_reg_offset(s, rm), psz, psz); } /* Invoke a vector move on two Pregs. */ @@ -1067,6 +1049,11 @@ static bool do__flags(DisasContext *s, arg_rprr_s *a, int mofs = pred_full_reg_offset(s, a->rm); int gofs = pred_full_reg_offset(s, a->pg); +if (!a->s) { +tcg_gen_gvec_4(dofs, nofs, mofs, gofs, psz, psz, gvec_op); +return true; +} + if (psz == 8) { /* Do the operation and the flags generation in temps. 
*/ TCGv_i64 pd = tcg_temp_new_i64(); @@ -1126,19 +1113,24 @@ static bool trans_AND_(DisasContext *s, arg_rprr_s *a) .fno = gen_helper_sve_and_, .prefer_i64 = TCG_TARGET_REG_BITS == 64, }; -if (a->s) { -return do__flags(s, a, ); -} else if (a->rn == a->rm) { -if (a->pg == a->rn) { -return do_mov_p(s, a->rd, a->rn); -} else { -return do_vector3_p(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->pg); + +if (!a->s) { +if (!sve_access_check(s)) { +return true; +} +if (a->rn == a->rm) { +if (a->pg == a->rn) { +do_mov_p(s, a->rd, a->rn); +} else { +gen_gvec_fn_ppp(s, tcg_gen_gvec_and, a->rd, a->rn, a->pg); +} +return true; +} else if (a->pg == a->rn || a->pg == a->rm) { +gen_gvec_fn_ppp(s, tcg_gen_gvec_and, a->rd, a->rn, a->rm); +return true; } -} else if (a->pg == a->rn || a->pg == a->rm) { -return do_vector3_p(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->rm); -} else { -return do_vecop4_p(s, , a->rd, a->rn, a->rm, a->pg); } +return do__flags(s, a, ); } static void gen_bic_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg) @@ -1162,13 +1154,14 @@ static bool trans_BIC_(DisasContext *s, arg_rprr_s *a) .fno = gen_helper_sve_bic_, .prefer_i64 = TCG_TARGET_REG_BITS == 64, }; -if (a->s) { -return do__flags(s, a, ); -} else if (a->pg == a->rn) { -return do_vector3_p(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm); -} else { -return do_vecop4_p(s, , a->rd, a->rn, a->rm, a->pg); + +if (!a->s && a->pg == a->rn) { +if (sve_access_check(s)) { +gen_gvec_fn_ppp(s, tcg_gen_gvec_andc, a->rd, a->rn, a->rm); +} +return true; } +return do__flags(s, a, ); } static void gen_eor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg) @@ -1192,11 +1185,7 @@ static bool trans_EOR_(DisasContext *s, arg_rprr_s *a) .fno = gen_helper_sve_eor_, .prefer_i64 = TCG_TARGET_REG_BITS == 64, }; -if (a->s) { -return do__flags(s, a, ); -} else { -return do_vecop4_p(s, , a->rd, a->rn, a->rm, a->pg); -} +return do__flags(s, a, ); } static void gen_sel_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg) @@ 
-1222,11 +1211,11 @@ static bool trans_SEL_(DisasContext *s, arg_rprr_s *a) .fno = gen_helper_sve_sel_, .prefer_i64 = TCG_TARGET_REG_BITS == 64, }; + if