[Bug target/106060] Inefficient constant broadcast on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
      Known to work|                            |15.0
             Status|ASSIGNED                    |RESOLVED

--- Comment #7 from Roger Sayle ---
This has now been fixed on mainline for GCC 15. There are still improvements
that can be made to vector constant materialization/initialization on x86_64,
but the issues/ideas described in this bugzilla PR are all now implemented.
Thanks.
[Bug target/106060] Inefficient constant broadcast on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

--- Comment #6 from GCC Commits ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:79649a5dcd81bc05c0ba591068c9075de43bd417

commit r15-222-g79649a5dcd81bc05c0ba591068c9075de43bd417
Author: Roger Sayle
Date:   Tue May 7 07:14:40 2024 +0100

    PR target/106060: Improved SSE vector constant materialization on x86.

    This patch resolves PR target/106060 by providing efficient methods for
    materializing/synthesizing special "vector" constants on x86. Currently
    there are three methods of materializing a vector constant; the most
    general is to load a vector from the constant pool, secondly "duplicated"
    constants can be synthesized by moving an integer between units and
    broadcasting (or shuffling) it, and finally the special cases of the
    all-zeros vector and all-ones vectors can be loaded via a single SSE
    instruction. This patch handles additional cases that can be synthesized
    in two instructions, loading an all-ones vector followed by another SSE
    instruction. Following my recent patch for PR target/112992, there's
    conveniently a single place in i386-expand.cc where these special cases
    can be handled.

    Two examples are given in the original bugzilla PR for 106060.

    __m256i should_be_cmpeq_abs ()
    {
      return _mm256_set1_epi8 (1);
    }

    is now generated (with -O3 -march=x86-64-v3) as:

            vpcmpeqd        %ymm0, %ymm0, %ymm0
            vpabsb  %ymm0, %ymm0
            ret

    and

    __m256i should_be_cmpeq_add ()
    {
      return _mm256_set1_epi8 (-2);
    }

    is now generated as:

            vpcmpeqd        %ymm0, %ymm0, %ymm0
            vpaddb  %ymm0, %ymm0, %ymm0
            ret

    2024-05-07  Roger Sayle
                Hongtao Liu

    gcc/ChangeLog
            PR target/106060
            * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
            (struct ix86_vec_bcast_map_simode_t): New type for table below.
            (ix86_vec_bcast_map_simode): Table of SImode constants that may
            be efficiently synthesized by an ix86_vec_bcast_alg method.
            (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
            (ix86_vector_duplicate_simode_const): Efficiently synthesize
            V4SImode and V8SImode constants that duplicate special constants.
            (ix86_vector_duplicate_value): Attempt to synthesize "special"
            vector constants using ix86_vector_duplicate_simode_const.
            * config/i386/i386.cc (ix86_rtx_costs) : ABS of a vector integer
            mode costs with a single SSE instruction.

    gcc/testsuite/ChangeLog
            PR target/106060
            * gcc.target/i386/auto-init-8.c: Update test case.
            * gcc.target/i386/avx512fp16-13.c: Likewise.
            * gcc.target/i386/pr100865-9a.c: Likewise.
            * gcc.target/i386/pr101796-1.c: Likewise.
            * gcc.target/i386/pr106060-1.c: New test case.
            * gcc.target/i386/pr106060-2.c: Likewise.
            * gcc.target/i386/pr106060-3.c: Likewise.
            * gcc.target/i386/pr70314.c: Update test case.
            * gcc.target/i386/vect-shiftv4qi.c: Likewise.
            * gcc.target/i386/vect-shiftv8qi.c: Likewise.
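For readers following along, the idea in the commit message (look up a 32-bit constant in a sorted table with bsearch, then synthesize it from an all-ones value plus one more operation) can be sketched in portable scalar C. This is only an illustration, not the code in i386-expand.cc: every identifier below (synth_alg, synth_table, synth_lane, and so on) is hypothetical, the table holds just the two constants from this PR, and a single 32-bit word stands in for one lane of the vector register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; these are not GCC's identifiers. */
enum synth_alg { SYNTH_ABS_BYTES, SYNTH_ADD_BYTES };

struct synth_entry {
  uint32_t value;      /* 32-bit constant duplicated across the vector */
  enum synth_alg alg;  /* second operation applied to the all-ones value */
};

/* Must stay sorted by value so that bsearch works. */
const struct synth_entry synth_table[] = {
  { 0x01010101u, SYNTH_ABS_BYTES },  /* every byte is 1:  |-1| per byte   */
  { 0xfefefefeu, SYNTH_ADD_BYTES },  /* every byte is -2: -1 + -1 per byte */
};

/* bsearch comparator: key is a uint32_t, element is a table entry. */
int synth_entry_cmp(const void *key, const void *elem)
{
  uint32_t k = *(const uint32_t *)key;
  uint32_t v = ((const struct synth_entry *)elem)->value;
  return (k > v) - (k < v);
}

const struct synth_entry *synth_lookup(uint32_t value)
{
  return bsearch(&value, synth_table,
                 sizeof synth_table / sizeof synth_table[0],
                 sizeof synth_table[0], synth_entry_cmp);
}

/* Scalar model of a per-byte absolute value (the role vpabsb plays). */
uint32_t abs_bytes(uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++) {
    uint8_t b = (uint8_t)(x >> (8 * i));
    if (b & 0x80)
      b = (uint8_t)(0u - b);  /* two's complement negate, wraps at 0x80 */
    r |= (uint32_t)b << (8 * i);
  }
  return r;
}

/* Scalar model of a per-byte wrapping add (the role vpaddb plays). */
uint32_t add_bytes(uint32_t x, uint32_t y)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++) {
    uint8_t s = (uint8_t)((uint8_t)(x >> (8 * i)) + (uint8_t)(y >> (8 * i)));
    r |= (uint32_t)s << (8 * i);
  }
  return r;
}

/* Returns 1 and stores the synthesized lane in *out when value is in the
   table; returns 0 when the caller must fall back to another method. */
int synth_lane(uint32_t value, uint32_t *out)
{
  assert(out != NULL);
  const struct synth_entry *e = synth_lookup(value);
  if (e == NULL)
    return 0;
  uint32_t ones = 0xffffffffu;  /* models one lane after vpcmpeqd x,x,x */
  switch (e->alg) {
  case SYNTH_ABS_BYTES: *out = abs_bytes(ones); break;
  case SYNTH_ADD_BYTES: *out = add_bytes(ones, ones); break;
  }
  return 1;
}
```

Calling synth_lane with 0x01010101 or 0xfefefefe reproduces the requested constant from the all-ones value, while any constant missing from the table returns 0, which corresponds to falling back to a broadcast or a constant-pool load.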
[Bug target/106060] Inefficient constant broadcast on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0

--- Comment #5 from Roger Sayle ---
For the record (so it doesn't get lost) the final patch was posted at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643973.html
and approved (for stage 1) at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643996.html
[Bug target/106060] Inefficient constant broadcast on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
             Status|NEW                         |ASSIGNED

--- Comment #4 from Roger Sayle ---
I have a patch for better materialization of vector constants (including
cmpeq+abs and cmpeq+add), but now that we've transitioned from stage 3 (bug
fixing) to stage 4 (regression fixing), this will have to wait for GCC 15's
stage 1. I'm happy to post the patch here or to gcc-patches, if anyone would
like to pre-review it and/or benchmark the proposed changes.
[Bug target/106060] Inefficient constant broadcast on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
[Bug target/106060] Inefficient constant broadcast on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-17

--- Comment #3 from Andrew Pinski ---
Confirmed, I think HJL's patch definitely improves things. Though I wonder if
they could be improved further.
[Bug target/106060] Inefficient constant broadcast on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

--- Comment #2 from H.J. Lu ---
Created attachment 53196
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53196&action=edit
A patch

This generates:

:
   0:   b8 7b 00 00 00          mov    $0x7b,%eax
   5:   c5 f9 6e c0             vmovd  %eax,%xmm0
   9:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
   e:   c3                      ret
   f:   90                      nop

0010 :
  10:   b8 01 00 00 00          mov    $0x1,%eax
  15:   c5 f9 6e c0             vmovd  %eax,%xmm0
  19:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
  1e:   c3                      ret
  1f:   90                      nop

0020 :
  20:   b8 fe ff ff ff          mov    $0xfffffffe,%eax
  25:   c5 f9 6e c0             vmovd  %eax,%xmm0
  29:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
  2e:   c3                      ret
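As a rough illustration of what each three-instruction sequence above computes, here is a scalar C model of "move an immediate into a GPR, move it to the vector unit, broadcast its low byte to all 32 lanes". The type and function names (ymm_model, broadcast_low_byte, every_lane_is) are made up for this sketch and are not part of the attached patch; the point is only that vpbroadcastb looks at the low byte, so the immediate encoded as `fe ff ff ff` yields 0xfe (that is, -2 as a signed byte) in every lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { YMM_BYTES = 32 };  /* one 256-bit register viewed as 32 byte lanes */

/* Hypothetical model type for a ymm register. */
typedef struct { uint8_t lane[YMM_BYTES]; } ymm_model;

/* Models: mov $imm,%eax ; vmovd %eax,%xmm0 ; vpbroadcastb %xmm0,%ymm0.
   Only the low byte of the immediate reaches the result. */
ymm_model broadcast_low_byte(uint32_t imm)
{
  ymm_model v;
  uint8_t low = (uint8_t)(imm & 0xffu);
  memset(v.lane, low, sizeof v.lane);
  return v;
}

/* Returns 1 when every lane of *v holds the given byte, else 0. */
int every_lane_is(const ymm_model *v, uint8_t byte)
{
  assert(v != NULL);
  for (int i = 0; i < YMM_BYTES; i++)
    if (v->lane[i] != byte)
      return 0;
  return 1;
}
```

With this model, broadcast_low_byte(0x7b), broadcast_low_byte(0x1) and broadcast_low_byte(0xfffffffe) correspond to the three functions in the disassembly and fill the lanes with 0x7b, 0x01 and 0xfe respectively.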
[Bug target/106060] Inefficient constant broadcast on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Hongtao.liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #1 from Hongtao.liu ---
As I recall, this is on purpose: it comes from r12-1958-gedafb35bdadf30,
related to PR100865.