[Bug target/106060] Inefficient constant broadcast on x86_64

2024-05-12 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
      Known to work|                            |15.0
             Status|ASSIGNED                    |RESOLVED

--- Comment #7 from Roger Sayle  ---
This has now been fixed on mainline for GCC 15.  There are still improvements
that can be made to vector constant materialization/initialization on x86_64,
but the issues/ideas described in this bugzilla PR are all now implemented. 
Thanks.

[Bug target/106060] Inefficient constant broadcast on x86_64

2024-05-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

--- Comment #6 from GCC Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:79649a5dcd81bc05c0ba591068c9075de43bd417

commit r15-222-g79649a5dcd81bc05c0ba591068c9075de43bd417
Author: Roger Sayle 
Date:   Tue May 7 07:14:40 2024 +0100

PR target/106060: Improved SSE vector constant materialization on x86.

This patch resolves PR target/106060 by providing efficient methods for
materializing/synthesizing special "vector" constants on x86.  Currently
there are three methods of materializing a vector constant; the most
general is to load a vector from the constant pool, secondly "duplicated"
constants can be synthesized by moving an integer between units and
broadcasting (or shuffling) it, and finally the special cases of the
all-zeros and all-ones vectors can be loaded via a single SSE
instruction.  This patch handles additional cases that can be synthesized
in two instructions, loading an all-ones vector followed by another SSE
instruction.  Following my recent patch for PR target/112992, there's
conveniently a single place in i386-expand.cc where these special cases
can be handled.

Two examples are given in the original bugzilla PR for 106060.

__m256i should_be_cmpeq_abs ()
{
  return _mm256_set1_epi8 (1);
}

is now generated (with -O3 -march=x86-64-v3) as:

        vpcmpeqd  %ymm0, %ymm0, %ymm0
        vpabsb    %ymm0, %ymm0
        ret

and

__m256i should_be_cmpeq_add ()
{
  return _mm256_set1_epi8 (-2);
}

is now generated as:

        vpcmpeqd  %ymm0, %ymm0, %ymm0
        vpaddb    %ymm0, %ymm0, %ymm0
        ret
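
The arithmetic behind the two sequences above can be checked with a small scalar model. This is only a sketch: it emulates the per-byte behaviour of the instructions in portable C rather than using intrinsics, and the function names (model_all_ones, synth_cmpeq_abs, and so on) are made up for illustration and do not appear in GCC. An all-ones byte is -1, so a per-byte absolute value gives 1 and adding the register to itself gives -2.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 32 };  /* a 256-bit register holds 32 byte lanes */

/* Model of vpcmpeqd reg,reg,reg: a register always equals itself,
   so every bit of the result is set (each byte is 0xff, i.e. -1). */
static void model_all_ones(uint8_t v[LANES])
{
    for (int i = 0; i < LANES; i++)
        v[i] = 0xff;
}

/* Model of vpabsb: per-byte absolute value, lanes treated as signed. */
static void model_pabsb(uint8_t v[LANES])
{
    for (int i = 0; i < LANES; i++)
        if (v[i] & 0x80)                  /* negative lane */
            v[i] = (uint8_t)(0u - v[i]);  /* two's-complement negate */
}

/* Model of vpaddb reg,reg,reg: per-byte wrapping add of a lane to itself. */
static void model_paddb_self(uint8_t v[LANES])
{
    for (int i = 0; i < LANES; i++)
        v[i] = (uint8_t)(v[i] + v[i]);
}

/* Hypothetical wrappers mirroring the two-instruction sequences. */
void synth_cmpeq_abs(uint8_t v[LANES])
{
    model_all_ones(v);
    model_pabsb(v);
}

void synth_cmpeq_add(uint8_t v[LANES])
{
    model_all_ones(v);
    model_paddb_self(v);
}

/* Returns 1 if every lane holds EXPECTED, 0 otherwise. */
int lanes_all_equal(const uint8_t v[LANES], uint8_t expected)
{
    for (int i = 0; i < LANES; i++)
        if (v[i] != expected)
            return 0;
    return 1;
}

/* Checks both examples: _mm256_set1_epi8 (1) and _mm256_set1_epi8 (-2). */
void check_examples(void)
{
    uint8_t v[LANES];
    synth_cmpeq_abs(v);
    assert(lanes_all_equal(v, 1));
    synth_cmpeq_add(v);
    assert(lanes_all_equal(v, (uint8_t)-2));
}
```

Calling check_examples() from a test driver exercises both cases; the model says nothing about instruction latency or encoding size.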

2024-05-07  Roger Sayle  
Hongtao Liu  

gcc/ChangeLog
PR target/106060
* config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
(struct ix86_vec_bcast_map_simode_t): New type for table below.
(ix86_vec_bcast_map_simode): Table of SImode constants that may
be efficiently synthesized by an ix86_vec_bcast_alg method.
(ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
(ix86_vector_duplicate_simode_const): Efficiently synthesize
V4SImode and V8SImode constants that duplicate special constants.
(ix86_vector_duplicate_value): Attempt to synthesize "special"
vector constants using ix86_vector_duplicate_simode_const.
* config/i386/i386.cc (ix86_rtx_costs): ABS of a
vector integer mode costs with a single SSE instruction.
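
The ChangeLog entries above describe a table of SImode constants searched with bsearch. The general shape of such a lookup might look like the sketch below. It is not the code from i386-expand.cc: the enumerator names, the struct layout and the function lookup_bcast_method are all hypothetical, and the table holds only the four constants that this thread itself mentions (all-zeros, all-ones, and the 0x01 and 0xfe byte patterns from the two examples).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical method names; these are not GCC's actual enumerators. */
enum bcast_method {
    BCAST_ZERO,       /* all-zeros vector, one instruction */
    BCAST_ALL_ONES,   /* all-ones vector, one compare-equal */
    BCAST_ONES_ABSB,  /* all-ones, then per-byte abs: bytes of 0x01 */
    BCAST_ONES_ADDB,  /* all-ones, then per-byte self-add: bytes of 0xfe */
    BCAST_NONE        /* not in the table: use another strategy */
};

struct bcast_entry {
    uint32_t value;            /* 32-bit constant to be duplicated */
    enum bcast_method method;  /* how to synthesize it */
};

/* Must stay sorted by VALUE (unsigned) for bsearch to work. */
static const struct bcast_entry bcast_table[] = {
    { 0x00000000u, BCAST_ZERO },
    { 0x01010101u, BCAST_ONES_ABSB },
    { 0xfefefefeu, BCAST_ONES_ADDB },
    { 0xffffffffu, BCAST_ALL_ONES },
};

#define BCAST_TABLE_SIZE (sizeof bcast_table / sizeof bcast_table[0])

/* bsearch comparator: KEY points to a uint32_t, ELEM to a table entry. */
static int bcast_entry_cmp(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t v = ((const struct bcast_entry *)elem)->value;
    return (k > v) - (k < v);
}

/* Returns the synthesis method for VALUE, or BCAST_NONE if unknown. */
enum bcast_method lookup_bcast_method(uint32_t value)
{
    const struct bcast_entry *e =
        bsearch(&value, bcast_table, BCAST_TABLE_SIZE,
                sizeof bcast_table[0], bcast_entry_cmp);
    return e ? e->method : BCAST_NONE;
}

/* Sanity check that the table really is strictly ascending. */
void check_table_sorted(void)
{
    for (size_t i = 1; i < BCAST_TABLE_SIZE; i++)
        assert(bcast_table[i - 1].value < bcast_table[i].value);
}
```

A sorted table keeps the lookup logarithmic and makes adding a new special constant a one-line change, at the price of having to keep the entries in order.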

gcc/testsuite/ChangeLog
PR target/106060
* gcc.target/i386/auto-init-8.c: Update test case.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr101796-1.c: Likewise.
* gcc.target/i386/pr106060-1.c: New test case.
* gcc.target/i386/pr106060-2.c: Likewise.
* gcc.target/i386/pr106060-3.c: Likewise.
* gcc.target/i386/pr70314.c: Update test case.
* gcc.target/i386/vect-shiftv4qi.c: Likewise.
* gcc.target/i386/vect-shiftv8qi.c: Likewise.

[Bug target/106060] Inefficient constant broadcast on x86_64

2024-02-16 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0

--- Comment #5 from Roger Sayle  ---
For the record (so it doesn't get lost) the final patch was posted at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643973.html
and approved (for stage 1) at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643996.html

[Bug target/106060] Inefficient constant broadcast on x86_64

2024-01-14 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |roger at nextmovesoftware dot com
             Status|NEW                         |ASSIGNED

--- Comment #4 from Roger Sayle  ---
I have a patch for better materialization of vector constants (including
cmpeq+abs and cmpeq+add), but now that we've transitioned from stage 3 (bug
fixing) to stage 4 (regression fixing), this will have to wait for GCC 15's
stage 1.  I'm happy to post the patch here or to gcc-patches, if anyone would
like to pre-review it and/or benchmark the proposed changes.

[Bug target/106060] Inefficient constant broadcast on x86_64

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

[Bug target/106060] Inefficient constant broadcast on x86_64

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-17

--- Comment #3 from Andrew Pinski  ---
Confirmed, I think HJL's patch definitely improves things. Though I wonder if
they could be improved further.

[Bug target/106060] Inefficient constant broadcast on x86_64

2022-06-23 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

--- Comment #2 from H.J. Lu  ---
Created attachment 53196
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53196&action=edit
A patch

This generates:

 :
   0:   b8 7b 00 00 00          mov    $0x7b,%eax
   5:   c5 f9 6e c0             vmovd  %eax,%xmm0
   9:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
   e:   c3                      ret
   f:   90                      nop

0010 :
  10:   b8 01 00 00 00          mov    $0x1,%eax
  15:   c5 f9 6e c0             vmovd  %eax,%xmm0
  19:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
  1e:   c3                      ret
  1f:   90                      nop

0020 :
  20:   b8 fe ff ff ff          mov    $0xfffffffe,%eax
  25:   c5 f9 6e c0             vmovd  %eax,%xmm0
  29:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
  2e:   c3                      ret
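
For comparison, the three-instruction sequence in this listing (an immediate moved into %eax, copied to %xmm0, then broadcast from the low byte) can be modelled in scalar C. This is a rough sketch with a made-up function name, model_broadcast_low_byte, and it only models the resulting lane values, not the cost of moving data between the integer and vector units.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 32 };  /* byte lanes in a 256-bit register */

/* Model of mov $imm,%eax; vmovd %eax,%xmm0; vpbroadcastb %xmm0,%ymm0:
   only the lowest byte of the immediate reaches the result lanes. */
void model_broadcast_low_byte(uint32_t imm, uint8_t out[LANES])
{
    uint8_t low = (uint8_t)(imm & 0xffu);
    for (int i = 0; i < LANES; i++)
        out[i] = low;
}

/* Checks the three immediates that appear in the listing. */
void check_listing_constants(void)
{
    static const uint32_t imms[] = { 0x7bu, 0x1u, 0xfffffffeu };
    static const uint8_t expect[] = { 0x7b, 0x01, 0xfe };
    uint8_t out[LANES];

    for (int n = 0; n < 3; n++) {
        model_broadcast_low_byte(imms[n], out);
        for (int i = 0; i < LANES; i++)
            assert(out[i] == expect[n]);
    }
}
```

The 0xfe lanes produced for the third immediate are the byte pattern for -2, which is the same value the later cmpeq+add sequence produces without touching an integer register.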

[Bug target/106060] Inefficient constant broadcast on x86_64

2022-06-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Hongtao.liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #1 from Hongtao.liu  ---
As I recall, this is on purpose: it comes from r12-1958-gedafb35bdadf30,
related to PR100865.