[Bug target/103861] [i386] vectorize v2qi vectors

2023-10-26 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #15 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:7eed861e8ca3f533e56dea6348573caa09f16f5e

commit r14-4964-g7eed861e8ca3f533e56dea6348573caa09f16f5e
Author: liuhongt 
Date:   Mon Oct 23 13:40:10 2023 +0800

Support vec_cmpmn/vcondmn for v2hf/v4hf.

gcc/ChangeLog:

PR target/103861
* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
V2HF/V2BF/V4HF/V4BFmode.
* config/i386/i386.cc (ix86_get_mask_mode): Return QImode when
data_mode is V4HF/V2HFmode.
* config/i386/mmx.md (vec_cmpv4hfqi): New expander.
(vcond_mask_v4hi): Ditto.
(vcond_mask_qi): Ditto.
(vec_cmpv2hfqi): Ditto.
(vcond_mask_v2hi): Ditto.
(mmx_pblendvb_): Add 2 combine splitters after the
patterns.
(mmx_pblendvb_v8qi): Ditto.
(v2hi3): Add a combine splitter after the pattern.
(3): Ditto.
(v8qi3): Ditto.
(3): Ditto.
* config/i386/sse.md (vcond): Merge this with ..
(vcond): .. this into ..
(vcond): .. this,
and extend to V8BF/V16BF/V32BFmode.

gcc/testsuite/ChangeLog:

* g++.target/i386/part-vect-vcondhf.C: New test.
* gcc.target/i386/part-vect-vec_cmphf.c: New test.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-08-18 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

Uroš Bizjak  changed:

            What|Removed      |Added
----------------------------------------------
      Resolution|---          |FIXED
          Status|NEW          |RESOLVED

--- Comment #14 from Uroš Bizjak  ---
Let's say this is done now.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:7a7d8c3f6167fd45658ddbfa32adcfd2acc98eb4

commit r12-6562-g7a7d8c3f6167fd45658ddbfa32adcfd2acc98eb4
Author: Uros Bizjak 
Date:   Thu Jan 13 20:48:18 2022 +0100

i386: Introduce V2QImode vectorized shifts [PR103861]

Add V2QImode shift operations and split them to synthesized
double HI/LO QImode operations with integer registers.
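
For reference, the per-element semantics that the HI/LO split has to preserve can be sketched in C. This is a minimal sketch, not the patch itself: it assumes the V2QI value is packed in a 16-bit integer with element 0 in the low byte, and the helper names are hypothetical. Each byte is shifted on its own, so bits never cross from one element into the other, which a single 16-bit shift would not guarantee.

```c
#include <stdint.h>

/* Hypothetical packing: element 0 in the low byte, element 1 in the
   high byte of a 16-bit value.  */
static uint16_t pack2 (uint8_t lo, uint8_t hi)
{
  return (uint16_t) (lo | (hi << 8));
}

/* Per-byte logical left shift: bits shifted out of the low byte are
   dropped instead of spilling into the high byte.  */
static uint16_t shl_v2qi (uint16_t v, unsigned n)
{
  uint8_t lo = (uint8_t) ((v & 0xff) << n);
  uint8_t hi = (uint8_t) ((v >> 8) << n);
  return pack2 (lo, hi);
}

/* Per-byte logical right shift.  */
static uint16_t lshr_v2qi (uint16_t v, unsigned n)
{
  uint8_t lo = (uint8_t) ((v & 0xff) >> n);
  uint8_t hi = (uint8_t) ((v >> 8) >> n);
  return pack2 (lo, hi);
}

/* Per-byte arithmetic right shift, treating each element as signed
   (relies on GCC's arithmetic >> for negative values).  */
static uint16_t ashr_v2qi (uint16_t v, unsigned n)
{
  int8_t lo = (int8_t) (v & 0xff);
  int8_t hi = (int8_t) (v >> 8);
  return pack2 ((uint8_t) (lo >> n), (uint8_t) (hi >> n));
}
```

For example, shifting 0x0181 left by one gives 0x0202 per byte, whereas a plain 16-bit shift would give 0x0302.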

Also robustify arithmetic split patterns.

2022-01-13  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/i386.md (*ashlqi_ext_2): New insn pattern.
(*qi_ext_2): Ditto.
* config/i386/mmx.md (v2qi):
New insn_and_split pattern.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.target/i386/pr103861.c (shl,ashr,lshr): New tests.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:b5193e352981fab8441c600b0a50efe1f30c1d30

commit r12-6533-gb5193e352981fab8441c600b0a50efe1f30c1d30
Author: Uros Bizjak 
Date:   Wed Jan 12 19:59:57 2022 +0100

i386: Add CC clobber and splits for 32-bit vector mode logic insns
[PR100637, PR103861]

Add a CC clobber to 32-bit vector mode logic insns to allow variants with
general-purpose registers.  Also improve ix86_expand_sse_movcc to emit insns
with a CC clobber for narrow vector modes in order to re-enable conditional
moves for 16-bit and 32-bit narrow vector modes with -msse2.

2022-01-12  Uroš Bizjak  

gcc/ChangeLog:

PR target/100637
PR target/103861
* config/i386/i386-expand.c (ix86_emit_vec_binop): New static
function.
(ix86_expand_sse_movcc): Use ix86_emit_vec_binop instead of
gen_rtx_X when constructing vector logic RTXes.
(expand_vec_perm_pshufb2): Ditto.
* config/i386/mmx.md (negv2qi): Disparage GPR alternative a bit.
(v2qi3): Ditto.
(vcond): Re-enable for TARGET_SSE2.
(vcondu): Ditto.
(vcond_mask_): Ditto.
(one_cmpl2): Remove expander.
(one_cmpl2): Rename from one_cmplv2qi.
Use VI_16_32 mode iterator.
(one_cmpl2 splitters): Use VI_16_32 mode iterator.
Use lowpart_subreg instead of gen_lowpart to create subreg.
(*andnot3): Merge from "*andnot" and
"*andnotv2qi3" insn patterns using VI_16_32 mode iterator.
Disparage GPR alternative a bit.  Add CC clobber.
(*andnot3 splitters): Use VI_16_32 mode iterator.
Use lowpart_subreg instead of gen_lowpart to create subreg.
(*3): Merge from "*" and "*v2qi3" insn patterns
using VI_16_32 mode iterator.  Disparage GPR alternative a bit.
Add CC clobber.
(*3 splitters): Use VI_16_32 mode iterator.
Use lowpart_subreg instead of gen_lowpart to create subreg.

gcc/testsuite/ChangeLog:

PR target/100637
PR target/103861
* g++.target/i386/pr100637-1b.C (dg-options):
Use -msse2 instead of -msse4.1.
* g++.target/i386/pr100637-1w.C (dg-options): Ditto.
* g++.target/i386/pr103861-1.C (dg-options): Ditto.
* gcc.target/i386/pr100637-4b.c (dg-options): Ditto.
* gcc.target/i386/pr103861-4.c (dg-options): Ditto.
* gcc.target/i386/pr100637-1b.c: Remove scan-assembler
directives for logic instructions.
* gcc.target/i386/pr100637-1w.c: Ditto.
* gcc.target/i386/warn-vect-op-2.c:
Update dg-warning for vector logic operation.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:820ac79e8448ad6c631e1387ba51a93dcf2b4e89

commit r12-6488-g820ac79e8448ad6c631e1387ba51a93dcf2b4e89
Author: Uros Bizjak 
Date:   Tue Jan 11 19:23:15 2022 +0100

i386: Introduce V2QImode vector cmove for -msse4.1 [PR103861]

This patch also moves V2HI and V4QImode vector conditional moves
to SSE4.1 targets.  Without -msse4.1, vector cmoves are implemented with
SSE logic functions, and they are hardly worthwhile for narrow vector modes.
More importantly, we would like to keep vector logic functions for GPR
registers, and the current RTX description of 32-bit vector mode logic
insns does not include the necessary CC reg clobber.  Solve these issues by
restricting vector cmove insns for these modes to -msse4.1, where logic
instructions are avoided and the pblendvb insn is used instead.

A follow-up patch will add clobbers and necessary splits to 32-bit
vector mode logic insns, and in a future patch, ix86_expand_sse_movcc will be
improved to use expand_simple_{unop,binop} to emit logic insns, allowing
us to re-enable 16-bit and 32-bit narrow vector cmoves for -msse2.
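
To make the two cmove strategies concrete, here is a minimal scalar sketch over two bytes. It assumes the logic-only fallback roughly computes (m & a) | (~m & b) from a compare mask m, while pblendvb selects each byte by the top bit of the corresponding mask byte; the function names are hypothetical. Both agree whenever every mask byte is 0x00 or 0xff, which is what a vector compare produces.

```c
#include <stdint.h>

/* Logic-op select (and/andnot/or): each mask byte is expected to be
   0xff (take a) or 0x00 (take b).  */
static void select_logic (uint8_t r[2], const uint8_t m[2],
                          const uint8_t a[2], const uint8_t b[2])
{
  for (int i = 0; i < 2; i++)
    r[i] = (uint8_t) ((m[i] & a[i]) | (~m[i] & b[i]));
}

/* pblendvb-style select: only bit 7 of each mask byte matters.  */
static void select_blend (uint8_t r[2], const uint8_t m[2],
                          const uint8_t a[2], const uint8_t b[2])
{
  for (int i = 0; i < 2; i++)
    r[i] = (m[i] & 0x80) ? a[i] : b[i];
}
```

The blend form needs a single instruction and no intermediate logic ops, which is the trade-off the patch description refers to.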

2022-01-11  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/mmx.md (vcond):
Use VI_16_32 mode iterator.  Enable for TARGET_SSE4_1.
(vcondu): Ditto.
(vcond_mask_): Ditto.
(mmx_pblendvb_v8qi): Rename from mmx_pblendvb64.
(mmx_pblendvb_): Rename from mmx_pblendvb32.
Use VI_16_32 mode iterator.
* config/i386/i386-expand.c (ix86_expand_sse_movcc):
Update for rename.  Handle V2QImode.
(expand_vec_perm_blend): Update for rename.

gcc/testsuite/ChangeLog:

PR target/103861
* g++.target/i386/pr100637-1b.C (dg-options):
Use -msse4 instead of -msse2.
* g++.target/i386/pr100637-1w.C (dg-options): Ditto.
* g++.target/i386/pr103861-1.C: New test.
* gcc.target/i386/pr100637-4b.c (dg-options):
Use -msse4 instead of -msse2.
* gcc.target/i386/pr103861-4.c: New test.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:04a745556021b7a1c6e81a41d0a12b60a4d9475d

commit r12-6426-g04a745556021b7a1c6e81a41d0a12b60a4d9475d
Author: Uros Bizjak 
Date:   Mon Jan 10 20:59:02 2022 +0100

i386: Introduce V2QImode vector compares [PR103861]

Add V2QImode vector compares with SSE registers.
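
The kind of source these patterns serve can be sketched with GCC's generic vector extension: comparing two V2QI vectors yields a mask vector holding -1 where the element-wise comparison is true and 0 otherwise, matching what pcmpeqb/pcmpgtb produce per byte. The typedef and function names are illustrative only, and elements are treated as signed since pcmpgtb is a signed comparison.

```c
/* Illustrative V2QI type with explicitly signed elements.  */
typedef signed char v2qs __attribute__ ((__vector_size__ (2)));

/* Element-wise signed greater-than: -1 (all bits set) where a > b,
   0 elsewhere.  */
static v2qs gt_v2qi (v2qs a, v2qs b)
{
  return a > b;
}

/* Element-wise equality, same mask convention.  */
static v2qs eq_v2qi (v2qs a, v2qs b)
{
  return a == b;
}
```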

2022-01-10  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
Handle V2QImode.
* config/i386/mmx.md (3):
Use VI1_16_32 mode iterator.
(*eq3): Ditto.
(*gt3): Ditto.
(*xop_maskcmp3): Ditto.
(*xop_maskcmp_uns3): Ditto.
(vec_cmp): Ditto.
(vec_cmpu): Ditto.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.target/i386/pr103861-2.c: New test.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:c166632bd22d7da66354121502019fc9c92ef07f

commit r12-6273-gc166632bd22d7da66354121502019fc9c92ef07f
Author: Uros Bizjak 
Date:   Wed Jan 5 23:16:34 2022 +0100

i386: Introduce V2QImode minmax, abs and uavgv2qi3_ceil [PR103861]

Add V2QImode minmax, abs and uavgv2qi3_ceil operations with SSE registers.
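
As a scalar reference for what these operations compute on each byte, a minimal sketch follows; the V2QI patterns apply the same function to both elements at once. The function names are hypothetical, and the instruction names in the comments are the usual SSE byte instructions for each operation, not taken from the patch. Note the "ceil" in uavgv2qi3_ceil: the average rounds up.

```c
#include <stdint.h>

/* Signed byte maximum (pmaxsb).  */
static int8_t smax8 (int8_t a, int8_t b) { return a > b ? a : b; }

/* Unsigned byte minimum (pminub).  */
static uint8_t umin8 (uint8_t a, uint8_t b) { return a < b ? a : b; }

/* Byte absolute value (pabsb): -128 has no positive counterpart and
   comes out as 0x80.  */
static uint8_t abs8 (int8_t x)
{
  return (uint8_t) (x < 0 ? -x : x);
}

/* Rounding-up unsigned average (pavgb), computed in a wider type so
   that 255 + 255 + 1 does not overflow.  */
static uint8_t uavg_ceil8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned) a + b + 1) >> 1);
}
```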

2022-01-05  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/mmx.md (VI_16_32): New mode iterator.
(VI1_16_32): Ditto.
(mmxvecsize): Handle V2QI mode.
(3): Rename from v4qi3.
Use VI1_16_32 mode iterator.
(3): Rename from v4qi3.
Use VI1_16_32 mode iterator.
(abs2): Use VI_16_32 mode iterator.
(uavgv2qi3_ceil): New insn pattern.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.target/i386/pr103861-3.c: New test.
* g++.dg/vect/slp-pr98855.cc (dg-final): Check that
no vectorization using SLP was performed.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:708b87dcb6e48cb48d170a4b3625088995377a5c

commit r12-6215-g708b87dcb6e48cb48d170a4b3625088995377a5c
Author: Uros Bizjak 
Date:   Tue Jan 4 19:41:47 2022 +0100

i386: Introduce V2QImode vectorized logic [PR103861]

Add V2QImode logic operations with SSE and GP registers and split
them to V4QImode SSE instructions or SImode GP instructions.

The patch also fixes PR target/103900.
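
Bitwise operations never carry between lanes, so performing them in a wider SImode or V4QImode register leaves the two live bytes exactly as a V2QImode operation would. A small sketch checking this with GCC's vector extension; the typedef and helper names are hypothetical, and andnot is written as (~a) & b, the usual x86 pandn form.

```c
#include <stdint.h>
#include <string.h>

typedef char v2qi __attribute__ ((__vector_size__ (2)));

/* Reinterpret the two bytes of a V2QI value as one 16-bit integer.  */
static uint16_t as_u16 (v2qi v)
{
  uint16_t u;
  memcpy (&u, &v, sizeof u);
  return u;
}

/* Vector and-not: complement of a, masked with b.  */
static v2qi andnot_v2qi (v2qi a, v2qi b)
{
  return ~a & b;
}
```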

2022-01-04  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/mmx.md (one_cmplv2qi3): New insn pattern.
(one_cmplv2qi3 splitters): New post-reload splitters.
(*andnotv2qi3): New insn pattern.
(andnotv2qi3 splitters): New post-reload splitters.
(v2qi3): New insn pattern.
(v2qi3 splitters): New post-reload splitters.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.target/i386/warn-vect-op-2.c: Adjust warnings.
* gcc.target/i386/pr103900.c: New test.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #7 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #6)
> Not fully fixed I guess?

Not yet. I have a bunch of follow-up patches for various operations.

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #6 from Richard Biener  ---
Not fully fixed I guess?

[Bug target/103861] [i386] vectorize v2qi vectors

2022-01-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:9ff206d3865df5cb8407490aa9481029beac087f

commit r12-6173-g9ff206d3865df5cb8407490aa9481029beac087f
Author: Uros Bizjak 
Date:   Sun Jan 2 21:12:10 2022 +0100

i386: Introduce V2QImode vectorized arithmetic [PR103861]

This patch adds basic V2QImode infrastructure and V2QImode arithmetic
operations (plus, minus and neg).  The patched compiler can emit SSE
vectorized QImode operations (e.g. PADDB) on partial QImode vectors,
and also synthesized double HI/LO QImode operations with integer registers.

The testcase:

typedef char __v2qi __attribute__ ((__vector_size__ (2)));
__v2qi plus (__v2qi a, __v2qi b) { return a + b; }

compiles with -O2 to:

        movl    %edi, %edx
        movl    %esi, %eax
        addb    %sil, %dl
        addb    %ah, %dh
        movl    %edx, %eax
        ret

which is much better than what the unpatched compiler produces:

        movl    %edi, %eax
        movl    %esi, %edx
        xorl    %ecx, %ecx
        movb    %dil, %cl
        movsbl  %dh, %edx
        movsbl  %ah, %eax
        addl    %edx, %eax
        addb    %sil, %cl
        movb    %al, %ch
        movl    %ecx, %eax
        ret

The V2QImode vectorization does not require vector registers, so it can
be enabled by default, even on 32-bit targets without SSE.
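
The integer-register sequence for plus can be mirrored in C as a minimal sketch. It assumes element 0 sits in the low byte (x86 is little-endian) and the function name is hypothetical. Each byte is added separately, as the two addb instructions do, so a carry out of the low element never reaches the high one; a plain 16-bit add would not give that guarantee.

```c
#include <stdint.h>

/* V2QI addition on one 16-bit integer: low bytes and high bytes are
   added independently and each result wraps within its own byte.  */
static uint16_t add_v2qi_gpr (uint16_t a, uint16_t b)
{
  uint8_t lo = (uint8_t) ((a & 0xff) + (b & 0xff)); /* like addb %sil, %dl */
  uint8_t hi = (uint8_t) ((a >> 8) + (b >> 8));     /* like addb %ah, %dh */
  return (uint16_t) (lo | (hi << 8));
}
```

For example, 0x01ff + 0x0001 gives 0x0100 element-wise, whereas a 16-bit add gives 0x0200.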

The patch also enables vectorized V2QImode sign/zero extends.

2021-12-30  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/i386.h (VALID_SSE2_REG_MODE): Add V2QImode.
(VALID_INT_MODE_P): Ditto.
* config/i386/i386.c (ix86_secondary_reload): Handle
V2QImode reloads from SSE register to memory.
(vector_mode_supported_p): Always return true for V2QImode.
* config/i386/i386.md (*subqi_ext_2): New insn pattern.
(*negqi_ext_2): Ditto.
* config/i386/mmx.md (movv2qi): New expander.
(movmisalignv2qi): Ditto.
(*movv2qi_internal): New insn pattern.
(*pushv2qi2): Ditto.
(negv2qi2 and splitters): Ditto.
(v2qi3 and splitters): Ditto.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.dg/store_merging_18.c (dg-options): Add -fno-tree-vectorize.
* gcc.dg/store_merging_29.c (dg-options): Ditto.
* gcc.target/i386/pr103861.c: New test.
* gcc.target/i386/pr92658-avx512vl.c (dg-final):
Remove vpmovqb scan-assembler xfail.
* gcc.target/i386/pr92658-sse4.c (dg-final):
Remove pmovzxbq scan-assembler xfail.
* gcc.target/i386/pr92658-sse4-2.c (dg-final):
Remove pmovsxbq scan-assembler xfail.
* gcc.target/i386/warn-vect-op-2.c (dg-warning): Adjust warnings.

[Bug target/103861] [i386] vectorize v2qi vectors

2021-12-29 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

Andrew Pinski  changed:

            What|Removed      |Added
----------------------------------------------
        Keywords|             |missed-optimization
Last reconfirmed|             |2021-12-29
          Status|UNCONFIRMED  |NEW
  Ever confirmed|0            |1
        Severity|normal       |enhancement

--- Comment #4 from Andrew Pinski  ---
Confirmed.

[Bug target/103861] [i386] vectorize v2qi vectors

2021-12-29 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #3 from Uroš Bizjak  ---
The patched compiler compiles the testcase from Comment #0 on x86_64 with -O2
to:

plus:
        movl    %edi, %edx
        movl    %esi, %eax
        addb    %sil, %dl
        addb    %ah, %dh
        movl    %edx, %eax
        ret

and the testcase from Comment #1 to:

foo:
        movzwl  a(%rip), %edx
        movzwl  b(%rip), %eax
        addb    %dl, %al
        addb    %dh, %ah
        movw    %ax, r(%rip)
        ret

Some additional examples:

char r[2], a[2], b[2];

void maxs (void)
{
  int i;

  for (i = 0; i < 2; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}

compiles with -O2 -msse4 to:

maxs:
        pinsrw  $0, b(%rip), %xmm0
        pinsrw  $0, a(%rip), %xmm1
        pmaxsb  %xmm1, %xmm0
        pextrw  $0, %xmm0, r(%rip)
        ret

[Bug target/103861] [i386] vectorize v2qi vectors

2021-12-29 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #2 from Uroš Bizjak  ---
Created attachment 52087
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52087=edit
Prototype patch to vectorize with v2qi vectors

Patch that implements V2QI moves, logic and basic arithmetic operations.

[Bug target/103861] [i386] vectorize v2qi vectors

2021-12-29 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861

--- Comment #1 from Uroš Bizjak  ---
Also:

char r[2], a[2], b[2];

void foo (void)
{
  int i;

  for (i = 0; i < 2; i++)
    r[i] = a[i] + b[i];
}