[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Hongtao Liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #9 from Hongtao Liu  ---
Thanks to Jakub, fixed.

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:32a3f46ca543726196371a6f2a5d06feb31aa92d

commit r15-5754-g32a3f46ca543726196371a6f2a5d06feb31aa92d
Author: Jakub Jelinek 
Date:   Thu Nov 28 14:54:42 2024 +0100

testsuite: Fix up pr116675.c test [PR116675]

The test uses dg-do run and scan-assembler* at the same time; that
obviously doesn't work when pr116675.s isn't created at all, so one gets
PASS: gcc.target/i386/pr116675.c execution test
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4
The usual way to handle that is adding -save-temps option.

The test FAILs after that change though, for a simple reason: the pand
regex matches not just pand instructions, but also the pandn ones.

I've added \t there to make sure it matches only pand.
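
As an illustration of the over-matching described above, here is a minimal sketch in C. It uses a plain substring count as a stand-in for the Tcl regexp matching done by scan-assembler-times (for a literal pattern the two should agree). The assembly excerpt and the names count_occurrences, sample_asm and check_patterns are hypothetical and not taken from the test.

```c
#include <assert.h>
#include <string.h>

/* Count non-overlapping occurrences of NEEDLE in TEXT.  This is only a
   stand-in for the regexp matching of scan-assembler-times.  */
int
count_occurrences (const char *text, const char *needle)
{
  int count = 0;
  size_t len = strlen (needle);

  if (len == 0)
    return 0;
  for (const char *p = strstr (text, needle); p != NULL;
       p = strstr (p + len, needle))
    count++;
  return count;
}

/* Hypothetical assembly excerpt: one pand, one pandn, one por.  */
const char sample_asm[] =
  "\tpand\t%xmm2, %xmm0\n"
  "\tpandn\t%xmm1, %xmm2\n"
  "\tpor\t%xmm2, %xmm0\n";

void
check_patterns (void)
{
  /* "pand" is a prefix of "pandn", so it is counted twice.  */
  assert (count_occurrences (sample_asm, "pand") == 2);
  /* Requiring the tab after the mnemonic excludes pandn.  */
  assert (count_occurrences (sample_asm, "pand\t") == 1);
  assert (count_occurrences (sample_asm, "pandn") == 1);
}
```

With four functions each emitting one pand and one pandn, a bare "pand" pattern would presumably count 8 rather than 4, which is consistent with the FAIL mentioned above.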

Though, I wonder if it wouldn't be safer to split the test into two:
one with just the 4 functions (why noinline, noclone rather than
noipa, btw?), which would be dg-do compile and have the scan-assembler*
directives, and another one which includes the first one, is
dg-do run, and contains the runtime checking of those.

In any case, I've committed this as obvious.

2024-11-28  Jakub Jelinek  

PR target/116675
* gcc.target/i386/pr116675.c: Add -save-temps to dg-options.
Scan for pand\t rather than pand.

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

--- Comment #7 from Hongtao Liu  ---
(In reply to Rainer Orth from comment #6)
> The test is broken:
> 
> +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
> +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
> +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4
> 
> (seen on 32 and 64-bit Solaris/x86).
> 
> The log shows
> 
> gcc.target/i386/pr116675.c: output file does not exist
> 
> It seems the test needs -save-temps so the scan-assembler-times has something
> to work with...

Yes, thanks for the report.

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-27 Thread ro at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Rainer Orth  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |ro at gcc dot gnu.org
     Ever confirmed|0                           |1
         Resolution|FIXED                       |---
   Last reconfirmed|                            |2024-11-27

--- Comment #6 from Rainer Orth  ---
The test is broken:

+UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
+UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
+UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4

(seen on 32 and 64-bit Solaris/x86).

The log shows

gcc.target/i386/pr116675.c: output file does not exist

It seems the test needs -save-temps so the scan-assembler-times has something
to work with...

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Hongtao Liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #5 from Hongtao Liu  ---
Fixed in GCC 15.

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Lili Cui:

https://gcc.gnu.org/g:60b708a9c878aff9a76ec0d446ae63e6527327a6

commit r15-5666-g60b708a9c878aff9a76ec0d446ae63e6527327a6
Author: Cui, Lili 
Date:   Tue Nov 26 15:10:23 2024 +0800

Optimize 128-bit vector permutation with pand, pandn and por.

This patch introduces a new subroutine in ix86_expand_vec_perm_const_1.
On x86, use mixed constant permutation for V8HImode and V16QImode when
SSE2 is supported. This patch handles certain vector shuffle operations
more efficiently using pand, pandn, and por. This change is intended to
improve assembly code generation for configurations that support SSE2.

gcc/ChangeLog:

PR target/116675
* config/i386/i386-expand.cc (expand_vec_perm_pand_pandn_por):
New subroutine.
(ix86_expand_vec_perm_const_1): Call
expand_vec_perm_pand_pandn_por.

gcc/testsuite/ChangeLog:

PR target/116675
* gcc.target/i386/pr116675.c: New test.
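
The idea behind using pand, pandn and por for a constant permute can be sketched in portable C. This is not the code in expand_vec_perm_pand_pandn_por; it is a scalar model that assumes the simple blend case, where lane i of the result comes from lane i of one of the two inputs. All names here (NELT, blend_mask_from_selector, blend_and_andn_or, permute_reference, check_blend) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NELT 8   /* V8HImode: eight 16-bit lanes.  */

/* Return 1 if SEL is a blend selector, i.e. lane i takes element i of
   the first input (sel[i] == i) or element i of the second input
   (sel[i] == i + NELT).  On success MASK is all-ones in the lanes
   taken from the first input and zero elsewhere.  */
int
blend_mask_from_selector (const unsigned sel[NELT], uint16_t mask[NELT])
{
  for (int i = 0; i < NELT; i++)
    {
      if (sel[i] == (unsigned) i)
        mask[i] = 0xffff;
      else if (sel[i] == (unsigned) i + NELT)
        mask[i] = 0;
      else
        return 0;
    }
  return 1;
}

/* Per lane (a & mask) | (b & ~mask): the and, and-not, or sequence.  */
void
blend_and_andn_or (const uint16_t a[NELT], const uint16_t b[NELT],
                   const uint16_t mask[NELT], uint16_t out[NELT])
{
  for (int i = 0; i < NELT; i++)
    out[i] = (uint16_t) ((a[i] & mask[i]) | (b[i] & ~mask[i]));
}

/* Reference semantics of a two-input permute with selector SEL.  */
void
permute_reference (const uint16_t a[NELT], const uint16_t b[NELT],
                   const unsigned sel[NELT], uint16_t out[NELT])
{
  for (int i = 0; i < NELT; i++)
    out[i] = sel[i] < NELT ? a[sel[i]] : b[sel[i] - NELT];
}

/* Usage example: the masked form agrees with the permute.  */
void
check_blend (void)
{
  const uint16_t a[NELT] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  const uint16_t b[NELT] = { 101, 102, 103, 104, 105, 106, 107, 108 };
  const unsigned sel[NELT] = { 0, 9, 2, 11, 4, 13, 6, 15 };
  uint16_t mask[NELT], got[NELT], want[NELT];

  assert (blend_mask_from_selector (sel, mask));
  blend_and_andn_or (a, b, mask, got);
  permute_reference (a, b, sel, want);
  for (int i = 0; i < NELT; i++)
    assert (got[i] == want[i]);
}
```

Because the mask is derived from a constant selector, it is itself a constant, so only the three logical operations remain at run time.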

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-09-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Hongtao Liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #3 from Hongtao Liu  ---
(define_expand "vcond_mask_"
  [(set (match_operand:VI_128 0 "register_operand")
        (vec_merge:VI_128
          (match_operand:VI_128 1 "vector_operand")
          (match_operand:VI_128 2 "nonimm_or_0_operand")
          (match_operand: 3 "register_operand")))]
  "TARGET_SSE2"
{
  ix86_expand_sse_movcc (operands[0], operands[3],
                         operands[1], operands[2]);
  DONE;
})

Yes, a blend should be the same as vcond_mask.

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-09-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

--- Comment #2 from Richard Biener  ---
Same for V16QImode.  It works for V4SImode using

shufps  $216, %xmm1, %xmm0
pshufd  $216, %xmm0, %xmm0
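
To see why this two-instruction sequence amounts to a blend, here is a small scalar model of the two shuffles in C. It assumes that %xmm0 holds the first input and %xmm1 the second, which the comment does not state; model_shufps, model_pshufd, blend_v4si and check_v4si are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "shufps $imm, src, dst" (AT&T operand order): the two low
   result lanes are picked from DST, the two high lanes from SRC, with
   two immediate bits per lane.  */
void
model_shufps (uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
  uint32_t r[4];

  r[0] = dst[imm & 3];
  r[1] = dst[(imm >> 2) & 3];
  r[2] = src[(imm >> 4) & 3];
  r[3] = src[(imm >> 6) & 3];
  for (int i = 0; i < 4; i++)
    dst[i] = r[i];
}

/* Model of "pshufd $imm, src, dst": every result lane is picked from
   SRC.  SRC and DST may be the same array.  */
void
model_pshufd (uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
  uint32_t r[4];

  for (int i = 0; i < 4; i++)
    r[i] = src[(imm >> (2 * i)) & 3];
  for (int i = 0; i < 4; i++)
    dst[i] = r[i];
}

/* The two-instruction sequence, assuming xmm0 = A and xmm1 = B.  */
void
blend_v4si (const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
  uint32_t xmm0[4];

  for (int i = 0; i < 4; i++)
    xmm0[i] = a[i];
  model_shufps (xmm0, b, 216);     /* { a0, a2, b1, b3 } */
  model_pshufd (xmm0, xmm0, 216);  /* { a0, b1, a2, b3 } */
  for (int i = 0; i < 4; i++)
    out[i] = xmm0[i];
}

/* Usage example.  */
void
check_v4si (void)
{
  const uint32_t a[4] = { 10, 11, 12, 13 };
  const uint32_t b[4] = { 20, 21, 22, 23 };
  uint32_t out[4];

  blend_v4si (a, b, out);
  assert (out[0] == 10 && out[1] == 21 && out[2] == 12 && out[3] == 23);
}
```

The immediate 216 is 0b11011000, i.e. the lane indices 0, 2, 1, 3 read from the low bits upward. Under the stated assumption, shufps gathers the even lanes of the first input and the odd lanes of the second, and pshufd then interleaves them.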

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-09-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener  ---
This causes issues with SLP vectorization of multi-operator groups, where we
expect to be able to blend two arbitrary vectors (at one point we open-coded
the masking and OR, but have since transitioned to using permutes).
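
For illustration, the two forms mentioned here (a blend written as a constant permute, and the same blend open-coded as masking and OR) might look as follows with the GNU C vector extensions. The selector is an example and is not taken from the original testcase; blend_permute, blend_mask_or, same_v8hi and check_forms are hypothetical names.

```c
#include <assert.h>

typedef short v8hi __attribute__ ((vector_size (16)));

/* Blend written as a constant permute: even lanes from A, odd lanes
   from B.  Indices 0..7 select from A, 8..15 from B.  */
v8hi
blend_permute (v8hi a, v8hi b)
{
#if defined (__clang__)
  return __builtin_shufflevector (a, b, 0, 9, 2, 11, 4, 13, 6, 15);
#else
  return __builtin_shuffle (a, b, (v8hi) { 0, 9, 2, 11, 4, 13, 6, 15 });
#endif
}

/* The same blend open-coded as masking and OR.  */
v8hi
blend_mask_or (v8hi a, v8hi b)
{
  const v8hi mask = { -1, 0, -1, 0, -1, 0, -1, 0 };

  return (a & mask) | (b & ~mask);
}

/* Lane-by-lane comparison of two vectors.  */
int
same_v8hi (v8hi x, v8hi y)
{
  for (int i = 0; i < 8; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}

/* Usage example: both forms compute the same value.  */
void
check_forms (void)
{
  v8hi a = { 1, 2, 3, 4, 5, 6, 7, 8 };
  v8hi b = { -1, -2, -3, -4, -5, -6, -7, -8 };

  assert (same_v8hi (blend_permute (a, b), blend_mask_or (a, b)));
}
```

Both functions should return the same value on any target. The report concerns only which instructions are chosen for the permute form when just SSE2 is available.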


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations