[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Hongtao Liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|REOPENED                    |RESOLVED

--- Comment #9 from Hongtao Liu ---
Thanks to Jakub, fixed.
[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

--- Comment #8 from GCC Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:32a3f46ca543726196371a6f2a5d06feb31aa92d

commit r15-5754-g32a3f46ca543726196371a6f2a5d06feb31aa92d
Author: Jakub Jelinek
Date:   Thu Nov 28 14:54:42 2024 +0100

    testsuite: Fix up pr116675.c test [PR116675]

    The test uses dg-do run and scan-assembler* at the same time, that
    obviously doesn't work when pr116675.s isn't created at all, so one gets
    PASS: gcc.target/i386/pr116675.c execution test
    gcc.target/i386/pr116675.c: output file does not exist
    UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
    gcc.target/i386/pr116675.c: output file does not exist
    UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
    gcc.target/i386/pr116675.c: output file does not exist
    UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4

    The usual way to handle that is adding -save-temps option.

    The test FAILs after that change though, for simple reason, the pand
    regex doesn't match just pand instructions, but also the pandn ones.

    I've added \t there to make sure it matches only pand.

    Though, wonder if it wouldn't be safer to split the test into two, one
    with just the 4 functions (why noinline, noclone rather than noipa,
    btw?), that one would be dg-do compile and have the scan-assembler*
    directives, and then another one which includes the first one and is
    dg-do run and contains the runtime checking of those.

    In any case, I've committed this as obvious.

    2024-11-28  Jakub Jelinek

            PR target/116675
            * gcc.target/i386/pr116675.c: Add -save-temps to dg-options.
            Scan for pand\t rather than pand.
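The pand/pandn overcount described in the commit message can be reproduced outside the testsuite. The sketch below is only an illustration: it uses POSIX regcomp/regexec as a stand-in for the Tcl regexps that scan-assembler-times actually runs, and both sample_asm and count_matches are hypothetical names that do not come from pr116675.c.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical assembler fragment for one blend: pand, pandn, por. */
const char sample_asm[] =
    "\tpand\t%xmm2, %xmm0\n"
    "\tpandn\t%xmm1, %xmm2\n"
    "\tpor\t%xmm2, %xmm0\n";

/* Count non-overlapping matches of a POSIX extended regex in text.
   Assumes the pattern never matches the empty string. */
int count_matches(const char *text, const char *pattern)
{
    regex_t re;
    regmatch_t m;
    int count = 0;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    assert(rc == 0);
    (void) rc;
    while (*text != '\0' && regexec(&re, text, 1, &m, 0) == 0) {
        count++;
        /* Resume the search just past the end of this match. */
        text += m.rm_eo > 0 ? m.rm_eo : 1;
    }
    regfree(&re);
    return count;
}
```

On this fragment, count_matches(sample_asm, "pand") counts the pandn line as well, while count_matches(sample_asm, "pand\t") counts only the real pand, which is the effect of the added \t.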
[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

--- Comment #7 from Hongtao Liu ---
(In reply to Rainer Orth from comment #6)
> The test is broken:
>
> +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
> +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
> +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4
>
> (seen on 32 and 64-bit Solaris/x86).
>
> The log shows
>
> gcc.target/i386/pr116675.c: output file does not exist
>
> It seems the test needs -save-temps so the scan-assembler-times has something
> to work with...

Yes, thanks for the report.
[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Rainer Orth changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |ro at gcc dot gnu.org
     Ever confirmed|0                           |1
         Resolution|FIXED                       |---
   Last reconfirmed|                            |2024-11-27

--- Comment #6 from Rainer Orth ---
The test is broken:

+UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
+UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
+UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4

(seen on 32 and 64-bit Solaris/x86).

The log shows

gcc.target/i386/pr116675.c: output file does not exist

It seems the test needs -save-temps so the scan-assembler-times has something
to work with...
[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Hongtao Liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #5 from Hongtao Liu ---
Fixed in GCC 15.
[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

--- Comment #4 from GCC Commits ---
The master branch has been updated by Lili Cui:

https://gcc.gnu.org/g:60b708a9c878aff9a76ec0d446ae63e6527327a6

commit r15-5666-g60b708a9c878aff9a76ec0d446ae63e6527327a6
Author: Cui, Lili
Date:   Tue Nov 26 15:10:23 2024 +0800

    Optimize 128-bit vector permutation with pand, pandn and por.

    This patch introduces a new subroutine in ix86_expand_vec_perm_const_1.
    On x86, use mixed constant permutation for V8HImode and V16QImode when
    SSE2 is supported. This patch handles certain vector shuffle operations
    more efficiently using pand, pandn, and por. This change is intended to
    improve assembly code generation for configurations that support SSE2.

    gcc/ChangeLog:

            PR target/116675
            * config/i386/i386-expand.cc (expand_vec_perm_pand_pandn_por):
            New subroutine.
            (ix86_expand_vec_perm_const_1): Call expand_vec_perm_pand_pandn_por.

    gcc/testsuite/ChangeLog:

            PR target/116675
            * gcc.target/i386/pr116675.c: New test.
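For readers who want to see the instruction sequence at the source level, here is a minimal sketch of the pand/pandn/por blend written with SSE2 intrinsics. It is not the code of expand_vec_perm_pand_pandn_por, which is not quoted in this thread; the function names blend_epi16_sse2 and blend_even_a_odd_b are hypothetical, and the example assumes an x86 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Select bits from a where mask is set and from b where it is clear. */
__m128i blend_epi16_sse2(__m128i a, __m128i b, __m128i mask)
{
    __m128i from_a = _mm_and_si128(mask, a);     /* pand:  mask & a  */
    __m128i from_b = _mm_andnot_si128(mask, b);  /* pandn: ~mask & b */
    return _mm_or_si128(from_a, from_b);         /* por              */
}

/* Take even 16-bit lanes from a and odd lanes from b. */
void blend_even_a_odd_b(const int16_t a[8], const int16_t b[8],
                        int16_t out[8])
{
    assert(a != 0 && b != 0 && out != 0);

    /* _mm_set_epi16 lists lanes from 7 down to 0, so the last
       argument is lane 0; all-ones lanes select from a. */
    const __m128i mask = _mm_set_epi16(0, -1, 0, -1, 0, -1, 0, -1);
    __m128i va = _mm_loadu_si128((const __m128i *) a);
    __m128i vb = _mm_loadu_si128((const __m128i *) b);

    _mm_storeu_si128((__m128i *) out, blend_epi16_sse2(va, vb, mask));
}
```

Because the selection is known at compile time in a constant permute, the mask is a constant as well, so the blend costs three logical operations and needs no SSE4.1 pblendw.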
[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Hongtao Liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #3 from Hongtao Liu ---
(define_expand "vcond_mask_"
  [(set (match_operand:VI_128 0 "register_operand")
        (vec_merge:VI_128
          (match_operand:VI_128 1 "vector_operand")
          (match_operand:VI_128 2 "nonimm_or_0_operand")
          (match_operand: 3 "register_operand")))]
  "TARGET_SSE2"
{
  ix86_expand_sse_movcc (operands[0], operands[3],
                         operands[1], operands[2]);
  DONE;
})

Yes, a blend should be the same as vcond_mask.
[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

--- Comment #2 from Richard Biener ---
Same for V16QImode.  It works for V4SImode using

        shufps  $216, %xmm1, %xmm0
        pshufd  $216, %xmm0, %xmm0
[Bug target/116675] No blend constant permute for V8HImode with just SSE2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener ---
This causes issues with SLP vectorization of multi-operator groups, where we
expect to be able to blend two arbitrary vectors (at one point we open-coded
the masking and OR, but we have since transitioned to using permutes).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
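To make the two forms mentioned in comment #1 concrete, the sketch below writes the same V8HI blend once as a constant permute and once as open-coded masking and OR, using GCC's generic vector extensions. It assumes GCC (__builtin_shuffle is GCC-specific); the lane pattern and the names blend_permute, blend_mask_or and blends_agree are illustrative choices, not taken from the bug's testcase.

```c
#include <assert.h>
#include <string.h>

typedef short v8hi __attribute__ ((vector_size (16)));

/* Blend as a constant permute: indices 0..7 pick lanes of a,
   indices 8..15 pick lanes of b.  Every lane stays in place. */
v8hi blend_permute(v8hi a, v8hi b)
{
    const v8hi sel = { 0, 9, 2, 11, 4, 13, 6, 15 };
    return __builtin_shuffle(a, b, sel);
}

/* The same blend open-coded as masking and OR. */
v8hi blend_mask_or(v8hi a, v8hi b)
{
    const v8hi keep_a = { -1, 0, -1, 0, -1, 0, -1, 0 };
    return (a & keep_a) | (b & ~keep_a);
}

/* Returns 1 when both forms produce the same vector. */
int blends_agree(void)
{
    v8hi a = { 0, 1, 2, 3, 4, 5, 6, 7 };
    v8hi b = { 100, 101, 102, 103, 104, 105, 106, 107 };
    v8hi p = blend_permute(a, b);
    v8hi m = blend_mask_or(a, b);

    assert(p[1] == b[1]);  /* odd lanes come from b */
    return memcmp(&p, &m, sizeof p) == 0;
}
```

A permute qualifies as a blend only when each result lane i takes lane i of one of the two inputs, which is why it can be expressed with a constant mask instead of a real shuffle.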