[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-24 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Uroš Bizjak  ---
Implemented for gcc-14.

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Uros Bizjak:

https://gcc.gnu.org/g:6dd73f0f00f454a05552b008a1d56560bd3f1d4a

commit r14-3471-g6dd73f0f00f454a05552b008a1d56560bd3f1d4a
Author: Uros Bizjak 
Date:   Thu Aug 24 22:23:52 2023 +0200

i386: Optimize pinsrq of 0 with index 1 into movq [PR94866]

Add new pattern involving vec_merge RTX that is produced by combine from the
combination of sse4_1_pinsrq and *movdi_internal:

7: r86:DI=0
8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2)
  REG_DEAD r87:V2DI
  REG_DEAD r86:DI
Successfully matched this instruction:
(set (reg:V2DI 85 [ a ])
    (vec_merge:V2DI (reg:V2DI 87)
        (const_vector:V2DI [
                (const_int 0 [0]) repeated x2
            ])
        (const_int 1 [0x1])))

PR target/94866

gcc/ChangeLog:

* config/i386/sse.md (*sse2_movq128_<mode>_1): New insn pattern.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr94866.C: New test.
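
The equivalence behind the new pattern can be checked with a small scalar model. This is only a sketch, not GCC code: the `V2DI` alias and the helpers `vec_merge`, `vec_duplicate` and `movq` are illustrative names that mimic the RTL operations on two 64-bit lanes, on the assumption that bit i of the vec_merge mask selects lane i from the first operand.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Two 64-bit lanes, standing in for an RTL V2DI value (illustrative alias).
using V2DI = std::array<std::uint64_t, 2>;

// Model of (vec_merge a b mask): lane i is taken from a when bit i of
// mask is set, otherwise from b.
inline V2DI vec_merge(const V2DI &a, const V2DI &b, unsigned mask) {
    return {(mask & 1u) ? a[0] : b[0], (mask & 2u) ? a[1] : b[1]};
}

// Model of (vec_duplicate x): broadcast a scalar to both lanes.
inline V2DI vec_duplicate(std::uint64_t x) { return {x, x}; }

// Model of movq xmm, xmm: keep the low lane, zero the high lane.
inline V2DI movq(const V2DI &a) { return {a[0], 0}; }
```

Under this model, both the pinsrq form (vec_duplicate of zero merged with mask 0x2) and the form matched by combine (the register merged with a zero vector under mask 0x1) give the same result as movq.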

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #8 from Hongtao.liu  ---
(In reply to Uroš Bizjak from comment #7)
> (In reply to Hongtao.liu from comment #6) 
> > > So, the compiler still expects vec_concat/vec_select patterns to be 
> > > present.
> > 
> > v2df foo_v2df (v2df x)
> >  {
> >    return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
> >  }
> > 
> > The testcase is not a typical vec_merge case; for vec_merge, the shuffle
> > index should be {0, 3}. Here it happened to be a vec_merge because the
> > second vector is all zero. And yes, for this case we still need the
> > vec_concat:vec_select pattern.
> 
> I guess the original patch is the way to go then.

Yes.

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #7 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #6) 
> > So, the compiler still expects vec_concat/vec_select patterns to be present.
> 
> v2df foo_v2df (v2df x)
>  {
>    return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
>  }
> 
> The testcase is not a typical vec_merge case; for vec_merge, the shuffle
> index should be {0, 3}. Here it happened to be a vec_merge because the
> second vector is all zero. And yes, for this case we still need the
> vec_concat:vec_select pattern.

I guess the original patch is the way to go then.

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-23 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #6 from Hongtao.liu  ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
> > !one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
> > which means we had better use vec_merge instead of vec_select:vec_concat
> > when available in our backend pattern match.
> 
> In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but
> the patch regressed:
> 
> -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq
> -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor
> -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor
> -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times (?n)(?:mov|psrldq).*%xmm[0-9] 12
> 
> So, the compiler still expects vec_concat/vec_select patterns to be present.


v2df foo_v2df (v2df x)
 {
   return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
 }

The testcase is not a typical vec_merge case; for vec_merge, the shuffle index
should be {0, 3}. Here it happened to be a vec_merge because the second vector
is all zero. And yes, for this case we still need the vec_concat:vec_select
pattern.
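
The difference between the two index vectors can be illustrated with a scalar model of the two-operand shuffle. This is a sketch under assumptions: `shuffle2`, `V2` and `Idx2` are illustrative names, and the model follows the documented rule that indices are taken modulo 2*N, with 0..N-1 selecting from the first operand and N..2N-1 from the second.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V2 = std::array<double, 2>;      // stands in for v2df
using Idx2 = std::array<unsigned, 2>;  // stands in for the v2di selector

// Model of the two-operand __builtin_shuffle (x, y, idx) for two lanes:
// indices 0..1 pick a lane of x, indices 2..3 pick a lane of y.
inline V2 shuffle2(const V2 &x, const V2 &y, const Idx2 &idx) {
    V2 r{};
    for (std::size_t i = 0; i < 2; ++i) {
        unsigned k = idx[i] & 3u;  // indices wrap modulo 2*N
        r[i] = k < 2 ? x[k] : y[k - 2];
    }
    return r;
}
```

With an all-zero second operand, {0, 2} and {0, 3} produce the same value, which is why the testcase only happens to look like a vec_merge. With a general second operand the two selectors differ, and only {0, 3} keeps every element in its own lane.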

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #5 from Uroš Bizjak  ---
Created attachment 55778
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55778&action=edit
Failing patch, for reference

Patch that converts vec_concat/vec_select sse2_movq128 patterns to vec_merge.

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-23 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #4 from Uroš Bizjak  ---
(In reply to Hongtao.liu from comment #3)
> In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
> !one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
> which means we had better use vec_merge instead of vec_select:vec_concat
> when available in our backend pattern match.

In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but
the patch regressed:

-FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq
-FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor
-FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor
-FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times (?n)(?:mov|psrldq).*%xmm[0-9] 12

So, the compiler still expects vec_concat/vec_select patterns to be present.

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-22 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Hongtao.liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Hongtao.liu  ---
In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
!one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
which means we had better use vec_merge instead of vec_select:vec_concat when
available in our backend pattern match.

Also, from the point of view of AVX512 kmask instructions, using vec_merge
will help constant propagation.

  /* Try the SSE4.1 blend variable merge instructions.  */
  if (expand_vec_perm_blend (d))
    return true;

  /* Try movss/movsd instructions.  */
  if (expand_vec_perm_movs (d))
    return true;
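
To make the "blend first" ordering concrete, here is a minimal sketch of the kind of test that decides whether a two-operand permutation can be expressed as a lane-wise merge. The helper `is_blend_perm` is a hypothetical name, not a GCC function; it only assumes that a blend keeps every element in its own lane, so selector entry i must be either i or i + N.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true when result lane i takes lane i of
// either input, i.e. perm[i] == i (first operand) or perm[i] == i + N
// (second operand). Any other entry moves data across lanes.
template <std::size_t N>
bool is_blend_perm(const std::array<unsigned, N> &perm) {
    for (std::size_t i = 0; i < N; ++i)
        if (perm[i] != i && perm[i] != i + N)
            return false;
    return true;
}
```

Under this check the selector {0, 3} qualifies as a blend, while {0, 2} does not, since its second entry moves lane 0 of the second operand into lane 1.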

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-22 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Uroš Bizjak  changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com

--- Comment #2 from Uroš Bizjak  ---
Created attachment 55776
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55776&action=edit
Proposed patch

Patch that introduces alternative MOVQ RTX definition.

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2023-08-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Bug 94866 depends on bug 94864, which changed state.

Bug 94864 Summary: Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq

2020-04-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-04-30
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
         Depends on|                            |94864

--- Comment #1 from Richard Biener  ---
We're expanding from

  _3 = BIT_INSERT_EXPR ;
  return _3;

which ends up using

(insn 8 7 9 (set (reg:V2DI 85)
(vec_merge:V2DI (vec_duplicate:V2DI (reg:DI 86))
(reg:V2DI 85)
(const_int 2 [0x2]))) "y.c":6:28 -1

so likely the vec_merge "issue" again.
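
For orientation, a source-level operation of the kind discussed here can be written with GNU vector extensions. This is an assumed illustration, not the testcase from the report: the typedef `v2di` and the function `zero_high_lane` are made-up names, and whether a given compiler version lowers the element store to BIT_INSERT_EXPR is not something this sketch guarantees.

```cpp
#include <cassert>
#include <cstdint>

// GNU vector extension (accepted by GCC and Clang): two 64-bit lanes.
typedef std::uint64_t v2di __attribute__((vector_size(16)));

// Illustrative function: overwrite element 1 with zero and keep element 0.
// This is the "insert 0 at index 1" operation that movq can implement.
v2di zero_high_lane(v2di a) {
    a[1] = 0;
    return a;
}
```

Inspecting the output of `-O2 -msse4.1 -S` on an x86-64 target is one way to see whether pinsrq or movq is emitted for such a function.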


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
[Bug 94864] Failure to combine vunpckhpd+movsd into single vunpckhpd