[Bug rtl-optimization/80425] Extra inter-unit register move with zero-extension

2017-11-07 uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425

--- Comment #7 from uros at gcc dot gnu.org ---
Author: uros
Date: Tue Nov  7 18:51:22 2017
New Revision: 254505

URL: https://gcc.gnu.org/viewcvs?rev=254505&root=gcc&view=rev
Log:
PR target/80425
* config/i386/i386.md (*zero_extendsidi2): Change (?r,*Yj), (?*Yi,r)
and (*x,m) to ($r,Yj), ($Yi,r) and ($x,m).
(zero-extendsidi peephole2): Remove peephole.

testsuite/ChangeLog:

PR target/80425
* gcc.target/i386/pr80425-3.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr80425-3.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md
trunk/gcc/testsuite/ChangeLog

2017-05-15 ubizjak at gmail dot com

--- Comment #6 from Uroš Bizjak  ---
Mostly fixed. The issue from Comment #4 remains, although the *zero_extendsidi2
pattern now reads:

(define_insn "*zero_extendsidi2"
  [(set (match_operand:DI 0 "nonimmediate_operand"
                "=r,?r,?o,r   ,o,?*Ym,?!*y,?r ,?r,?*Yi,*x,*x,*v,*r")
        (zero_extend:DI
         (match_operand:SI 1 "x86_64_zext_operand"
                "0 ,rm,r ,rmWz,0,r   ,m   ,*Yj,*x,r   ,m ,*x,*v,*k")))]

LRA starts with:

    7: r96:DI=zero_extend([`a'])
   12: r92:V8DI#0=r95:V16SI>>r96:DI
      REG_DEAD r96:DI
      REG_DEAD r95:V16SI

and creates:

    7: ax:DI=zero_extend([`a'])
   21: [bp:DI-0x38]=ax:DI
   22: xmm1:DI=[bp:DI-0x38]
   12: xmm0:V16SI=xmm0:V16SI>>xmm1:DI

xmm1 (the shift count) could be zero-extended directly from memory.

2017-05-15 ubizjak at gmail dot com

--- Comment #5 from Uroš Bizjak  ---
Author: uros
Date: Mon May 15 19:04:35 2017
New Revision: 248070

URL: https://gcc.gnu.org/viewcvs?rev=248070&root=gcc&view=rev
Log:
* config/i386/i386.md (*zero_extendsidi2): Do not penalize
non-interunit SSE move alternatives with '?'.
(zero-extendsidi peephole2): New peephole to skip intermediate
general register in SSE zero-extend sequence.

testsuite/ChangeLog:

* gcc.target/i386/pr80425-1.c: New test.
* gcc.target/i386/pr80425-2.c: Ditto.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr80425-1.c
trunk/gcc/testsuite/gcc.target/i386/pr80425-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md
trunk/gcc/testsuite/ChangeLog

2017-05-15 ubizjak at gmail dot com

--- Comment #4 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #1)
> Looks like RA issue.

A related problem is shown with:

#include <immintrin.h>

extern int a;
__m512i
f1 (__m512i x)
{
  return _mm512_srai_epi32 (x, a);
}

compiled with -O2 -mavx512f:

        movl    a(%rip), %eax           # 7     *zero_extendsidi2/4     [length = 6]
        movq    %rax, -56(%rbp)         # 21    *movdi_internal/6       [length = 4]
        vmovq   -56(%rbp), %xmm1        # 22    *movdi_internal/15      [length = 7]
        vpsrad  %xmm1, %zmm0, %zmm0     # 12    ashrv16si3/1            [length = 6]

Please note that GR->xmm moves are disabled by default. In this case, we could
extend directly from mem->xmm, but the RA chose a general register instead.

This happens even with the following patch, which removes "?" from the relevant
SSE alternatives of *zero_extendsidi2:

--cut here--
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index da79d8f..a1ff7c9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3762,10 +3762,10 @@

 (define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-               "=r,?r,?o,r   ,o,?*Ym,?!*y,?r ,?r,?*Yi,?*x,?*x,?*v,*r")
+               "=r,?r,?o,r   ,o,?*Ym,?!*y,?r ,?r,?*Yi,*x,*x,*v,*r")
         (zero_extend:DI
          (match_operand:SI 1 "x86_64_zext_operand"
-               "0 ,rm,r ,rmWz,0,r   ,m   ,*Yj,*x,r   ,m  , *x, *v,*k")))]
+               "0 ,rm,r ,rmWz,0,r   ,m   ,*Yj,*x,r   ,m ,*x,*v,*k")))]
   ""
 {
   switch (get_attr_type (insn))
--cut here--

-m32 generates optimal code with and without the patch:

        vmovd   a, %xmm1                # 7     *zero_extendsidi2/11    [length = 11]
        vpsrad  %xmm1, %zmm0, %zmm0     # 12    ashrv16si3/1            [length = 6]
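Why the zero_extend:DI is there at all: with a non-constant count, the shift in the dumps above is the register-count form of VPSRAD (ashrv16si3), which takes its count from the low 64 bits of an xmm register, read as an unsigned value, and fills every lane with its sign bit when that count exceeds 31. The 32-bit `a` therefore has to arrive in the xmm register with its upper 32 bits cleared. The sketch below is a hypothetical scalar model of one lane (`psrad_lane` and `zext_count` are illustrative names, not GCC or intrinsic APIs); it shows that stale upper bits would change the result, which is why some zero-extension must happen, whether through a GPR as in the 64-bit output or directly as `vmovd a, %xmm1` does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of the register-count
   form of VPSRAD: COUNT is the 64-bit unsigned value taken from the low
   quadword of the xmm count register. */
static int32_t psrad_lane(int32_t lane, uint64_t count)
{
    if (count > 31)
        return lane < 0 ? -1 : 0;   /* out-of-range count: fill with the sign bit */
    if (lane < 0)
        return ~(~lane >> count);   /* arithmetic shift of a negative value, portably */
    return lane >> count;
}

/* Model of zero_extend:DI applied to the SImode variable `a`:
   the upper 32 bits of the 64-bit count are forced to zero. */
static uint64_t zext_count(int32_t a)
{
    return (uint64_t)(uint32_t)a;
}
```

For example, `psrad_lane(-16, zext_count(2))` is -4, while the same count with garbage in bit 32, `((uint64_t)1 << 32) | 2`, turns into a sign fill (-1). Any instruction sequence that widens `a` with cleared upper bits is correct; the bug is only about which sequence the register allocator picks.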

2017-04-21 ubizjak at gmail dot com

--- Comment #3 from Uroš Bizjak  ---
(In reply to Vladimir Makarov from comment #2)

> So I don't know how to fix it in IRA or in LRA.  I am pretty sure the
> old RA and reload would have had the same problem.
> 
> Probably the issue should be fixed in machine dependent code.  But the fix
> might create more problems.

It is possible to create a couple of peephole2 patterns to catch this, but
perhaps REE (the redundant extension elimination pass) can be enhanced to
optimize the above register propagation, so that the solution would apply to
all targets.
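A peephole2 pattern matches a short window of consecutive insns after register allocation and replaces it with a cheaper sequence, provided the intermediate register is dead afterwards (in i386.md this is written as a `define_peephole2` and the death check typically uses `peep2_reg_dead_p`). The C toy below only illustrates that idea for the kind of rewrite r248070 (comment #5) added: a zero-extending load into a general register followed by a move of that register into an SSE register collapses into a single zero-extending load straight into the SSE register. Everything in it is hypothetical (the `struct insn` layout, the enums, `peephole_zext_to_sse`), and it simplifies the bug's actual sequence, which goes through a stack slot because GR->xmm moves are disabled, to a direct move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified insn model; not GCC's RTL. */
enum op { OP_ZEXT_LOAD, OP_MOVE, OP_OTHER };
enum reg_class { GPR, SSE, MEM };

struct insn {
    enum op op;
    int dst;                   /* destination register number */
    enum reg_class dst_class;
    int src;                   /* source register number or memory slot */
    enum reg_class src_class;
    bool src_dies;             /* source register is dead after this insn */
};

/* Two-insn window:  ZEXT_LOAD gpr <- mem ; MOVE sse <- gpr (gpr dies)
   is rewritten to   ZEXT_LOAD sse <- mem.
   Works in place and returns the new insn count. */
static size_t peephole_zext_to_sse(struct insn *insns, size_t n)
{
    assert(insns != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n
            && insns[i].op == OP_ZEXT_LOAD
            && insns[i].dst_class == GPR
            && insns[i].src_class == MEM
            && insns[i + 1].op == OP_MOVE
            && insns[i + 1].dst_class == SSE
            && insns[i + 1].src_class == GPR
            && insns[i + 1].src == insns[i].dst
            && insns[i + 1].src_dies) {
            struct insn merged = insns[i];   /* keep the memory source */
            merged.dst = insns[i + 1].dst;   /* but load into the SSE reg */
            merged.dst_class = SSE;
            merged.src_dies = false;
            insns[out++] = merged;
            i++;                             /* the move has been absorbed */
        } else {
            insns[out++] = insns[i];
        }
    }
    return out;
}
```

The death check matters: if the general register were still live after the move, dropping the load into it would change program behavior, so the toy leaves such windows alone.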

2017-04-20 vmakarov at gcc dot gnu.org

--- Comment #2 from Vladimir Makarov  ---
We have the following fragment:

    8: r96:DI=zero_extend(r93:SI)
      REG_DEAD r93:SI
   13: r91:V8DI#0=r95:V16SI>>r96:DI
      REG_DEAD r96:DI
      REG_DEAD r95:V16SI

IRA allocates general regs to r96 and r93, so insn 8 matches alternative
(0) r (1) rmWz {*zero_extendsidi2}, which requires no reload.

So why does IRA choose general regs for r96 instead of SSE ones? For insn 8 we
have the following alternatives:

"=...r   ,...,?*Ym,..."
"... rmWz,...,r   ,..."

Alternative '?*Ym, r' discourages the use of SSE regs, as Y is prefixed with *
(excluded from pseudo class consideration). It is also discouraged in LRA, as
it has '?'.
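The reasoning above depends on how the constraint modifiers in the *zero_extendsidi2 strings are read: '?' slightly disparages an alternative, '!' severely disparages it, and '*' makes the following constraint ignored when IRA computes register-class preferences. The sketch below is a hypothetical helper (`describe_alternative`, `struct alt_info` and `append` are illustrative names, not GCC internals) that splits one operand's constraint string into alternatives and reports, for a chosen alternative, the disparaging modifiers and which constraints remain visible to or hidden from class preferencing. It assumes, only for this pattern, that x86 constraints starting with 'Y' or 'W' are two letters long (Ym, Yi, Yj, Wz); it does not model GCC's actual cost arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct alt_info {
    int question_marks;  /* '?' count: slight disparagement */
    int bangs;           /* '!' count: severe disparagement */
    char visible[16];    /* constraints used for register-class preference */
    char hidden[16];     /* constraints prefixed by '*', ignored for preference */
};

static void append(char *buf, size_t cap, const char *s, size_t n)
{
    size_t used = strlen(buf);
    if (used + n < cap) {            /* leave room for the terminating NUL */
        memcpy(buf + used, s, n);
        buf[used + n] = '\0';
    }
}

/* Describe alternative ALT (0-based) of CONSTRAINT; returns 0, or -1 if
   the string has fewer alternatives. */
static int describe_alternative(const char *constraint, int alt,
                                struct alt_info *out)
{
    assert(constraint != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    const char *p = constraint;
    for (int i = 0; i < alt; i++) {  /* skip to the requested alternative */
        p = strchr(p, ',');
        if (p == NULL)
            return -1;
        p++;
    }
    int hide_next = 0;
    for (; *p != '\0' && *p != ','; p++) {
        switch (*p) {
        case '=': case '+': case '&': case ' ':
            break;                   /* output/earlyclobber markers, padding */
        case '?':
            out->question_marks++;
            break;
        case '!':
            out->bangs++;
            break;
        case '*':
            hide_next = 1;           /* next constraint is hidden from IRA */
            break;
        default: {
            size_t n = ((*p == 'Y' || *p == 'W') && p[1] != '\0' && p[1] != ',') ? 2 : 1;
            append(hide_next ? out->hidden : out->visible, sizeof out->visible, p, n);
            hide_next = 0;
            p += n - 1;              /* the loop's p++ steps past the rest */
            break;
        }
        }
    }
    return 0;
}
```

Applied to the output constraint string quoted in comment #6, alternative 5 (`?*Ym`) reports one '?' and no visible constraint at all, only the hidden `Ym`, which is the combination described above: nothing in that alternative steers IRA toward a non-general class, and LRA additionally penalizes it.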

So I don't know how to fix it in IRA or in LRA.  I am pretty sure the
old RA and reload would have had the same problem.

Probably the issue should be fixed in machine dependent code.  But the fix
might create more problems.

2017-04-14 ubizjak at gmail dot com

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-04-14
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak  ---
Looks like RA issue.