[Bug rtl-optimization/80425] Extra inter-unit register move with zero-extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425

--- Comment #7 from uros at gcc dot gnu.org ---
Author: uros
Date: Tue Nov 7 18:51:22 2017
New Revision: 254505

URL: https://gcc.gnu.org/viewcvs?rev=254505=gcc=rev

Log:
    PR target/80425
    * config/i386/i386.md (*zero_extendsidi2): Change (?r,*Yj), (?*Yi,r)
    and (*x,m) to ($r,Yj), ($Yi,r) and ($x,m).
    (zero-extendsidi peephole2): Remove peephole.

testsuite/ChangeLog:

    PR target/80425
    * gcc.target/i386/pr80425-3.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr80425-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425

--- Comment #6 from Uroš Bizjak ---
Mostly fixed; the issue from Comment #4 remains, although the
*zero_extendsidi2 pattern now reads:

(define_insn "*zero_extendsidi2"
  [(set (match_operand:DI 0 "nonimmediate_operand"
        "=r,?r,?o,r ,o,?*Ym,?!*y,?r ,?r,?*Yi,*x,*x,*v,*r")
        (zero_extend:DI
          (match_operand:SI 1 "x86_64_zext_operand"
        "0 ,rm,r ,rmWz,0,r ,m ,*Yj,*x,r ,m ,*x,*v,*k")))]

LRA starts with:

    7: r96:DI=zero_extend([`a'])
   12: r92:V8DI#0=r95:V16SI>>r96:DI
      REG_DEAD r96:DI
      REG_DEAD r95:V16SI

and creates:

    7: ax:DI=zero_extend([`a'])
   21: [bp:DI-0x38]=ax:DI
   22: xmm1:DI=[bp:DI-0x38]
   12: xmm0:V16SI=xmm0:V16SI>>xmm1:DI

The shift count in xmm1 could be zero-extended directly from memory.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425

--- Comment #5 from Uroš Bizjak ---
Author: uros
Date: Mon May 15 19:04:35 2017
New Revision: 248070

URL: https://gcc.gnu.org/viewcvs?rev=248070=gcc=rev

Log:
    * config/i386/i386.md (*zero_extendsidi2): Do not penalize
    non-interunit SSE move alternatives with '?'.
    (zero-extendsidi peephole2): New peephole to skip intermediate
    general register in SSE zero-extend sequence.

testsuite/ChangeLog:

    * gcc.target/i386/pr80425-1.c: New test.
    * gcc.target/i386/pr80425-2.c: Ditto.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr80425-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr80425-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425

--- Comment #4 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #1)
> Looks like RA issue.

A related problem is shown with:

#include <immintrin.h>

extern int a;

__m512i
f1 (__m512i x)
{
  return _mm512_srai_epi32 (x, a);
}

compiled with -O2 -mavx512f:

        movl    a(%rip), %eax           # 7     *zero_extendsidi2/4     [length = 6]
        movq    %rax, -56(%rbp)         # 21    *movdi_internal/6       [length = 4]
        vmovq   -56(%rbp), %xmm1        # 22    *movdi_internal/15      [length = 7]
        vpsrad  %xmm1, %zmm0, %zmm0     # 12    ashrv16si3/1            [length = 6]

Please note that GR->xmm moves are disabled by default. In this case, we
could extend from mem->xmm, but RA chooses a general register instead.

This happens even with the following patch that removes "?" from the
relevant insns:

--cut here--
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index da79d8f..a1ff7c9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3762,10 +3762,10 @@ (define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-        "=r,?r,?o,r ,o,?*Ym,?!*y,?r ,?r,?*Yi,?*x,?*x,?*v,*r")
+        "=r,?r,?o,r ,o,?*Ym,?!*y,?r ,?r,?*Yi,*x,*x,*v,*r")
         (zero_extend:DI
           (match_operand:SI 1 "x86_64_zext_operand"
-        "0 ,rm,r ,rmWz,0,r ,m ,*Yj,*x,r ,m , *x, *v,*k")))]
+        "0 ,rm,r ,rmWz,0,r ,m ,*Yj,*x,r ,m ,*x,*v,*k")))]
   ""
 {
   switch (get_attr_type (insn))
--cut here--

-m32 generates optimal code with and without the patch:

        vmovd   a, %xmm1                # 7     *zero_extendsidi2/11    [length = 11]
        vpsrad  %xmm1, %zmm0, %zmm0     # 12    ashrv16si3/1            [length = 6]
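Below is a portable scalar model of what insns 7 and 12 in the listing compute. It shows why the SImode variable `a` has to be zero-extended to DImode before it can serve as the shift count.

It rests on the Intel SDM description of the register-count form of PSRAD/VPSRAD. That form reads its count from the whole low quadword of the count register. When the count exceeds 31, each 32-bit lane is filled with its sign bit. VMOVD from memory writes the low doubleword and zeroes the rest of the register, so the single vmovd in the -m32 output already performs the SI->DI zero extension.

This is a sketch only. The names `zext_si_di`, `sra_lane` and `sra_v16si` are hypothetical. It models the emitted instructions, not GCC's or Intel's specification of `_mm512_srai_epi32`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the instruction sequence in the listing,
   not a definition of the _mm512_srai_epi32 intrinsic.  */

/* Insn 7: zero_extend:DI of the SImode variable `a'.  */
static uint64_t zext_si_di(int32_t a)
{
    return (uint64_t)(uint32_t)a;
}

/* One 32-bit lane of VPSRAD with a register count: the count is the whole
   unsigned low quadword; counts above 31 fill the lane with its sign bit.  */
static int32_t sra_lane(int32_t lane, uint64_t count)
{
    if (count > 31)
        return lane < 0 ? -1 : 0;
    if (lane >= 0)
        return lane >> count;
    /* Right-shifting a negative int is implementation-defined in C, so
       shift the non-negative complement and complement the result back.  */
    int32_t inv = -1 - lane;            /* equals ~lane, cannot overflow */
    return -1 - (inv >> count);
}

/* Insn 12: shift all 16 lanes of a zmm register by the extended count.  */
static void sra_v16si(int32_t out[16], const int32_t in[16], int32_t a)
{
    uint64_t count = zext_si_di(a);
    /* After the zero extension the upper 32 bits of the count are zero.  */
    assert(count <= UINT32_MAX);
    for (int i = 0; i < 16; i++)
        out[i] = sra_lane(in[i], count);
}
```

For example, with `a = -1` the zero-extended count is 0xFFFFFFFF. Every lane is therefore replaced by 0 or -1 instead of being shifted by a small amount.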
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425

--- Comment #3 from Uroš Bizjak ---
(In reply to Vladimir Makarov from comment #2)
> So I don't know how to fix it in IRA or in LRA. I am pretty sure the
> old RA and reload would have had the same problem.
>
> Probably the issue should be fixed in machine dependent code. But the fix
> might create more problems.

It is possible to create a couple of peephole2 patterns to catch this, but
perhaps REE can be enhanced to optimize the above register propagation, so
the solution would apply to all targets.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425

--- Comment #2 from Vladimir Makarov ---
We have the following fragment:

    8: r96:DI=zero_extend(r93:SI)
      REG_DEAD r93:SI
   13: r91:V8DI#0=r95:V16SI>>r96:DI
      REG_DEAD r96:DI
      REG_DEAD r95:V16SI

IRA allocates general regs to r96 and r93. This means that insn 8 matches
alternative (0) r (1) rmWz {*zero_extendsidi2} without requiring any
reload.

So why does IRA choose general regs for r96 instead of SSE ones? For
insn 8 we have the following alternatives:

  "=...r   ,...,?*Ym,..."
  "... rmWz,...,r   ,..."

Alternative '?*Ym, r' discourages usage of SSE regs, as Y has *
(excluded from pseudo class consideration). It is also discouraged in
LRA, as it has ?.

So I don't know how to fix it in IRA or in LRA. I am pretty sure the
old RA and reload would have had the same problem.

Probably the issue should be fixed in machine dependent code. But the fix
might create more problems.
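Comment #2's explanation turns on the constraint modifiers. The toy below splits a constraint string into alternatives and reports, for one alternative, what the GCC internals manual says the modifiers do:

- `?` disparages the alternative slightly.
- `!` disparages it severely.
- `*` makes the following constraint be ignored when choosing register preferences.

It is a sketch under stated assumptions, not IRA's or LRA's cost model. The names `alt_info`, `describe_alt` and `constraint_len` are hypothetical. It also assumes, as in the i386 back end, that `Y` and `W` begin two-letter constraints such as `Ym`, `Yj` and `Wz`.

```c
#include <assert.h>
#include <string.h>

/* Toy summary of one alternative of a GCC constraint string.  NOT the
   IRA/LRA cost model: it only applies the documented modifier meanings
   ('?' slight disparage, '!' severe disparage, '*' hide the next
   constraint from register preferencing).  */
typedef struct {
    int question;        /* number of '?' modifiers */
    int bang;            /* number of '!' modifiers */
    char visible[32];    /* constraints used for register preferencing */
    char hidden[32];     /* constraints hidden by a preceding '*' */
} alt_info;

/* Assumption of this toy: 'Y' and 'W' start two-letter constraints
   (Ym, Yi, Yj, Wz, ...), as in the i386 back end.  */
static size_t constraint_len(const char *p)
{
    if ((p[0] == 'Y' || p[0] == 'W') && p[1] != '\0' && p[1] != ',')
        return 2;
    return 1;
}

/* Summarize alternative ALT (0-based) of the constraint string CONS.  */
static alt_info describe_alt(const char *cons, int alt)
{
    alt_info info;
    memset(&info, 0, sizeof info);

    const char *p = cons;
    for (int k = 0; k < alt && p != NULL; k++) {
        p = strchr(p, ',');
        if (p != NULL)
            p++;                        /* step past the comma */
    }
    assert(p != NULL);                  /* ALT must exist in CONS */

    int hide_next = 0;
    while (*p != '\0' && *p != ',') {
        switch (*p) {
        case ' ': case '=':             /* layout and output marker */
            p++;
            continue;
        case '?': info.question++; p++; continue;
        case '!': info.bang++;     p++; continue;
        case '*': hide_next = 1;   p++; continue;
        }
        size_t len = constraint_len(p);
        char *dst = hide_next ? info.hidden : info.visible;
        assert(strlen(dst) + len < sizeof info.visible);
        strncat(dst, p, len);
        hide_next = 0;
        p += len;
    }
    return info;
}
```

Applied to alternative 5 of the operand 0 string in *zero_extendsidi2 (`?*Ym`), the toy reports one `?` and an empty visible set, with `Ym` hidden. This matches the situation described above: the alternative is penalized in LRA and invisible to IRA's class preferencing. The general-register alternative `r` carries neither mark.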
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80425

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-04-14
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak ---
Looks like RA issue.