[Bug rtl-optimization/94837] Failure to optimize out spurious movbe into bswap

2020-04-29 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94837

--- Comment #5 from Uroš Bizjak ---
This is probably some secondary effect of subregs on register allocation:
changing "float" to "int" in the original testcase gets us the expected
alternative and optimal code using BSWAP.
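
For illustration, the two variants being compared might look like the sketch
below. The original testcase is not quoted in this thread, so this is a
hypothetical reconstruction: only the name swapFloat appears in the assembly
from comment #2, while swapInt and float_bits are made-up helper names, and
the code assumes a 32-bit IEEE 754 "float" and a compiler that provides
__builtin_bswap32 (GCC or Clang).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical "float" variant: reverse the four bytes of a float's bit
   pattern.  memcpy is used for the type punning so that the code has no
   aliasing problems; assumes sizeof (float) == 4.  */
float swapFloat(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = __builtin_bswap32(bits);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical "int" variant: the same byte swap on an integer argument,
   so the value never has to leave the general-purpose registers.  */
uint32_t swapInt(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Illustrative helper (not part of any testcase): return the raw bit
   pattern of a float so that results can be compared exactly.  */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Compiling both functions with -O2 -mmovbe -mtune=intel and comparing the
generated assembly should show whether the "float" variant still goes through
a stack slot while the "int" variant uses a plain BSWAP.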

2020-04-29 Thread ubizjak at gmail dot com

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|missed-optimization         |ra
                 CC|                            |vmakarov at gcc dot gnu.org
   Last reconfirmed|                            |2020-04-29
         Resolution|DUPLICATE                   |---
             Status|RESOLVED                    |NEW
     Ever confirmed|0                           |1

--- Comment #4 from Uroš Bizjak ---
This looks like an RA (tuning?) problem.

We enter reload (-O2 -mmovbe -mtune=intel) with:

(insn 14 4 2 2 (set (reg:SF 87)
        (reg:SF 20 xmm0 [ x ])) "pr94837.c":2:1 112 {*movsf_internal}
     (expr_list:REG_DEAD (reg:SF 20 xmm0 [ x ])
        (nil)))
(insn 7 6 11 2 (set (subreg:SI (reg:SF 84 [  ]) 0)
        (bswap:SI (subreg:SI (reg:SF 87) 0))) "pr94837.c":11:19 869 {*bswapsi2_movbe}
     (expr_list:REG_DEAD (reg:SF 87)
        (nil)))
(insn 11 7 12 2 (set (reg/i:SF 20 xmm0)
        (reg:SF 84 [  ])) "pr94837.c":12:1 112 {*movsf_internal}
     (expr_list:REG_DEAD (reg:SF 84 [  ])
        (nil)))

and this sequence gets reloaded to:

(insn 17 6 7 2 (set (mem/c:SI (plus:DI (reg/f:DI 7 sp)
                (const_int -4 [0xfffc])) [1 %sfp+-4 S4 A32])
        (reg:SI 20 xmm0 [87])) "pr94837.c":11:19 67 {*movsi_internal}
     (nil))
(insn 7 17 16 2 (set (reg:SI 0 ax [88])
        (bswap:SI (mem/c:SI (plus:DI (reg/f:DI 7 sp)
                    (const_int -4 [0xfffc])) [1 %sfp+-4 S4 A32]))) "pr94837.c":11:19 869 {*bswapsi2_movbe}
     (nil))
(insn 16 7 12 2 (set (reg:SI 20 xmm0 [orig:84  ] [84])
        (reg:SI 0 ax [88])) "pr94837.c":11:19 67 {*movsi_internal}
     (nil))

One would expect the register allocator to choose alternative 0 from:

(define_insn "*bswap<mode>2_movbe"
  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,r,m")
        (bswap:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,m,r")))]
  "TARGET_MOVBE
   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "@
    bswap\t%0
    movbe{<imodesuffix>}\t{%1, %0|%0, %1}
    movbe{<imodesuffix>}\t{%1, %0|%0, %1}"

but for some reason this is not the case.

2020-04-29 Thread gabravier at gmail dot com

--- Comment #3 from Gabriel Ravier ---
Also, I've tested the code from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593 and the optimization in
question is no longer applied with `-mtune=generic`, only with specific
architectures like `-mtune=k8`.

2020-04-29 Thread gabravier at gmail dot com

--- Comment #2 from Gabriel Ravier ---
This is what I get with `-O3 -mmovbe -mtune=intel`:

swapFloat(float):
  movd DWORD PTR [rsp-4], xmm0
  movbe eax, DWORD PTR [rsp-4]
  movd xmm0, eax
  ret

This seems erroneous.

2020-04-28 Thread pinskia at gcc dot gnu.org

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski ---
This is on purpose.

Use -mtune=intel to get the result you want.

See PR 54593 for the reason why.

*** This bug has been marked as a duplicate of bug 54593 ***