[Bug rtl-optimization/88770] Redundant load opt. or CSE pessimizes code

2021-06-06 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770

--- Comment #3 from Hongtao.liu  ---
Shouldn't pass_store_merging be a better place to handle such an optimization?
Currently store-merging only merges .a and .b, and fails to merge .c and .d

202t.store-merging

void caller ()
{
  struct guu D.4030;
  struct guu D.4029;

   [local count: 1073741824]:
  MEM  [(int *)] = 21474836483;
  D.4029.c = 7.0e+0;
  D.4029.d = 9;
  test (D.4029);
  MEM  [(int *)] = 21474836483;
  D.4030.c = 7.0e+0;
  D.4030.d = 9;
  test (D.4030);
  D.4029 ={v} {CLOBBER};
  D.4030 ={v} {CLOBBER};
  return;
}
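For reference, the merged constant in the dump can be reproduced by hand. The following is only a sketch: the field names come from the dump, but the field types, the values a = 3 and b = 5, the little-endian layout, and IEEE 754 single precision for .c are assumptions inferred from the constants, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: field names are from the dump, the types are assumed. */
struct guu { int a; int b; float c; char d; };

/* Little-endian merge of two adjacent 32-bit stores into one 64-bit store. */
uint64_t merge32(uint32_t lo, uint32_t hi)
{
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Bit pattern of a float, copied out to avoid aliasing problems. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Value the dump stores over .a and .b, assuming a = 3 and b = 5. */
uint64_t merged_ab(void) { return merge32(3u, 5u); }

/* Value a merged store over .c and .d would need, with padding zeroed. */
uint64_t merged_cd(void) { return merge32(float_bits(7.0f), 9u); }
```

Under those assumptions merged_ab() gives 21474836483, the value stored in the dump, and merged_cd() gives 39743127552, which is what a single 8-byte store over .c and .d would have to write.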

[Bug rtl-optimization/88770] Redundant load opt. or CSE pessimizes code

2021-06-05 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770

Peter Cordes  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #2 from Peter Cordes  ---
Note that mov r64, imm64 is a 10-byte instruction, and can be slow to read from
the uop-cache on Sandybridge-family.

The crap involving OR is clearly sub-optimal, but *if* you already have two
spare call-preserved registers across this call, the following is actually
smaller code-size:

        movabs  rdi, 21474836483
        mov     rbp, rdi
        movabs  rsi, 39743127552
        mov     rbx, rsi
        call    test
        mov     rdi, rbp
        mov     rsi, rbx
        call    test


This is more total uops for the back-end though (movabs is still single-uop,
but takes 2 entries in the uop cache on Sandybridge-family;
https://agner.org/optimize/).  So saving x86 machine-code size this way does
limit the ability of out-of-order exec to see farther, if the front-end isn't
the bottleneck.  And it's highly unlikely to be worth saving/restoring two regs
to enable this.  (Or to push rdi / push rsi before call, then pop after!)
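A rough byte count makes the code-size trade-off concrete. This is a sketch, not a measurement: it uses the standard x86-64 encodings (10 bytes for mov r64, imm64; 3 bytes for mov r64, r64; 1 byte each for push/pop of rbx or rbp), leaves out the two call instructions since they are the same either way, and the function names are made up for illustration.

```c
#include <assert.h>

/* x86-64 encoding sizes in bytes. */
enum {
    MOVABS_R64_IMM64 = 10, /* REX.W + opcode + 8-byte immediate */
    MOV_R64_R64      = 3,  /* REX.W + opcode + ModRM */
    PUSH_POP_R64     = 1   /* push/pop of rbx or rbp: one opcode byte */
};

/* Sequence above: 2 movabs plus 4 register-to-register moves. */
int size_with_copies(void)
{
    return 2 * MOVABS_R64_IMM64 + 4 * MOV_R64_R64;
}

/* Alternative: rematerialize both constants before each call (4 movabs). */
int size_rematerialized(void)
{
    return 4 * MOVABS_R64_IMM64;
}

/* Extra bytes if rbx and rbp have to be saved and restored just for this. */
int size_save_restore(void)
{
    return 4 * PUSH_POP_R64;
}
```

That is 32 bytes against 40, so 8 bytes saved when the two call-preserved registers are already free, and only 4 once two push/pop pairs are added.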

Setting up the wrong value and then fixing it twice with OR is obviously
terrible and never has any advantage, but the general idea to CSE large
constants isn't totally crazy.  (But it's profitable only in such limited cases
that it might not be worth looking for, especially if it's only helpful at -Os)

[Bug rtl-optimization/88770] Redundant load opt. or CSE pessimizes code

2019-01-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-09
                 CC|                            |vmakarov at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
I guess the values being constants would make this a job for LRA remat?

Confirmed also on trunk.