[Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils

2023-08-24 Thread eggert at cs dot ucla.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823

--- Comment #5 from Paul Eggert  ---
Also see bug 43 for a related performance issue, which is perhaps more
important given the current state of bleeding-edge GNU diffutils.

2023-07-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #4 from Alexander Monakov  ---
It's a weakness in the REE pass. AFAICT normally it would handle this, but here
there are two elimination candidates in 'main', the first is eliminated
successfully, and then REE punts on the second because one of its reaching
definitions is the first redundant extension:

  /* If def_insn is already scheduled to be deleted, don't attempt
     to modify it.  */
  if (state->modified[INSN_UID (def_insn)].deleted)
    return false;
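
To make the ordering problem concrete, here is a toy model of that check. It is only a sketch and not GCC's code: `struct candidate`, `try_eliminate`, `count_eliminated`, and `MAX_INSNS` are hypothetical names, and the real pass does far more than set a flag. It only shows that once the first extension is marked deleted, a second candidate that has it as a reaching definition is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_INSNS = 8 };   /* hypothetical bound for this toy model */

/* Hypothetical per-instruction state; only the 'deleted' flag is modeled. */
struct insn_state {
    bool deleted;
};

/* Hypothetical candidate: an extension insn plus the insns that reach it. */
struct candidate {
    int insn;        /* id of the extension instruction */
    int defs[2];     /* ids of its reaching definitions */
    size_t n_defs;   /* number of valid entries in defs */
};

/* Try to eliminate one candidate.  Punt if any reaching definition is
   already scheduled for deletion, mirroring the check quoted above.  */
static bool try_eliminate(struct insn_state *state, const struct candidate *c)
{
    assert(c->insn >= 0 && c->insn < MAX_INSNS && c->n_defs <= 2);
    for (size_t i = 0; i < c->n_defs; i++) {
        assert(c->defs[i] >= 0 && c->defs[i] < MAX_INSNS);
        if (state[c->defs[i]].deleted)
            return false;
    }
    state[c->insn].deleted = true;
    return true;
}

/* Process candidates in order and count how many were eliminated. */
size_t count_eliminated(const struct candidate *cands, size_t n)
{
    struct insn_state state[MAX_INSNS] = {{false}};
    size_t eliminated = 0;
    for (size_t i = 0; i < n; i++)
        if (try_eliminate(state, &cands[i]))
            eliminated++;
    return eliminated;
}
```

With two candidates where the second lists the first as a reaching definition, only one is eliminated; if the second depends on unrelated instructions, both are.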

While looking into this I noticed that the fix for PR 61094 introduced a
write-only bitfield 'do_not_reextend' (the ChangeLog wrongly claimed it was
used).

2023-07-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823

--- Comment #3 from Andrew Pinski  ---
The gimple level looks like:
```
  if (_54 >= 0)
    goto ; [90.00%]
  else
    goto ; [10.00%]

   [local count: 63261141172]:
  _18 = (unsigned int) _54;
  goto ; [100.00%]
...
  len_37 = mbrtoc32 (, iter_39, _36, );
  len.0_38 = (signed long) len_37;
  if (len.0_38 < 0)
    goto ; [10.00%]
  else
    goto ; [90.00%]

   [local count: 632611429]:
  ch.1_42 = ch; // Note this is a local variable

   [local count: 7029015815]:
  # SR.45_12 = PHI 
  # SR.46_46 = PHI 
  mbs ={v} {CLOBBER(eol)};
  ch ={v} {CLOBBER(eol)};

   [local count: 70290156974]:
  # SR.41_16 = PHI <_18(4), SR.45_12(7)>
  # SR.42_47 = PHI <1(4), SR.46_46(7)>
  _6 = (long long unsigned int) SR.41_16;
```

Maybe we should have a type promotion pass on the gimple level that promotes
_54 to `long unsigned int`.
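
As a rough source-level illustration of what such a promotion would amount to, the sketch below writes the same loop twice: once widening in two steps (narrow value, then `unsigned int`, then `unsigned long long`, as in the dump above) and once carrying the value in the wide type from the start. Everything here is hypothetical: `sum_narrow`, `sum_wide`, and the `slow_path` stand-in for the `mbrtoc32` branch are not from diffutils or GCC, and no claim is made about the code GCC emits for either form. It only shows that the two forms compute the same result, which is what would make the promotion valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rare multibyte branch (mbrtoc32 in the dump). */
static uint32_t slow_path(signed char c)
{
    return (uint32_t)(unsigned char)c | 0x100u;
}

/* Narrow form: the value is widened in two steps,
   signed char -> unsigned int -> unsigned long long.  */
unsigned long long sum_narrow(const signed char *s, size_t n)
{
    assert(s != NULL || n == 0);
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int ch;
        if (s[i] >= 0)
            ch = (unsigned int)s[i];
        else
            ch = slow_path(s[i]);
        total += (unsigned long long)ch;
    }
    return total;
}

/* Hand-promoted form: the value lives in the wide type from the start,
   so no second widening step is needed on the common path.  */
unsigned long long sum_wide(const signed char *s, size_t n)
{
    assert(s != NULL || n == 0);
    unsigned long long total = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long long ch;
        if (s[i] >= 0)
            ch = (unsigned long long)s[i];
        else
            ch = slow_path(s[i]);
        total += ch;
    }
    return total;
}
```

The promotion is safe on the common path because the value is already known to be non-negative there, so sign and zero extension agree.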

2023-07-26 Thread eggert at cs dot ucla.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823

--- Comment #2 from Paul Eggert  ---
Created attachment 55645
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55645&action=edit
code-mbcel1.s with the optimization suggested in the bug report

2023-07-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement

2023-07-26 Thread eggert at cs dot ucla.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823

--- Comment #1 from Paul Eggert  ---
Created attachment 55644
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55644&action=edit
gcc -O2 -S output (from code-mbcel1.i)