https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
--- Comment #6 from Alexander Nesterovskiy ---
Thanks! I see a performance gain on 648.exchange2_s (~6% on Broadwell and ~3%
on Skylake-X), which brings performance back to the r255266 level (the
Skylake-X regression was ~3%).
And loops unrolled with 2 and 3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
--- Comment #5 from Richard Biener ---
Author: rguenth
Date: Thu Dec 14 14:32:24 2017
New Revision: 255635
URL: https://gcc.gnu.org/viewcvs?rev=255635&root=gcc&view=rev
Log:
2017-12-14 Richard Biener
PR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
Richard Biener changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
--- Comment #3 from Richard Biener ---
Created attachment 42879
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42879&action=edit
patch in testing
Patch I am testing. Performance evaluation appreciated.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
--- Comment #2 from Richard Biener ---
We no longer unroll the inner loops in cunrolli because cunrolli will leave us
with exit checks.
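
As a rough illustration of what "leave us with exit checks" means, here is a
hand-written C sketch. It is not taken from the benchmark or from the patch;
the function names and the assumed upper bound of 3 iterations are
hypothetical. It contrasts unrolling when only an upper bound on the trip
count is known with unrolling when the exact trip count is known.

```c
/* Original loop. Assumption for this sketch: callers guarantee n <= 3,
   but the exact value of n is not known at compile time. */
int sum_loop(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled using only the upper bound (3): every copy of the body
   still needs its own exit check, because the loop may stop early. */
int sum_unrolled_with_exits(const int *a, int n)
{
    int s = 0;
    if (n <= 0) return s;   /* exit check before copy 1 */
    s += a[0];
    if (n <= 1) return s;   /* exit check before copy 2 */
    s += a[1];
    if (n <= 2) return s;   /* exit check before copy 3 */
    s += a[2];
    return s;
}

/* Unrolled with an exactly known trip count of 3: the exit checks
   disappear and the body becomes straight-line code. */
int sum_unrolled_exact(const int *a)
{
    return a[0] + a[1] + a[2];
}
```

The second form is what remains when the iteration count cannot be computed
exactly; the third form is only possible once it can.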
We fail to compute the number of iterations of the inner loop(s) (before loop
header copying):
[local count:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
Richard Biener changed:
What|Removed |Added
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326
Andrew Pinski changed:
What|Removed |Added
Keywords||missed-optimization
Target