https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449
--- Comment #5 from Richard Biener ---
Note that we do unroll the loop at -O3, but only late, after which we do not
redo bswap recognition (which happens before loop optimization). At -O2 we
don't unroll because that would increase code size too much.
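
For illustration only, a minimal sketch of the kind of loop being discussed. It is not the test case from the report: the function names are hypothetical, and using "#pragma GCC unroll" is an assumption about which pragma the thread refers to. Both functions assemble a 32-bit value from four bytes in big-endian order, one as a loop and one already written out flat.

```c
#include <stdint.h>

/* Hypothetical reproducer, not the code attached to the bug report. */
uint32_t load_be32_loop(const unsigned char *p)
{
    uint32_t v = 0;
    /* Assumption: request full unrolling early so that the four
       shift-and-or steps are visible as straight-line code. */
#pragma GCC unroll 4
    for (int i = 0; i < 4; i++)
        v = (v << 8) | p[i];
    return v;
}

/* The same computation written without a loop, for comparison. */
uint32_t load_be32_flat(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Compiling with "gcc -O2 -S" and comparing the assembly of the two functions, with and without the pragma, should show whether the loop form is turned into a single load plus byte swap. The result may vary with the GCC version and target.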
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449
Andrew Pinski changed:
What|Removed |Added
CC||pinskia at gcc dot gnu.org
Se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449
Xi Ruoyao changed:
What|Removed |Added
Keywords||missed-optimization
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449
--- Comment #3 from Pali Rohár ---
Note that clang optimizes it at just -O2 and does not require any special
pragma.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449
--- Comment #2 from Pali Rohár ---
Interesting... I was expecting that -O3, or better yet -Ofast, would tell gcc
to optimize the code as much as possible.
I added that pragma before the for loop in the first example and then gcc really
optimized t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449
Xi Ruoyao changed:
What|Removed |Added
CC||xry111 at gcc dot gnu.org
--- Comment #1 fr