https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712
            Bug ID: 79712
           Summary: Clang smarter about unrolling in fhourstones benchmark
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40829
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40829&action=edit
preprocessed source

Clang seems to do a better job at unrolling in the fhourstones benchmark:

$ gcc -Wextra -Wall -Ofast -mcpu=cortex-a53 -march=armv8-a+crc -ftree-vectorize SearchGame.i (-funroll-loops -fvariable-expansion-in-unroller -ftree-loop-ivcanon -fivopts)
$ ./a.out < inputs

The flags in parentheses were added only for the "with unrolling" run.

- clang 3.8 result: 3358 kpos/s
- gcc result: 3220 kpos/s
- gcc result with unrolling: 3473 kpos/s

It would be nice if gcc could achieve similar performance to clang's -O3 out of the box.

BTW, running the benchmark on 32-bit requires changing the %lu's to %llu's at line 200 in the C source.
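
As an aside on the 32-bit note: assuming the values printed at that line are 64-bit counters, a format string built from the PRIu64 macro in <inttypes.h> would avoid having to switch between %lu and %llu per target. A minimal sketch follows; format_stats and its parameters are hypothetical names for illustration and are not taken from the fhourstones source.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from the fhourstones source): formats a
   64-bit position count and an elapsed time in milliseconds into buf.
   Returns the number of characters snprintf would have written. */
static int format_stats(char *buf, size_t len, uint64_t nodes, uint64_t msecs)
{
    /* PRIu64 expands to the correct conversion specifier for uint64_t,
       whether the target maps it to unsigned long or unsigned long long. */
    return snprintf(buf, len, "%" PRIu64 " pos / %" PRIu64 " msec",
                    nodes, msecs);
}
```

If the counters in the real source are declared as unsigned long long rather than uint64_t, plain %llu is the matching specifier and the macro is not needed.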