http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231
Bug #: 54231
Summary: LTO generates code for the wrong CPU if different options used
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: thi...@kde.org

Created attachment 27992
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27992
Makefile

Summary:

Given the following code:
=====
#include <immintrin.h>
void BZERO(char *ptr, size_t count)
{
    __m128i zero = _mm_set1_epi8(0);
    while (count--) {
        _mm_stream_si128((__m128i*)ptr, zero);
        ptr += 16;
    }
}
=====

When this is compiled twice, once for SSE2 and once for AVX (so that we get VEX-prefixed code), GCC under LTO generates VEX-prefixed code for both copies. See the attached Makefile.

Long description:

A library or program that determines at runtime whether certain CPU features, such as AVX, are available may need to compile different compilation units with different compiler flags.

The example I am providing is a simple function that zeroes out a segment of memory aligned to 16 bytes. Both variants come from the same source file, which is compiled twice, but that does not seem to be relevant. The idea is that each of the two functions would be called by a dispatcher function after it has checked the result of CPUID.

However, if you compile the code with LTO (e.g., by running make CFLAGS=-flto with the attached Makefile), GCC applies the highest CPU setting to all compilation units. This defeats the runtime detection technique: in this example, both functions will contain AVX code, which would then be run on computers without AVX support.

This might be intentional. If so, please close this bug report. However, I would recommend that the behaviour be fixed: being able to use LTO with different CPU settings would allow better inlining of such functions and would eliminate unnecessary function calls. The bzero example is a good one.
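For illustration, the dispatcher mentioned above could look roughly like the following. This is only a sketch and not part of the attached Makefile or test case: the names bzero_sse2, bzero_avx, select_bzero and bzero_dispatch are hypothetical, and the two variants are plain memset stand-ins so that the file builds on its own. In the real setup they would be the two builds of BZERO, one compiled with -msse2 and one with -mavx. The sketch also assumes a compiler that provides __builtin_cpu_supports (GCC 4.8 and later, so newer than the 4.7.0 reported here); with older compilers the check would have to be written against CPUID by hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two builds of BZERO from the report.
   As in the original function, count is a number of 16-byte blocks. */
static void bzero_sse2(char *ptr, size_t count)
{
    memset(ptr, 0, count * 16);
}

static void bzero_avx(char *ptr, size_t count)
{
    memset(ptr, 0, count * 16);
}

typedef void (*bzero_fn)(char *ptr, size_t count);

/* Pick a variant based on what the running CPU supports. */
static bzero_fn select_bzero(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return bzero_avx;
#endif
    /* Baseline variant; also used on non-x86 targets in this sketch. */
    return bzero_sse2;
}

/* Dispatcher: resolves the variant on first use, then calls it.
   The cached pointer is not synchronized, so this sketch is not
   safe against concurrent first calls. */
void bzero_dispatch(char *ptr, size_t count)
{
    static bzero_fn impl;
    if (!impl)
        impl = select_bzero();
    impl(ptr, count);
}
```

With the behaviour described in this report, the function behind the baseline branch would itself contain VEX-prefixed instructions once LTO is enabled, so a dispatcher like this one would no longer protect CPUs without AVX.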