http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231

             Bug #: 54231
           Summary: LTO generates code for the wrong CPU if different
                    options used
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: thi...@kde.org


Created attachment 27992
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27992
Makefile

Summary:

Given the following code:

=====
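#include <immintrin.h>
#include <stddef.h>   /* size_t */

/* Zeroes count 16-byte blocks starting at ptr (which must be 16-byte
   aligned) using non-temporal (streaming) stores. */
void BZERO(char *ptr, size_t count)
{
    __m128i zero = _mm_set1_epi8(0);
    while (count--) {
        _mm_stream_si128((__m128i*)ptr, zero);
        ptr += 16;
    }
}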
=====

When this file is compiled twice, once for SSE2 and once for AVX (so that the
AVX build uses VEX-prefixed instructions), GCC under LTO generates VEX-encoded
code for both copies. See the attached Makefile.

Long description:

A library or program that determines at runtime whether certain CPU features,
such as AVX, are available may need to compile different compilation units
with different compiler flags. The example I am providing is a simple function
that zeroes out a 16-byte-aligned block of memory. Both variants are built from
the same source file, compiled twice, but that does not seem to be relevant.

The idea is that a dispatcher function checks the result of CPUID and then
calls whichever of the two functions the CPU supports.
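A minimal sketch of such a dispatcher, assuming x86 and GCC or Clang (for `<cpuid.h>`). The names `bzero_sse2`, `bzero_avx`, `cpu_has_avx`, `bzero_fn` and `select_bzero` are hypothetical. In the build described above, the two variants would come from compiling the same source twice with different flags. Here both bodies are written out in one file so the sketch compiles on its own. The AVX check follows the usual rule of requiring both the CPUID AVX bit and OS support for the YMM state (OSXSAVE plus XCR0 read via XGETBV).

```c
#include <cpuid.h>      /* __get_cpuid, bit_AVX, bit_OSXSAVE (GCC/Clang) */
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical variant names. In the reported setup these would be the same
   source compiled with -msse2 and -mavx; both bodies are plain SSE2 here. */
void bzero_sse2(char *ptr, size_t count)
{
    __m128i zero = _mm_set1_epi8(0);
    while (count--) {
        _mm_stream_si128((__m128i *)ptr, zero);
        ptr += 16;
    }
}

void bzero_avx(char *ptr, size_t count)
{
    __m128i zero = _mm_set1_epi8(0);
    while (count--) {
        _mm_stream_si128((__m128i *)ptr, zero);
        ptr += 16;
    }
}

/* Nonzero if the CPU reports AVX and the OS has enabled saving of the
   SSE and YMM register state (XCR0 bits 1 and 2). */
static int cpu_has_avx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & bit_OSXSAVE) || !(ecx & bit_AVX))
        return 0;
    /* XGETBV with ECX = 0 reads XCR0; inline asm avoids needing -mxsave. */
    unsigned int xcr0_lo, xcr0_hi;
    __asm__ volatile ("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    return (xcr0_lo & 0x6) == 0x6;
}

typedef void (*bzero_fn)(char *, size_t);

/* Dispatcher: chooses an implementation from the CPUID result.
   Callers can cache the returned pointer. */
bzero_fn select_bzero(void)
{
    return cpu_has_avx() ? bzero_avx : bzero_sse2;
}
```

The point of the report is that this selection is only safe if `bzero_sse2` really contains no VEX-encoded instructions, which is exactly what the LTO build described below breaks.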

However, if you compile the code with LTO (e.g., by running make CFLAGS=-flto
with the attached Makefile), GCC applies the highest CPU setting to all
compilation units. This defeats the runtime detection technique: in this
example, both functions contain AVX code, which would end up being run on
computers without AVX support.

This might be intentional. If so, please close this bug report.

However, I would recommend that the behaviour be fixed: being able to use LTO
with different CPU settings would allow better inlining of these functions and
the elimination of unnecessary function calls. The bzero example is a good
illustration, since such a small function benefits from being inlined into its
callers.
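For comparison, here is a sketch of a related mechanism. GCC's `target` function attribute attaches the instruction-set options to each function instead of to the compilation unit's command line. The names `bzero_sse2_attr` and `bzero_avx_attr` are hypothetical. The sketch assumes a GCC or Clang version that allows SSE2 intrinsics inside target-attributed functions, which recent versions do. This report does not establish whether per-function attributes avoid the LTO behaviour described above in any particular GCC release, so treat that as an assumption to verify.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical names. Each function carries its own ISA via the target
   attribute rather than inheriting it from -msse2 / -mavx on the command line. */
__attribute__((target("sse2")))
void bzero_sse2_attr(char *ptr, size_t count)
{
    __m128i zero = _mm_set1_epi8(0);
    while (count--) {
        _mm_stream_si128((__m128i *)ptr, zero);
        ptr += 16;
    }
}

/* Same body; under target("avx") the compiler may emit VEX-encoded stores,
   so this must only be called on CPUs that support AVX. */
__attribute__((target("avx")))
void bzero_avx_attr(char *ptr, size_t count)
{
    __m128i zero = _mm_set1_epi8(0);
    while (count--) {
        _mm_stream_si128((__m128i *)ptr, zero);
        ptr += 16;
    }
}
```

A CPUID-based dispatcher is still needed to decide which of the two to call. The attribute only changes where the ISA choice is recorded.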
