https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84328

            Bug ID: 84328
           Summary: [6 Regression] -finline-small-functions and inline
                    keyword lead to slowdown since version 6
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: xyzdr4gon333 at googlemail dot com
  Target Milestone: ---

Created attachment 43393
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43393&action=edit
optimizeFlags.cpp

I have a function looking like this:

unsigned int interleaveTwoZeros( unsigned int n )
{
    n &= 0x000003ff;
    n = (n ^ (n << 16)) & 0xFF0000FF;
    n = (n ^ (n <<  8)) & 0x0300F00F;
    n = (n ^ (n <<  4)) & 0x030C30C3;
    n = (n ^ (n <<  2)) & 0x09249249;
    return n;
}
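
For readers unfamiliar with the bit trick: the function keeps the low 10 bits of n and moves bit i to bit 3*i, leaving two zero bits between neighbours. The sketch below checks that against a bit-by-bit loop. The names spreadByThreeNaive and checkAgainstNaive are hypothetical helpers introduced for illustration; they are not part of the attached optimizeFlags.cpp.

```cpp
#include <cassert>

// Function from the report, unchanged apart from spacing.
unsigned int interleaveTwoZeros( unsigned int n )
{
    n &= 0x000003ff;
    n = (n ^ (n << 16)) & 0xFF0000FF;
    n = (n ^ (n <<  8)) & 0x0300F00F;
    n = (n ^ (n <<  4)) & 0x030C30C3;
    n = (n ^ (n <<  2)) & 0x09249249;
    return n;
}

// Hypothetical reference version: move bit i of the low 10 bits to
// bit 3*i, one bit at a time.
unsigned int spreadByThreeNaive( unsigned int n )
{
    unsigned int result = 0;
    for ( unsigned int i = 0; i < 10; ++i )
        result |= ( ( n >> i ) & 1u ) << ( 3 * i );
    return result;
}

// Compares both versions on all 2^10 inputs that survive the initial mask.
void checkAgainstNaive()
{
    for ( unsigned int n = 0; n < 1024; ++n )
        assert( interleaveTwoZeros( n ) == spreadByThreeNaive( n ) );
}
```

Calling checkAgainstNaive() from main is enough to exercise every input the function distinguishes; bits above the tenth are discarded by the first mask.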

With g++ < 6 the benchmark takes 5.7s at every optimization level >= O1.
Since g++ 6 it takes 6.2s when the -finline-small-functions option is specified
or when the `inline` keyword is added in front of `interleaveTwoZeros`.
Interestingly, the slowdown also disappears when the function body is changed.
For example, this function:

unsigned int interleaveZeros( unsigned int n )
{
        n &= 0x0000ffff;
        n = (n | (n << 8)) & 0x00FF00FF;
        n = (n | (n << 4)) & 0x0F0F0F0F;
        n = (n | (n << 2)) & 0x33333333;
        n = (n | (n << 1)) & 0x55555555;
        return n;
}

is exactly as fast as interleaveTwoZeros on the older compilers, but it is not
slowed down by the inlining regression that appears since version 6. Both
functions perform the same sequence of shift, combine and mask steps and differ
only in their constants (and in using | instead of ^), so the constants alone
somehow seem to influence what the inliner does.
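
To make the "same logic, different constants" point concrete, here is a minimal sketch that expresses both functions through one helper and two constant tables. SpreadStep, spreadBits and the two *Generic wrappers are hypothetical names, not code from the attachment. The helper uses | for both variants; for interleaveTwoZeros this gives the same result as ^, because at every bit position kept by a mask only one of n and the shifted n can be set.

```cpp
#include <cassert>
#include <cstddef>

// One shift-and-mask step of the bit-spreading sequence.
struct SpreadStep
{
    unsigned int shift;
    unsigned int mask;
};

// Applies the initial mask and then four shift/combine/mask steps.
unsigned int spreadBits( unsigned int n, unsigned int inputMask,
                         const SpreadStep (&steps)[4] )
{
    n &= inputMask;
    for ( std::size_t i = 0; i < 4; ++i )
        n = ( n | ( n << steps[i].shift ) ) & steps[i].mask;
    return n;
}

// Same constants as interleaveZeros in the report.
unsigned int interleaveZerosGeneric( unsigned int n )
{
    static const SpreadStep steps[4] = {
        { 8, 0x00FF00FF }, { 4, 0x0F0F0F0F },
        { 2, 0x33333333 }, { 1, 0x55555555 } };
    return spreadBits( n, 0x0000ffff, steps );
}

// Same constants as interleaveTwoZeros in the report.
unsigned int interleaveTwoZerosGeneric( unsigned int n )
{
    static const SpreadStep steps[4] = {
        { 16, 0xFF0000FF }, { 8, 0x0300F00F },
        {  4, 0x030C30C3 }, { 2, 0x09249249 } };
    return spreadBits( n, 0x000003ff, steps );
}
```

This is only meant to show the shared structure; it is not the form that was benchmarked, and the loop may well be optimized differently from the hand-unrolled originals.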

Here are the full benchmarks on my system as done with:

for function in '' '-DTWO_ZEROS_VERSION' '-DTWO_ZEROS_VERSION -DMANUAL_INLINE'; do
    for GPP in g++-4.9 g++-5 g++-6 g++-7 g++-8; do
        $GPP --version | head -1;
        for flag in -O1 '-O1 -finline-small-functions'; do
            echo -n "$flag "
            $GPP $flag $function -std=c++11 optimizeFlags.cpp &&
            ./a.out
        done
    done
done
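
The attachment is not reproduced in this message, so the following is only a guess at how a file like optimizeFlags.cpp could react to the two macros used in the loop above. TWO_ZEROS_VERSION and MANUAL_INLINE are taken from the command line; MAYBE_INLINE, spread and timeSpread are hypothetical names, and the real benchmark loop may look quite different.

```cpp
#include <chrono>

// -DMANUAL_INLINE puts the inline keyword in front of the function.
#ifdef MANUAL_INLINE
#   define MAYBE_INLINE inline
#else
#   define MAYBE_INLINE
#endif

// -DTWO_ZEROS_VERSION selects the variant that shows the slowdown.
#ifdef TWO_ZEROS_VERSION
MAYBE_INLINE unsigned int spread( unsigned int n )
{
    n &= 0x000003ff;
    n = (n ^ (n << 16)) & 0xFF0000FF;
    n = (n ^ (n <<  8)) & 0x0300F00F;
    n = (n ^ (n <<  4)) & 0x030C30C3;
    n = (n ^ (n <<  2)) & 0x09249249;
    return n;
}
#else
MAYBE_INLINE unsigned int spread( unsigned int n )
{
    n &= 0x0000ffff;
    n = (n | (n << 8)) & 0x00FF00FF;
    n = (n | (n << 4)) & 0x0F0F0F0F;
    n = (n | (n << 2)) & 0x33333333;
    n = (n | (n << 1)) & 0x55555555;
    return n;
}
#endif

// Sums spread(i) for i in [0, iterations) and returns the elapsed time in
// seconds. The sum is handed back through checksum so that the compiler
// cannot discard the loop as dead code.
double timeSpread( unsigned int iterations, unsigned int & checksum )
{
    const auto start = std::chrono::steady_clock::now();
    unsigned int sum = 0;
    for ( unsigned int i = 0; i < iterations; ++i )
        sum += spread( i );
    const auto stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double>( stop - start ).count();
}
```

A main function would call timeSpread with a large iteration count and print both the time and the checksum. With a loop this simple the optimizer may vectorize or otherwise transform it, so absolute timings from this sketch are not comparable to the numbers below.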

interleaveZeros:
  4.9.4 -O1                          5.67675s
  4.9.4 -O1 -finline-small-functions 5.65597s
  5.5.0 -O1                          5.63532s
  5.5.0 -O1 -finline-small-functions 5.66475s
  6.4.0 -O1                          5.64871s
  6.4.0 -O1 -finline-small-functions 5.74504s
  7.3.0 -O1                          5.70723s
  7.3.0 -O1 -finline-small-functions 5.7509s
  8.0.1 -O1                          5.73126s
  8.0.1 -O1 -finline-small-functions 5.65887s
interleaveTwoZeros:
  4.9.4 -O1                          5.68634s
  4.9.4 -O1 -finline-small-functions 5.67831s
  5.5.0 -O1                          5.70178s
  5.5.0 -O1 -finline-small-functions 5.67027s
  6.4.0 -O1                          5.77438s
  6.4.0 -O1 -finline-small-functions 6.16534s -> 10% slower!
  7.3.0 -O1                          5.74391s
  7.3.0 -O1 -finline-small-functions 6.15133s -> 10% slower!
  8.0.1 -O1                          5.76954s
  8.0.1 -O1 -finline-small-functions 6.13896s -> 10% slower!
inline interleaveTwoZeros:
  4.9.4 -O1                          5.6749s
  4.9.4 -O1 -finline-small-functions 5.64078s
  5.5.0 -O1                          5.73546s
  5.5.0 -O1 -finline-small-functions 5.7754s
  6.4.0 -O1                          6.1316s  -> 10% slower!
  6.4.0 -O1 -finline-small-functions 6.13555s -> 10% slower!
  7.3.0 -O1                          6.12899s -> 10% slower!
  7.3.0 -O1 -finline-small-functions 6.15963s -> 10% slower!
  8.0.1 -O1                          6.17762s -> 10% slower!
  8.0.1 -O1 -finline-small-functions 6.15857s -> 10% slower!
