[Bug 897583] Re: Code generation bug with -O2 (-foptimize-sibling-calls)

Bug Watch Updater Tue, 29 Nov 2011 04:32:48 -0800

Launchpad has imported 10 comments from the remote bug at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51323.


If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2011-11-27T19:05:29+00:00 David Kastrup wrote:

Created attachment 25921
Boiled-down source code.  Bad code is generated for the last function.

The attached boiled-down code produces a jmp to
Grob::internal_set_property in which the implicit first call argument
(this) equals the explicit second call argument instead of the actual
this pointer.  The guilty code sequence is

.L4:
        movl    %ebx, 40(%esp)
        movl    %ebx, 32(%esp)
        movl    %eax, 36(%esp)
        addl    $24, %esp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        popl    %ebx
        .cfi_def_cfa_offset 4
        .cfi_restore 3
        jmp     _ZN4Grob21internal_set_propertyEPvS0_
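The diagnosis can be checked with a little stack arithmetic. This is a minimal sketch, assuming the 32-bit cdecl convention g++ uses on i686 by default: `this` is passed on the stack as the first argument, and at the jump target 0(%esp) holds the return address while argument N sits at 4+4N(%esp). The `addl $24` and `popl %ebx` release 28 bytes, so the stores to 32, 36 and 40(%esp) land in argument slots 0, 1 and 2. Slot 0 (this) and slot 2 (the second explicit argument) both receive %ebx, which is the reported symptom. The names `callee_arg_index` and `STACK_RELEASED` are hypothetical.

```c
/* Sketch: map the caller-frame offsets used in the quoted i386 sequence
   to argument slots as seen by the jump target (32-bit cdecl assumed).  */

/* Bytes released before the jmp: "addl $24, %esp" plus "popl %ebx".  */
enum { STACK_RELEASED = 24 + 4 };

/* Hypothetical helper: returns the zero-based argument index that a
   store to caller_offset(%esp) occupies once `released` bytes have been
   popped.  At the jump target, 0(%esp) is the return address and
   argument N is at 4 + 4*N(%esp).  */
int
callee_arg_index (int caller_offset, int released)
{
  int callee_offset = caller_offset - released; /* offset at the jmp */
  return (callee_offset - 4) / 4;               /* skip return address */
}
```

For example, `callee_arg_index (32, STACK_RELEASED)` yields 0 and `callee_arg_index (40, STACK_RELEASED)` yields 2, so both of the %ebx stores feed `this` and the last argument.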

Version is
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/4.6.1/lto-wrapper
Target: i686-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 
4.6.1-9ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs 
--enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr 
--program-suffix=-4.6 --enable-shared --enable-linker-build-id 
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin 
--enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i686 
--with-tune=generic --enable-checking=release --build=i686-linux-gnu 
--host=i686-linux-gnu --target=i686-linux-gnu
Thread model: posix
gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) 

Compilation options are -O2

This is from Lilypond source code and causes a segfault.

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/0

------------------------------------------------------------------------
On 2011-11-28T11:32:46+00:00 David Kastrup wrote:

-fno-optimize-sibling-calls avoids the problematic optimization.  For no
good reason at all, -fkeep-inline-functions, documented to do a
completely unrelated non-optimization (namely emitting inline functions
even when all uses of them had been inlined), will also switch off the
problematic tail call optimization.

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/1

------------------------------------------------------------------------
On 2011-11-28T12:17:52+00:00 David Kastrup wrote:

This particular code generation bug is responsible for at least half a
dozen problems in the code base of Lilypond and causes a number of
regression test failures.

We will have to add the corresponding compiler options based on the gcc
version number.  If anybody familiar with the relevant compiler internals
can construct a self-contained test case that does not require manually
inspecting the generated code for errors, we could at least add an
autoconf test tailored specifically to the occurrence of this bug instead
of basing the workaround compiler options on the version number.

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/2

------------------------------------------------------------------------
On 2011-11-28T17:54:05+00:00 Jakub-gcc wrote:

Can't reproduce this with 4.6, but can with the trunk.

/* PR middle-end/51323 */

extern void abort (void);
struct S { int a, b, c; };
int v;

__attribute__((noinline, noclone)) void
foo (int x, int y, int z)
{
  if (x != v || y != 0 || z != 9)
    abort ();
}

static inline int
baz (const struct S *p)
{
  return p->b;
}

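/* At -O2 -m32 the call to foo can become a sibling call that stores
   foo's arguments into bar's own incoming stack slots.  foo's first
   argument is loaded from y.b, which occupies the slot that receives
   foo's third argument (x), so the two moves must not clobber each
   other.  */
__attribute__((noinline, noclone)) void
bar (int x, struct S y)
{
  foo (baz (&y), 0, x);
}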

int
main ()
{
  struct S s;
  v = 3; s.a = v - 1; s.b = v; s.c = v + 1;
  bar (9, s);
  v = 17; s.a = v - 1; s.b = v; s.c = v + 1;
  bar (9, s);
  return 0;
}

This fails at -O2 -m32.  My patch at
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02413.html seems to fix it.

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/3

------------------------------------------------------------------------
On 2011-11-28T18:24:10+00:00 David Kastrup wrote:

I can confirm that my version of gcc, identifying itself as
gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3),
makes your test program abort under -O2.  If you _cannot_ reproduce this
with your 4.6.1 but only with the trunk, it would appear that Ubuntu 11.10
(or its upstream, Debian) has imprudently integrated unstable code from the
4.7 branch into the version of gcc they chose to distribute with the release.

If your test program can reasonably be considered perfectly correlated
with the occurrence of the bug (I don't have the expertise to judge),
I'll use it as an autoconf test in Lilypond to decide whether to compile
with -fno-optimize-sibling-calls, instead of the current test, which
just checks the version.

Thanks.

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/4

------------------------------------------------------------------------
On 2011-11-28T19:33:26+00:00 David Kastrup wrote:

Question: the proposed fix is in gcc/calls.c, which looks somewhat
architecture-independent.  Am I right in assuming that the bug may
therefore manifest itself on architectures other than i686 under
different conditions?

In that case, I would tend to just unconditionally use
-fno-optimize-sibling-calls in our autoconf checks for all affected gcc
versions, independent of tests and architecture.  I don't have the
hardware for other platforms in order to track down compiler bugs, and
the bug tends to hide its cause in the resulting segfault: since it
occurs only with tail jumps, the responsible function is not even
visible in the stack backtrace.

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/5

------------------------------------------------------------------------
On 2011-11-28T20:38:59+00:00 Jakub-gcc wrote:

That would be too big a hammer.  While the fix is in
architecture-independent code, on most architectures you could hit it
only with more than 6 (respectively 8) arguments, and only in a similar
scenario where an earlier callee argument stack slot is initialized from
a later argument stack slot of the caller.  Furthermore, I don't think
you could hit it before 4.6, because MEM_REF wasn't available there, so
the address would be expanded without a temporary pseudo and the
checking routine would see it right away.
Using this testcase, and perhaps additionally a modified one in which
both the caller and the tail callee have, say, 16 extra dummy integer
arguments passed through, should be more than enough.  Not to mention
that having workarounds for such compiler bugs in packages is just
weird: if the compiler is buggy, the user should just upgrade it to a
fixed version.
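The suggested modified testcase can be sketched as follows. This is an illustration only: placing the 16 dummies in front of the real arguments, the `DUMMY_*` macros, and the `run_variant` wrapper (standing in for `main`) are assumptions, and the sketch is not claimed to reproduce the miscompilation on any particular target. On a correct compiler it returns without aborting.

```c
/* Hypothetical variant of the PR middle-end/51323 testcase: the caller
   (bar) and the tail callee (foo) each take 16 extra dummy int arguments
   that are simply passed through, so the interesting arguments end up
   in stack slots even on targets that pass the first 6 or 8 integer
   arguments in registers.  */
#include <stdlib.h>

struct S { int a, b, c; };
int v;

#define DUMMY_PARMS                                                   \
  int d0, int d1, int d2, int d3, int d4, int d5, int d6, int d7,     \
  int d8, int d9, int d10, int d11, int d12, int d13, int d14, int d15
#define DUMMY_ARGS                                                    \
  d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12, d13, d14, d15
#define DUMMY_VALUES 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

/* Tail callee: aborts if any argument arrives corrupted.  */
__attribute__((noinline, noclone)) void
foo (DUMMY_PARMS, int x, int y, int z)
{
  int d[16] = { DUMMY_ARGS };
  int i;

  for (i = 0; i < 16; i++)
    if (d[i] != i)          /* each dummy must equal its position */
      abort ();
  if (x != v || y != 0 || z != 9)
    abort ();
}

static inline int
baz (const struct S *p)
{
  return p->b;
}

/* Caller: the call to foo is in tail position, so at -O2 it is a
   candidate for a sibling call.  */
__attribute__((noinline, noclone)) void
bar (DUMMY_PARMS, int x, struct S y)
{
  foo (DUMMY_ARGS, baz (&y), 0, x);
}

/* Equivalent of the original testcase's main: two calls with
   different values of v.  */
void
run_variant (void)
{
  struct S s;

  v = 3; s.a = v - 1; s.b = v; s.c = v + 1;
  bar (DUMMY_VALUES, 9, s);
  v = 17; s.a = v - 1; s.b = v; s.c = v + 1;
  bar (DUMMY_VALUES, 9, s);
}
```

In a standalone testcase, `main` would simply call `run_variant ()` and return 0, so a miscompilation shows up as an abort.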

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/6

------------------------------------------------------------------------
On 2011-11-29T09:43:44+00:00 David Kastrup wrote:

I agree that the real fix is to force an upgrade of the compiler to a
fixed version.  However, Ubuntu 11.10 has been released and is in
circulation, so we can't reasonably implement that solution until the
buggy compilers have had a reasonable chance to be replaced everywhere.

I have reported this bug to Ubuntu.  If you are right that it can't be
found in 4.6 proper, they must have acquired it via distribution-specific
patches.  What that means for the stability and security of the entire
current Ubuntu code base, one can only guess.

Regarding Lilypond, we have chosen to use -fno-optimize-sibling-calls
based on the gcc version number rather than an actual test, regardless
of architecture.  Tracking this bug down has cost us several weeks of
developer time and brought down our build infrastructure for a while,
until the first workaround, -fkeep-inline-functions, was discovered by
chance.

Lilypond is a C++ application with considerable parts written in Guile,
so segfaults are usually caused by forgotten garbage collection
protection.  As far as I know, I am the only active programmer with a
system programming background.  When the bug manifests itself as a
segfault, the responsible function is no longer visible in the stack
backtrace, which makes finding the culprit extremely unfunny.  In our
case, the problem was exacerbated because the last visible caller in
the stack backtrace made its call via a function pointer table, this
table was a C++ vector, and accessing the vector in gdb was not possible
because operator[] had been inlined.  Specifying -fkeep-inline-functions,
which according to its documentation should _only_ additionally emit
(unused) inline function instances that we could then have used to
access that table in the debugger, made the bug disappear.

There is no sane reason that -fkeep-inline-functions turns off sibling
call optimization, but while sabotaging the debugging of this problem,
it at least gave us a workaround.

So we simply can't afford to deal with this kind of situation more than
once.  We don't have the skill sets.  In contrast, the benefits of this
optimization are negligible for us, since we don't employ systematic
call chaining (as a P-code interpreter using function pointer tables
likely would).

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/9

------------------------------------------------------------------------
On 2011-11-29T10:28:48+00:00 Jakub-gcc wrote:

Actually, it fails on the vanilla 4.6 branch too, just not in GCC
4.6-RH: there we don't enable -fipa-sra by default at -O2/-Os in 4.6,
only at -O3, for debug info quality reasons (something that is solved
in GCC 4.7).

The regression started with
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164135 which started
building MEM_REF in the IPA-SRA codepath.
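For readers unfamiliar with -fipa-sra: it is GCC's interprocedural scalar replacement of aggregates, which among other things can replace an aggregate parameter with the scalar pieces a function actually uses. The hand-written before/after pair below only illustrates that general idea. It is not GCC's actual output, it is not claimed to be the exact code path that triggers this bug, and the function names are hypothetical.

```c
/* Hand-written illustration of the idea behind IPA-SRA; not compiler
   output.  */
struct S { int a, b, c; };

/* Before: the whole aggregate is passed by value, but only field b is
   read.  */
static int
sum_before (int x, struct S y)
{
  return x + y.b;
}

/* After (conceptually): the aggregate parameter is replaced by the one
   scalar the function uses, and callers pass that scalar directly.  */
static int
sum_after (int x, int y_b)
{
  return x + y_b;
}
```

The transformation only pays off when the compiler can rewrite every caller, which is why GCC applies it to functions whose call sites it controls.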

As for your workaround: if you only apply it to GCC 4.6.* and not also
to 4.7 and later, that is fine.  All I would like is that such
workarounds don't stick around forever for future versions of GCC.

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/10

------------------------------------------------------------------------
On 2011-11-29T11:13:21+00:00 David Kastrup wrote:

Code review is at <URL:http://codereview.appspot.com/5431088>, the
discussion of the bug is at
<URL:http://code.google.com/p/lilypond/issues/detail?id=1997>.

As you can see, the proposed workaround is restricted to g++ 4.6.x.  I
trust that the bug will be fixed by the time 4.7 is released, and once
we have conclusive evidence about which 4.6.x versions are unaffected,
those will likely be excluded from the workaround as well.
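Lilypond's actual check lives in its autoconf setup; the C below merely mirrors the version rule discussed here, namely applying -fno-optimize-sibling-calls only to the GCC 4.6.x series and not to 4.7 or later. `needs_sibcall_workaround` and `SIBCALL_WORKAROUND_NEEDED` are hypothetical names; `__GNUC__`, `__GNUC_MINOR__` and `__clang__` are the compilers' predefined macros.

```c
/* Hypothetical helper: nonzero if a GCC version falls in the 4.6.x
   series, the only range the workaround is meant to cover.  */
int
needs_sibcall_workaround (int major, int minor)
{
  return major == 4 && minor == 6;
}

/* The same rule at compile time, using GCC's predefined version macros.
   Clang also defines __GNUC__, so it is excluded explicitly.  */
#if defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ == 4 && __GNUC_MINOR__ == 6
#  define SIBCALL_WORKAROUND_NEEDED 1
#else
#  define SIBCALL_WORKAROUND_NEEDED 0
#endif
```

A source file could, for instance, test `SIBCALL_WORKAROUND_NEEDED` to emit a `#warning` as a reminder; unlike a configure-time check, it cannot add the compiler flag itself.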

As a suggestion: it might be sensible to have a meta option -fdebug that
will disable all options significantly interfering with post mortem
debugging.  While -g by itself should not change code generation, having
a supporting option that helps debugging might be nice.

The option set I currently have in mind is something like
-fno-crossjumping -fkeep-inline-functions -fno-optimize-sibling-calls.
It would also disable the optimization of calls to noreturn functions,
in particular abort, since clobbering the stack backtrace is not really
helpful for debugging.

But that's a different issue.

Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/11


** Changed in: gcc
       Status: Unknown => In Progress

** Changed in: gcc
   Importance: Unknown => High

** Bug watch added: code.google.com/p/lilypond/issues #1997
   http://code.google.com/p/lilypond/issues/detail?id=1997

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/897583

Title:
  Code generation bug with -O2 (-foptimize-sibling-calls)

To manage notifications about this bug go to:
https://bugs.launchpad.net/gcc/+bug/897583/+subscriptions
