Launchpad has imported 10 comments from the remote bug at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51323.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking. ------------------------------------------------------------------------ On 2011-11-27T19:05:29+00:00 David Kastrup wrote: Created attachment 25921 Boiled down source code. Bad code for last function. The following boiled down code produces a jmp to Grob::internal_set_property where the implicit first call argument (this) is equal to the explicit second call argument instead of the actual this pointer. The guilty code sequence is .L4: movl %ebx, 40(%esp) movl %ebx, 32(%esp) movl %eax, 36(%esp) addl $24, %esp .cfi_remember_state .cfi_def_cfa_offset 8 popl %ebx .cfi_def_cfa_offset 4 .cfi_restore 3 jmp _ZN4Grob21internal_set_propertyEPvS0_ Version is Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-linux-gnu/4.6.1/lto-wrapper Target: i686-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.1-9ubuntu3' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++,go --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-plugin --enable-objc-gc --enable-targets=all --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=i686-linux-gnu --host=i686-linux-gnu --target=i686-linux-gnu Thread model: posix gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) Compilation options are -O2 This is from Lilypond source code and causes a segfault. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/0 ------------------------------------------------------------------------ On 2011-11-28T11:32:46+00:00 David Kastrup wrote: -fno-optimize-sibling-calls avoids the problematic optimization. For no good reason at all, -fkeep-inline-functions, documented to do a completely unrelated non-optimization (namely emitting inline functions even when all uses of them had been inlined), will also switch off the problematic tail call optimization. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/1 ------------------------------------------------------------------------ On 2011-11-28T12:17:52+00:00 David Kastrup wrote: This particular code generation bug is responsible for at least half a dozen problems in the code base of Lilypond and causes a number of regression test failures. We will have to add respective compiler options based on the version number of gcc. If anybody knowing the responsible compiler internals can construct a self-contained test case that does not require manually inspecting the generated code for errors, we could at least add an autoconf test specifically tailored to the occurence of this bug instead of basing the workaround compiler options on the version number. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/2 ------------------------------------------------------------------------ On 2011-11-28T17:54:05+00:00 Jakub-gcc wrote: Can't reproduce this with 4.6, but can with the trunk. /* PR middle-end/51323 */ extern void abort (void); struct S { int a, b, c; }; int v; __attribute__((noinline, noclone)) void foo (int x, int y, int z) { if (x != v || y != 0 || z != 9) abort (); } static inline int baz (const struct S *p) { return p->b; } __attribute__((noinline, noclone)) void bar (int x, struct S y) { foo (baz (&y), 0, x); } int main () { struct S s; v = 3; s.a = v - 1; s.b = v; s.c = v + 1; bar (9, s); v = 17; s.a = v - 1; s.b = v; s.c = v + 1; bar (9, s); return 0; } at -O2 -m32 fails. My http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02413.html seems to fix this. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/3 ------------------------------------------------------------------------ On 2011-11-28T18:24:10+00:00 David Kastrup wrote: I can confirm that my version of gcc identifying itself as gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3) makes your test program abort under -O2. If you _cannot_ confirm this with your version 4.6.1 but with the trunk, it would appear that Ubuntu 11.10 (or its upstream Debian) has imprudently integrated unstable code from the 4.7 branch into the version of gcc they choose to distribute with the release. If your test program can reasonably be considered as perfectly correlated with the occurence of the bug (I don't have the expertise), I'll be using it as an autoconf test in Lilypond for deciding whether to compile with -fno-optimize-sibling-calls instead of the current test just checking the version. Thanks. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/4 ------------------------------------------------------------------------ On 2011-11-28T19:33:26+00:00 David Kastrup wrote: Question: the proposed fix is in gcc/calls.c which looks somewhat architecture independent. Am I right in assuming that this means that the bug may manifest itself under architectures different from i686 given different conditions? In that case, I would tend to just unconditionally do -fno-optimize- sibling-calls in our autoconf checks for all respective gcc versions independent from tests and architecture since I don't have the hardware for other platforms in order to figure out compiler bugs, and since the bug tends to hide its cause in the resulting segfault, as it occurs only with tail jumps, meaning that the responsible function is not even visible in the stack traceback. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/5 ------------------------------------------------------------------------ On 2011-11-28T20:38:59+00:00 Jakub-gcc wrote: That would be too big hammer approach. While the fix is in arch independent code, on most architectures you could hit it only with > 6 resp. > 8 arguments and with similar scenario earlier callee argument stack slot initialized from later caller's argument stack slot. Furthermore I don't think you could hit it before 4.6 because MEM_REF wasn't supported there, so the address would be expanded without a temporary pseudo and the checking routine would see it right away. Using this testcase and perhaps a modified one additionally too which will have in both the caller and tail callee say 16 extra dummy integer arguments passed through should be more than enough. Not to mention that having workarounds for such compiler bugs in packages is just weird, if the compiler is buggy, the user should just upgrade it to a fixed version. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/6 ------------------------------------------------------------------------ On 2011-11-29T09:43:44+00:00 David Kastrup wrote: I agree that the real fix is to force an upgrade of the compiler to a fixed version. However, Ubuntu 11.10 has been released and is in circulation, so we can't reasonably implement that solution until the buggy compilers have had a reasonable chance to be replaced everywhere. I have reported this bug to Ubuntu. If you are right that it can't be found in 4.6 proper, they will have acquired it via distribution specific patches. What that means for stability and security of the entire current Ubuntu code base, one can only guess. Regarding Lilypond, we have chosen to use -fno-optimize-sibling-calls based on the gcc version number instead of an actual test, without consideration of the architecture. Tracking this bug down has cost us several weeks of developer time and brought down our build infrastructure for a while until the first workaround, -fkeep-inline- functions, has been discovered by chance. Lilypond is a C++ application with considerable parts written in Guile, so segfaults usually are a problem of forgetting garbage collection protection measures. As far as I know, I am the only active programmer with a system programming background. When the bug manifests itself in a segfault, the responsible function is no longer visible in the stack backtrace. This makes finding the culprit extremely unfunny. In our case, the problem was exacerbated because the last visible caller in the stack backtrace made its call via a function pointer table, this table was a C++ vector, and accessing the vector in gdb was not possible because operator[] had been inlined. Specifying -fkeep-inline-function, which is according to its documentation supposed to _only_ additionally emit (unused) inline function instantiations that could have been used for accessing that table in the debugger, made the bug disappear. There is no sane reason that -fkeep-inline-functions turns off sibling call optimization, but while sabotaging the debugging of this problem, it at least gave us a workaround. So we simply can't afford dealing with this kind of situation more than once. We don't have the skill sets. In contrast, the positive results of this optimization are negligible for us since we don't employ systematic call chaining (like a P code interpreter using function pointer tables likely would). Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/9 ------------------------------------------------------------------------ On 2011-11-29T10:28:48+00:00 Jakub-gcc wrote: Actually, it fails on 4.6 vanilla branch too, just not in GCC 4.6-RH, because we don't enable -fipa-sra by default for -O2/-Os for debug info quality reasons in 4.6, only for -O3 (that is something that is solved in GCC 4.7). The regression started with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=164135 which started building MEM_REF in the IPA-SRA codepath. As for your workaround, if you only do it for GCC 4.6.* and not also for 4.7 and later, it is fine, all I would like is that such workarounds don't stick around forever for future versions of GCC. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/10 ------------------------------------------------------------------------ On 2011-11-29T11:13:21+00:00 David Kastrup wrote: Code review is at <URL:http://codereview.appspot.com/5431088>, the discussion of the bug is at <URL:http://code.google.com/p/lilypond/issues/detail?id=1997>. As you can see, the proposed workaround is restricted to g++ versions of 4.6.x. I trust that it will be fixed by the time 4.7 gets released, and once we have conclusive evidence about versions of 4.6.x that are unaffected, those will likely not get the fix either. As a suggestion: it might be sensible to have a meta option -fdebug that will disable all options significantly interfering with post mortem debugging. While -g by itself should not change code generation, having a supporting option that helps debugging might be nice. The option set I currently think of is something like -fno-crossjumping -fkeep-inline-functions -fno-optimize-sibling-calls. Also optimization of noreturn functions, in particular of abort, would be disabled since clobbering the stack traceback is not really helpful for debugging. But that's a different issue. Reply at: https://bugs.launchpad.net/gcc/+bug/897583/comments/11 ** Changed in: gcc Status: Unknown => In Progress ** Changed in: gcc Importance: Unknown => High ** Bug watch added: code.google.com/p/lilypond/issues #1997 http://code.google.com/p/lilypond/issues/detail?id=1997 -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/897583 Title: Code generation bug with -O2 (-foptimize-sibling-calls) To manage notifications about this bug go to: https://bugs.launchpad.net/gcc/+bug/897583/+subscriptions -- ubuntu-bugs mailing list [email protected] https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
