Bug#630441: g++-4.6 miscompilation
Hi there.

It turns out that the problem was in the debug version: it shouldn't have worked. Further, the compiler should have noticed a reference to a stack variable being returned, but that didn't happen either.

It's still my bad though. Sorry if I've wasted your time. Maybe this problem can help you guys figure out why the compiler didn't notice.

Here's a detailed explanation for those interested.

Notes
- treedb and meta-treedb deal in offsets where allocation or freeing of heap data is involved, as either may result in having to resize the heap, which may cause the heap to move in memory. Yes, segment addressing would make this all go away, but it's not portable.

Meta treedb stores data in a double-linked list node as follows:
- context.push_back(const InputType d)
- meta::L2ListT...::push_back(Backend::pack_input(d))
- meta::L2List::PushBack(d)
- return (HTREEDB_L2LISTNODE)L2LIST_pack_node(L2LIST_NS(PushBack)(context, hl, pv))

This calls into l2list-impl.h, which needs to allocate a node. To do this, it calls on the BackEnd's AllocNode member, which knows how to calculate the required size based on the input data and information about the node size and alignment requirements, which it gets from the context's description. Once returned, the node is linked into the list by L2LIST_NS(PushBack), and passed on to the caller.

Meta treedb retrieves data from a double-linked list as follows:
context.PointerHead() gets the address of the node.
context.data(node) gets the data from the node as follows:
- meta::L2ListT...::data(pv)
- meta::L2ListImplT...::data(pv)
- Backend::unpack_data(node_to_data(pv))

The problem was that Backend::unpack_data returns a reference to a char *, which gets stepped on when optimized.

Regards,
Philip Ashmore

--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
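The note above about dealing in offsets rather than pointers can be illustrated with a minimal sketch. The `Heap` type, its `alloc` and `at` members, and the use of `std::vector` as backing storage are all hypothetical stand-ins chosen for illustration; they are not treedb's actual allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical growable heap; std::vector stands in for the real allocator.
struct Heap {
    std::vector<char> bytes;

    // Reserve n bytes and return their offset from the start of the heap.
    // Growing the vector may move the whole heap in memory.
    std::size_t alloc(std::size_t n) {
        std::size_t off = bytes.size();
        bytes.resize(off + n);
        return off;
    }

    // Turn an offset into a pointer. The pointer is only valid until the
    // next alloc(); the offset itself stays valid.
    char* at(std::size_t off) { return bytes.data() + off; }
};

// Self-check: an offset taken before the heap grows still finds the data.
inline void demo_offsets() {
    Heap h;
    std::size_t off = h.alloc(6);
    std::memcpy(h.at(off), "hello", 6);
    h.alloc(1 << 20);  // large request, likely to relocate the storage
    assert(std::strcmp(h.at(off), "hello") == 0);
}
```

Calling `demo_offsets()` from `main` exercises the sketch. A raw `char*` saved before the second `alloc` could be left pointing at freed storage, which is the reason for storing offsets.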
Bug#630441: g++-4.6 miscompilation
"reference to a char *" should read "reference to a char * on the stack".
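For readers unfamiliar with this class of bug, here is a minimal sketch of the shape described above. The `Node` layout and both function names are hypothetical and are not taken from meta-treedb. The buggy form is shown only in a comment, because calling it is undefined behaviour.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical list node: link fields followed by the stored string.
struct Node {
    Node* prev;
    Node* next;
    char  data[16];
};

// Buggy shape, shown only as a comment: p is a local on the stack, so the
// returned reference dangles once the function returns. An unoptimized
// build may appear to work because the stack slot is not reused at once.
//
//   char*& unpack_data_bad(Node& n) { char* p = n.data; return p; }

// One possible fix: return the pointer by value, so nothing refers to the
// callee's stack frame after the call.
inline char* unpack_data(Node& n) { return n.data; }

// Self-check that callers can run from main().
inline void demo_unpack() {
    Node n{nullptr, nullptr, "hello"};
    const char* p = unpack_data(n);
    assert(p == n.data);
    assert(std::strcmp(p, "hello") == 0);
}
```

Whether a compiler diagnoses the buggy form depends on how directly the local is returned; when the reference passes through several template layers, as in the call chain described in this thread, a warning is less likely.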
Bug#630441: g++-4.6 miscompilation
Hi there.

I believe I've tracked down the problem.

I've published new versions of v3c(1.9.0-03), treedb(1.1.0-01) and meta-treedb(1.3.0-02) on SourceForge, which get around this problem.

It appears that gcc-4.6 (and clang for that matter) make some dodgy decisions about what appear to be references to temporaries created during optimization.

Looking at meta-treedb's v3c/1-comet/cxx-string-list-test.cpp, line 75:
typedef char * DataType;
is the heart of the problem.

The change at line 77:
typedef char DataType[0];
guides the compiler along the proper path.

Neither compiler has a problem with either option in debug (-O0) builds and the tests pass.

I guess the compiler should issue warnings or errors, depending on how you want to handle this.

Regards,
Philip
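One plausible reading of why the typedef change helps is sketched below; this is an assumption, as the message does not spell out the mechanism. With a pointer type, `DataType&` has to refer to some `char *` object, and if that pointer is manufactured inside the function it lives on the stack. With an array type, `DataType&` can bind directly to the bytes stored in the node. The `Node` layout is hypothetical, and a fixed size of 16 replaces `char DataType[0]` because zero-length arrays are a GNU extension rather than standard C++.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical list node with the string stored inline.
struct Node {
    Node* prev;
    Node* next;
    char  payload[16];
};

// Assumption: fixed-size stand-in for the thread's `char DataType[0]`.
typedef char DataType[16];

// The returned reference designates storage inside the node itself, so it
// stays valid for as long as the node does. No temporary is involved.
inline DataType& unpack_data(Node& n) { return n.payload; }

// Self-check that callers can run from main().
inline void demo_array_ref() {
    Node n{nullptr, nullptr, "hello"};
    DataType& d = unpack_data(n);
    assert(&d[0] == &n.payload[0]);  // same bytes, not a copy
    assert(std::strcmp(d, "hello") == 0);
}
```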
Bug#630441: g++-4.6 miscompilation
Philip Ashmore cont...@philipashmore.com writes:
> It appears that gcc-4.6 (and clang for that matter) make some dodgy decisions about what appear to be references to temporaries created during optimization.

You don't seem to have addressed the issue raised by Matthias Klose in the bug thread though: specifically, whether this is truly a compiler problem, or simply buggy code exposed by the newer compilers.

That it works as intended with older compilers or -O0 isn't enough to show that -- it's very common for buggy code to work correctly for a long time, and then suddenly stop working when a new compiler release uses more aggressive [but correct] optimization.

It seems like an important step here would be to reduce this down to a minimal test case.

[The fact that both newer versions of gcc and clang show the same behavior does suggest that maybe it's the application code which is buggy.]

-Miles

--
One of the lessons of history is that nothing is often a good thing to do, and always a clever thing to say. -- Will Durant
Bug#630441: g++-4.6 miscompilation
I'll work on trying to put together a simpler test case.
Bug#630441: g++-4.6 miscompilation
tag 630441 moreinfo help
thanks

On 07/21/2011 12:01 PM, Philip Ashmore wrote:
> Sorry if I wasn't clear.
> All the tests pass in the debug (-O0) build.
> I've got gcc/g++ 4.4.6-6 installed and all the tests pass in debug and release mode.
> This is a problem with the g++ 4.6 release (-O3) optimization.

Unproven. Repeating my questions here: how do you know that it's not undefined behaviour exposed by the new compiler version?

Some more information is needed:
- is this seen on amd64 only, or on other architectures too?
- which optimization flags are used? Does lowering the optimization level work around the issue?
- does it build using gcc-snapshot, gcc-4.5 or gcc-4.4?
- if you have a working and a non-working build, can you try to combine object files to determine the problematic object file?
Bug#630441: g++-4.6 miscompilation
On 04/09/11 15:21, Matthias Klose wrote:
> tag 630441 moreinfo help
> thanks
>
> On 07/21/2011 12:01 PM, Philip Ashmore wrote:
>> Sorry if I wasn't clear.
>> All the tests pass in the debug (-O0) build.
>> I've got gcc/g++ 4.4.6-6 installed and all the tests pass in debug and release mode.
>> This is a problem with the g++ 4.6 release (-O3) optimization.
>
> Unproven.

Well you could try it yourself, but that might be unproven too =)

> Repeating my questions here: how do you know that it's not undefined behaviour exposed by the new compiler version?

I reported this bug because of undefined behaviour of the compiler.

> Some more information is needed:
> - is this seen on amd64 only, or on other architectures too?

See Message 27 above
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#27
A 32 bit chroot means 32 bit, unless I'm mistaken.

> - which optimization flags are used? Does lowering the optimization level work around the issue?

See message 22 above
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#22

> - does it build using gcc-snapshot, gcc-4.5 or gcc-4.4?

I have gcc and g++ locked to version 4:4.4.5-1, where they work.

time sh build.sh (from Message 12 above http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#12)
...
real 6m52.589s
user 5m37.477s
sys 0m31.150s

The test fails with gcc-snapshot-20110816-1 (attached) at the same place.

time sh build-with-gcc-snapshot.sh
...
real 12m9.970s
user 10m57.633s
sys 0m29.250s

> - if you have a working and a non-working build, can you try

The build fails due to a test failure caused by the compiler generating incorrect code. If we want to fix this bug then working around it won't help.

The same problem may happen at any time in anyone's code. The fact that my code caught the problem in a test where it could be spotted is sheer good/bad luck, and is reproducible.

> to combine object files to determine the problematic object file?

Philip

build-with-gcc-snapshot.sh
Description: Bourne shell script
Bug#630441: g++-4.6 miscompilation
On 09/04/2011 06:24 PM, Philip Ashmore wrote:
>> - if you have a working and a non-working build, can you try
> The build fails due to a test failure caused by the compiler generating incorrect code. If we want to fix this bug then working around it won't help.

You do misunderstand. Somebody has to find the object file which is causing this. It's not meant as a workaround, but to find the offending code. If you can track this down to an object file, split it up further to get to the function which you claim is miscompiled.

> The same problem may happen at any time in anyone's code. The fact that my code caught the problem in a test where it could be spotted is sheer good/bad luck, and is reproducible.

And it's not uncommon that newer GCC versions expose invalid code in somebody's code.

Matthias
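The object-file hunt requested above amounts to a bisection, sketched here under simplifying assumptions: exactly one object file is responsible, and a hypothetical callback `fails_with(lo, hi)` rebuilds the objects with indices in [lo, hi) using the suspect compiler and all others using the known-good one, links them, runs the test, and reports whether it fails. Neither the function name nor the callback comes from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the index of the single object file that makes the test fail when
// it is built with the suspect compiler. Assumes count >= 1 and exactly one
// culprit; fails_with is a hypothetical build-and-test callback.
inline std::size_t bisect_objects(
    std::size_t count,
    const std::function<bool(std::size_t, std::size_t)>& fails_with) {
    std::size_t lo = 0, hi = count;  // invariant: culprit index is in [lo, hi)
    while (hi - lo > 1) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (fails_with(lo, mid)) {
            hi = mid;  // failure reproduced: culprit is in the lower half
        } else {
            lo = mid;  // test passed: culprit must be in the upper half
        }
    }
    return lo;
}

// Self-check with a simulated culprit at index 5 out of 8 object files.
inline void demo_bisect() {
    auto fails = [](std::size_t lo, std::size_t hi) {
        return lo <= 5 && 5 < hi;
    };
    assert(bisect_objects(8, fails) == 5);
}
```

Each step halves the candidate range, so around log2(count) rebuilds are needed. The same idea can then be repeated inside the offending file by moving functions between two translation units.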
Bug#630441: g++-4.6 miscompilation
On 04/09/11 17:40, Matthias Klose wrote:
> On 09/04/2011 06:24 PM, Philip Ashmore wrote:
>>> - if you have a working and a non-working build, can you try
>> The build fails due to a test failure caused by the compiler generating incorrect code. If we want to fix this bug then working around it won't help.
>
> You do misunderstand. Somebody has to find the object file which is causing this. It's not meant as a workaround, but to find the offending code. If you can track this down to an object file, split it up further to get to the function which you claim is miscompiled.
>
>> The same problem may happen at any time in anyone's code. The fact that my code caught the problem in a test where it could be spotted is sheer good/bad luck, and is reproducible.
>
> And it's not uncommon that newer GCC versions expose invalid code in somebody's code.
>
> Matthias

From Message 5 above
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#5
The problem line appears to be 206:
const char * p = context.data((PCVOID)node);

Sorry if I wasn't clear: This is line 206 in file meta-treedb-1.3.0-01/v3c/1-comet/cxx-string-list-test.cpp.

The function this line is in is
int test(uint16_t abytes, uint16_t aflags, uint16_t xbytes, uint16_t xflags)
and starts at line 96.

This is the source file for the er, cxx-string-list-test program that failed.

Philip
Bug#630441: g++-4.6 miscompilation
On 09/04/2011 07:35 PM, Philip Ashmore wrote:
> On 04/09/11 17:40, Matthias Klose wrote:
>> On 09/04/2011 06:24 PM, Philip Ashmore wrote:
>>>> - if you have a working and a non-working build, can you try
>>> The build fails due to a test failure caused by the compiler generating incorrect code. If we want to fix this bug then working around it won't help.
>>
>> You do misunderstand. Somebody has to find the object file which is causing this. It's not meant as a workaround, but to find the offending code. If you can track this down to an object file, split it up further to get to the function which you claim is miscompiled.
>>
>>> The same problem may happen at any time in anyone's code. The fact that my code caught the problem in a test where it could be spotted is sheer good/bad luck, and is reproducible.
>>
>> And it's not uncommon that newer GCC versions expose invalid code in somebody's code.
>>
>> Matthias
>
> From Message 5 above
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#5
> The problem line appears to be 206:
> const char * p = context.data((PCVOID)node);
>
> Sorry if I wasn't clear: This is line 206 in file meta-treedb-1.3.0-01/v3c/1-comet/cxx-string-list-test.cpp.
>
> The function this line is in is
> int test(uint16_t abytes, uint16_t aflags, uint16_t xbytes, uint16_t xflags)
> and starts at line 96.
>
> This is the source file for the er, cxx-string-list-test program that failed.

Are you saying that if you just rebuild the test program with an older GCC version, the test succeeds? If not, you'll still have to find the object file in the tested code, not in the testing code.
Bug#630441: g++-4.6 miscompilation
> - does it build using gcc-snapshot, gcc-4.5 or gcc-4.4?

> I have gcc and g++ locked to version 4:4.4.5-1, where they work.
>
> time sh build.sh (from Message 12 above http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#12)
> ...
> real 6m52.589s
> user 5m37.477s
> sys 0m31.150s

> Are you saying that if you just rebuild the test program with an older GCC version, the test succeeds? If not, you'll still have to find the object file in the tested code, not in the testing code.

Yes.
Bug#630441: g++-4.6 miscompilation
On 07/21/2011 02:26 AM, Philip Ashmore wrote:
> This fails with g++-4.6.1-4 from testing at the same place.
>
> I've updated the packages on SourceForge, so no patches are needed.
>
> I've attached a revised build.sh which runs the tests with the current versions.
>
> I was about to dive into a rant about why didn't you try this before releasing gcc/g++ 4.6.1-4 when I spotted a bug in v3c's build system, now fixed.
>
> If you'd tried it yourself you could have let me know.
>
> Shouldn't incorrect code generation block a compiler release?

How do you know that it's not undefined behaviour exposed by the new compiler version?

Some more information is needed:
- is this seen on amd64 only, or on other architectures too?
- which optimization flags are used? Does lowering the optimization level work around the issue?
- does it build using gcc-snapshot, gcc-4.5 or gcc-4.4?
- if you have a working and a non-working build, can you try to combine object files to determine the problematic object file?
Bug#630441: g++-4.6 miscompilation
Sorry if I wasn't clear.

All the tests pass in the debug (-O0) build.

I've got gcc/g++ 4.4.6-6 installed and all the tests pass in debug and release mode.

This is a problem with the g++ 4.6 release (-O3) optimization.

Philip
Bug#630441: g++-4.6 miscompilation
I just finished running the tests inside a fresh wheezy 32 bit chroot. The results are the same.
Bug#630441: g++-4.6 miscompilation
This fails with g++-4.6.1-4 from testing at the same place.

I've updated the packages on SourceForge, so no patches are needed.

I've attached a revised build.sh which runs the tests with the current versions.

I was about to dive into a rant about why didn't you try this before releasing gcc/g++ 4.6.1-4 when I spotted a bug in v3c's build system, now fixed.

If you'd tried it yourself you could have let me know.

Shouldn't incorrect code generation block a compiler release?

Please try this one.

Philip

build.sh
Description: Bourne shell script