Bug#630441: g++-4.6 miscompilation

2011-09-23 Thread Philip Ashmore

Hi there.

It turns out that the debug build was the anomaly: it only worked by accident
and shouldn't have worked at all.
Further, the compiler should have noticed a reference to a stack variable being
returned, but that didn't happen either.

It's still my bad though. Sorry if I've wasted your time.

Maybe this problem can help you guys figure out why the compiler didn't notice.

Here's a detailed explanation for those interested.

Notes
-----
treedb and meta-treedb deal in offsets where allocation or freeing of heap data
is involved, as they may result in having to resize the heap, which may cause
the heap to move in memory.

Yes, segment addressing would make this all go away, but it's not portable.
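A minimal sketch of why offsets are used instead of pointers, assuming a heap backed by a single std::vector<char>. The Heap class and store_string helper are hypothetical illustrations, not treedb's actual API: any allocation may move the heap, so only the offset is kept, and it is turned back into a pointer after the allocation has happened.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical growable heap: every byte lives in one std::vector<char>,
// so growing it may move the whole heap to a new address.
class Heap {
public:
    using Offset = std::size_t;  // position relative to the start of the heap

    // Reserve `size` bytes and return their offset, not a pointer:
    // resize() may reallocate, invalidating every pointer handed out so far.
    Offset allocate(std::size_t size) {
        Offset off = bytes_.size();
        bytes_.resize(bytes_.size() + size);
        return off;
    }

    // Turn an offset back into a pointer. The result is only valid until
    // the next allocate() call.
    char* at(Offset off) {
        assert(off < bytes_.size());
        return bytes_.data() + off;
    }

private:
    std::vector<char> bytes_;
};

// Copy a C string into the heap and return its offset.
inline Heap::Offset store_string(Heap& heap, const char* s) {
    std::size_t len = std::strlen(s) + 1;   // include the terminating '\0'
    Heap::Offset off = heap.allocate(len);  // may move the heap...
    std::memcpy(heap.at(off), s, len);      // ...so resolve the pointer afterwards
    return off;
}
```

A caller that kept the pointer from heap.at(off) and then called store_string again could be left holding a dangling pointer once the vector reallocates; the offset stays valid.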


Meta treedb stores data in a doubly linked list node as follows:

-  context.push_back(const InputType  d)
-  meta::L2ListT...::push_back(Backend::pack_input(d))
-  meta::L2List::PushBack(d)
-  return (HTREEDB_L2LISTNODE)L2LIST_pack_node(L2LIST_NS(PushBack)(context, hl, pv))
This calls into l2list-impl.h, which needs to allocate a node.
To do this, it calls on the BackEnd's AllocNode member, which knows how to
calculate the required size based on the input data and information about the
node size and alignment requirements, which it gets from the context's
description.

Once returned, the node is linked into the list by L2LIST_NS(PushBack), and
passed on to the caller.
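The sizing and linking steps described above can be sketched as follows. This assumes a node made of a prev/next header followed by the payload inline, with malloc standing in for the Backend's allocator; NodeHeader, node_size, push_back, payload and free_list are hypothetical names, not the l2list-impl.h code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical node header for a doubly linked list; the payload is stored
// inline, immediately after the header.
struct NodeHeader {
    NodeHeader* prev;
    NodeHeader* next;
};

// Round n up to a multiple of align (align must be a power of two).
constexpr std::size_t align_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Bytes needed for a node carrying `payload` bytes, padded to `align`.
constexpr std::size_t node_size(std::size_t payload, std::size_t align) {
    return align_up(sizeof(NodeHeader) + payload, align);
}

// Start of the inline payload that follows a node's header.
inline char* payload(NodeHeader* node) {
    return reinterpret_cast<char*>(node) + sizeof(NodeHeader);
}

struct List {
    NodeHeader* head = nullptr;
    NodeHeader* tail = nullptr;
};

// Sketch of the AllocNode + PushBack steps: size and allocate a node for the
// input data, copy the data in, then link the node at the tail.
inline NodeHeader* push_back(List& list, const void* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    void* mem = std::malloc(node_size(len, alignof(NodeHeader)));
    if (mem == nullptr) return nullptr;
    NodeHeader* node = new (mem) NodeHeader{list.tail, nullptr};
    if (len != 0) std::memcpy(payload(node), data, len);
    if (list.tail != nullptr) list.tail->next = node;
    else list.head = node;
    list.tail = node;
    return node;
}

// Release every node; NodeHeader is trivially destructible, so free() suffices.
inline void free_list(List& list) {
    for (NodeHeader* n = list.head; n != nullptr;) {
        NodeHeader* next = n->next;
        std::free(n);
        n = next;
    }
    list.head = list.tail = nullptr;
}
```

Keeping the size calculation in one place (node_size) mirrors the description: the allocator, not the list code, knows how big a node must be for a given input.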


Meta treedb retrieves data from a doubly linked list as follows:

context.PointerHead() gets the address of the node.
context.data(node) gets the data from the node as follows:
-  meta::L2ListT...::data(pv)
-  meta::L2ListImplT...::data(pv)
-  Backend::unpack_data(node_to_data(pv))

The problem was that Backend::unpack_data returns a reference to a char *,
which gets stepped on when optimized.
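One plausible shape for this kind of bug, sketched as a hypothetical reconstruction (the real meta-treedb signatures are not shown in the thread): a char * produced by value in one layer is bound to a reference parameter, and that reference is returned up the call chain. The dangerous version is kept in a comment; the compiled code shows a by-value alternative. Node, node_to_data and node_data are illustrative names.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a list node with inline string data.
struct Node {
    Node* prev;
    Node* next;
    char payload[16];
};

inline char* node_to_data(Node* n) { return n->payload; }

// Shape of the hazard (kept in a comment because using it is undefined
// behaviour):
//
//   typedef char* DataType;
//   const DataType& unpack_data(const DataType& p) { return p; }
//   const DataType& node_data(Node* n) { return unpack_data(node_to_data(n)); }
//
// node_to_data() returns a pointer by value, so a temporary char* is created
// on node_data()'s stack and bound to unpack_data's parameter. The temporary
// dies at the end of node_data()'s return statement, so the reference that
// node_data() returns dangles. Each function looks valid on its own, so a
// compiler may not warn. At -O0 the dead stack slot often still holds the
// right value; after inlining at -O2/-O3 the slot can be reused.

// By-value alternative: the caller receives its own copy of the pointer.
inline const char* unpack_data(const char* p) {
    assert(p != nullptr);
    return p;
}

inline const char* node_data(Node* n) { return unpack_data(node_to_data(n)); }
```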

Regards,
Philip Ashmore




--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#630441: g++-4.6 miscompilation

2011-09-23 Thread Philip Ashmore

"reference to a char *" should read "reference to a char * on the stack".







Bug#630441: g++-4.6 miscompilation

2011-09-22 Thread Philip Ashmore

Hi there.
I believe I've tracked down the problem.

I've published new versions of v3c (1.9.0-03), treedb (1.1.0-01) and
meta-treedb (1.3.0-02) on SourceForge, which get around this problem.


It appears that gcc-4.6 (and clang for that matter) make some dodgy 
decisions about what appear to be references to temporaries created 
during optimization.


Looking at meta-treedb's v3c/1-comet/cxx-string-list-test.cpp, line 75:
typedef char * DataType;
is the heart of the problem.
The change at line 77:
typedef char DataType[0];
guides the compiler along the proper path.
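One way to read why the array typedef helps (an assumption; the thread does not spell out the mechanism): with an array type, the data unpacked from a node can be returned as a reference to the node's own storage, while with a pointer type there is no pointer object inside the node to refer to. unpack_ref and ArrayData are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical generic unpack helper: it views the node's data area as an
// object of type T and returns a reference to it. The expression
// *static_cast<const T*>(pv) is an lvalue naming storage inside the node,
// so no temporary is created.
template <typename T>
const T& unpack_ref(const void* pv) {
    return *static_cast<const T*>(pv);
}

// With an array type the referenced object *is* the node's character data.
// The thread uses `char DataType[0]`, a GNU zero-length-array extension; a
// fixed bound keeps this sketch within standard C++.
typedef char ArrayData[8];

// With `typedef char* DataType` the node holds the characters themselves,
// not a char* object, so producing a `const char*&` means materializing a
// pointer somewhere else, typically a temporary: the dangling-reference case.
```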

Neither compiler has a problem with either option in debug (-O0) builds 
and the tests pass.


I guess the compiler should issue warnings or errors, depending on how 
you want to handle this.


Regards,
Philip






Bug#630441: g++-4.6 miscompilation

2011-09-22 Thread Miles Bader
Philip Ashmore cont...@philipashmore.com writes:
> It appears that gcc-4.6 (and clang for that matter) make some dodgy
> decisions about what appear to be references to temporaries created
> during optimization.

You don't seem to have addressed the issue raised by Matthias Klose in
the bug thread though:  specifically, whether this is truly a compiler
problem, or simply buggy code exposed by the newer compilers.

That it works as intended with older compilers or -O0 isn't enough to
show that -- it's very common for buggy code to work correctly for a
long time, and then suddenly stop working when a new compiler release
uses more aggressive [but correct] optimization.

It seems like an important step here would be to reduce this down to a
minimal test case.

[The fact that both newer versions of gcc and clang show the same
behavior does suggest that maybe it's the application code which is
buggy.]

-Miles

-- 
One of the lessons of history is that nothing is often a good thing to
do, and always a clever thing to say.  -- Will Durant






Bug#630441: g++-4.6 miscompilation

2011-09-22 Thread Philip Ashmore

I'll work on trying to put together a simpler test case.







Bug#630441: g++-4.6 miscompilation

2011-09-04 Thread Matthias Klose

tag 630441 moreinfo help
thanks

On 07/21/2011 12:01 PM, Philip Ashmore wrote:
> Sorry if I wasn't clear.
> All the tests pass in the debug (-O0) build.
>
> I've got gcc/g++ 4.4.6-6 installed and all the tests pass in debug and release
> mode.
> This is a problem with the g++ 4.6 release (-O3) optimization.


unproven.

Repeating my questions here:

how do you know that it's not undefined behaviour exposed by the new compiler
version?

Some more information is needed:

 - is this seen on amd64 only, or on other architectures too?
 - which optimization flags are used? does lowering the optimization
   level work around the issue?
 - does it build using gcc-snapshot, gcc-4.5 or gcc-4.4?
 - if you have a working and a non-working build, can you try
   to combine object files to determine the problematic object
   file?
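The object-file search suggested here is essentially a bisection. A sketch, assuming exactly one miscompiled object and a hypothetical callback fails(k) that links the first k objects from the gcc-4.6 build with the rest from the gcc-4.4 build, runs the tests and reports whether they failed. find_bad_object is an illustrative name, not part of any Debian tool.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the index of the single object whose gcc-4.6 version breaks the
// tests. Assumes fails(0) == false (all-good build passes) and
// fails(n) == true (all-suspect build fails).
inline std::size_t find_bad_object(std::size_t n,
                                   const std::function<bool(std::size_t)>& fails) {
    assert(n > 0);
    std::size_t lo = 0, hi = n;  // invariant: fails(lo) is false, fails(hi) is true
    while (hi - lo > 1) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (fails(mid)) hi = mid;  // culprit is among objects [lo, mid)
        else lo = mid;             // culprit is among objects [mid, hi)
    }
    return lo;  // object `lo` is the first whose suspect version causes failure
}
```

With n object files this needs about log2(n) test runs; the same halving can then be repeated on the functions inside the culprit object to reach a single miscompiled function.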






Bug#630441: g++-4.6 miscompilation

2011-09-04 Thread Philip Ashmore

On 04/09/11 15:21, Matthias Klose wrote:
> tag 630441 moreinfo help
> thanks
>
> On 07/21/2011 12:01 PM, Philip Ashmore wrote:
>> Sorry if I wasn't clear.
>> All the tests pass in the debug (-O0) build.
>>
>> I've got gcc/g++ 4.4.6-6 installed and all the tests pass in debug and release
>> mode.
>> This is a problem with the g++ 4.6 release (-O3) optimization.
>
> unproven.

Well you could try it yourself, but that might be unproven too =)

> Repeating my questions here:
>
> how do you know that it's not undefined behaviour exposed by the new compiler
> version?

I reported this bug because of undefined behaviour of the compiler.

> Some more information is needed:
>
>  - is this seen on amd64 only, or on other architectures too?

See Message 27 above:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#27

A 32 bit chroot means 32 bit, unless I'm mistaken.

>  - which optimization flags are used? does lowering the optimization
>    level work around the issue?

See Message 22 above:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#22

>  - does it build using gcc-snapshot, gcc-4.5 or gcc-4.4?

I have gcc and g++ locked to version 4:4.4.5-1, where they work.

time sh build.sh (from Message 12 above:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#12)
...
real    6m52.589s
user    5m37.477s
sys     0m31.150s

The test fails with gcc-snapshot-20110816-1 (attached) at the same place.

time sh build-with-gcc-snapshot.sh
...
real    12m9.970s
user    10m57.633s
sys     0m29.250s

>  - if you have a working and a non-working build, can you try

The build fails due to a test failure caused by the compiler generating
incorrect code.

If we want to fix this bug then working around it won't help.
The same problem may happen at any time in anyone's code.
The fact that my code caught the problem in a test where it could be
spotted is sheer good/bad luck, and is reproducible.

>    to combine object files to determine the problematic object
>    file?


Philip


build-with-gcc-snapshot.sh
Description: Bourne shell script


Bug#630441: g++-4.6 miscompilation

2011-09-04 Thread Matthias Klose

On 09/04/2011 06:24 PM, Philip Ashmore wrote:
>> - if you have a working and a non-working build, can you try
>
> The build fails due to a test failure caused by the compiler generating
> incorrect code.
> If we want to fix this bug then working around it won't help.

you do misunderstand. somebody has to find the object file which is causing
this. It's not meant as a workaround, but to find the offending code. If you
can track this down to an object file, split it up further, and get to a
function which you claim is miscompiled.

> The same problem may happen at any time in anyone's code.
> The fact that my code caught the problem in a test where it could be spotted is
> sheer good/bad luck, and is reproducible.

and it's not uncommon that newer GCC versions expose invalid code in somebody's
code.


  Matthias






Bug#630441: g++-4.6 miscompilation

2011-09-04 Thread Philip Ashmore

On 04/09/11 17:40, Matthias Klose wrote:
> On 09/04/2011 06:24 PM, Philip Ashmore wrote:
>>> - if you have a working and a non-working build, can you try
>>
>> The build fails due to a test failure caused by the compiler generating
>> incorrect code.
>> If we want to fix this bug then working around it won't help.
>
> you do misunderstand. somebody has to find the object file which is
> causing this. It's not meant as a workaround, but to find the
> offending code. If you can track this down to an object file, split it
> up further, and get to a function which you claim is miscompiled.
>
>> The same problem may happen at any time in anyone's code.
>> The fact that my code caught the problem in a test where it could be
>> spotted is sheer good/bad luck, and is reproducible.
>
> and it's not uncommon that newer GCC versions expose invalid code in
> somebody's code.
>
>   Matthias

From Message 5 above:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#5

> The problem line appears to be
>    206: const char * p = context.data((PCVOID)node);

Sorry if I wasn't clear:

This is line 206 in file
meta-treedb-1.3.0-01/v3c/1-comet/cxx-string-list-test.cpp.

The function this line is in is
   int test(uint16_t abytes, uint16_t aflags, uint16_t xbytes, uint16_t xflags)
and starts at line 96.

This is the source file for the, er, cxx-string-list-test program that failed.

Philip







Bug#630441: g++-4.6 miscompilation

2011-09-04 Thread Matthias Klose

On 09/04/2011 07:35 PM, Philip Ashmore wrote:


> From Message 5 above http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#5
>
>> The problem line appears to be
>> 206: const char * p = context.data((PCVOID)node);
>
> Sorry if I wasn't clear:
>
> This is line 206 in file
> meta-treedb-1.3.0-01/v3c/1-comet/cxx-string-list-test.cpp.
>
> The function this line is in is
> int test(uint16_t abytes, uint16_t aflags, uint16_t xbytes, uint16_t xflags)
> and starts at line 96.
>
> This is the source file for the, er, cxx-string-list-test program that failed.


do you say, that if you just re-build the test program with an older GCC 
version, then the test does succeed? If not, you'll still have to find the 
object file in the tested code, not in the testing code.







Bug#630441: g++-4.6 miscompilation

2011-09-04 Thread Philip Ashmore

>>>  - does it build using gcc-snapshot, gcc-4.5 or gcc-4.4?
>>
>> I have gcc and g++ locked to version 4:4.4.5-1, where they work.
>>
>> time sh build.sh (from Message 12 above
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630441#12)
>> ...
>>
>> real    6m52.589s
>> user    5m37.477s
>> sys     0m31.150s
>
> do you say, that if you just re-build the test program with an older GCC
> version, then the test does succeed? If not, you'll still have to find the
> object file in the tested code, not in the testing code.


Yes.







Bug#630441: g++-4.6 miscompilation

2011-07-21 Thread Matthias Klose
On 07/21/2011 02:26 AM, Philip Ashmore wrote:
> This fails with g++-4.6.1-4 from testing at the same place.
>
> I've updated the packages on SourceForge, so no patches are needed.
> I've attached a revised build.sh which runs the tests with the
> current versions.
>
> I was about to dive into a rant about why you didn't try this before
> releasing gcc/g++ 4.6.1-4 when I spotted a bug in v3c's build system,
> now fixed.
>
> If you'd tried it yourself you could have let me know.
> Shouldn't incorrect code generation block a compiler release?

how do you know that it's not undefined behaviour exposed by the new compiler
version?

Some more information is needed:

 - is this seen on amd64 only, or on other architectures too?
 - which optimization flags are used? does lowering the optimization
   level work around the issue?
 - does it build using gcc-snapshot, gcc-4.5 or gcc-4.4?
 - if you have a working and a non-working build, can you try
   to combine object files to determine the problematic object
   file?






Bug#630441: g++-4.6 miscompilation

2011-07-21 Thread Philip Ashmore

Sorry if I wasn't clear.
All the tests pass in the debug (-O0) build.

I've got gcc/g++ 4.4.6-6 installed and all the tests pass in debug and
release mode.

This is a problem with the g++ 4.6 release (-O3) optimization.

Philip






Bug#630441: g++-4.6 miscompilation

2011-07-21 Thread Philip Ashmore

I just finished running the tests inside a fresh wheezy 32 bit chroot.
The results are the same.






Bug#630441: g++-4.6 miscompilation

2011-07-20 Thread Philip Ashmore

This fails with g++-4.6.1-4 from testing at the same place.

I've updated the packages on SourceForge, so no patches are needed.
I've attached a revised build.sh which runs the tests with the
current versions.

I was about to dive into a rant about why you didn't try this before
releasing gcc/g++ 4.6.1-4 when I spotted a bug in v3c's build system,
now fixed.

If you'd tried it yourself you could have let me know.
Shouldn't incorrect code generation block a compiler release?

Please try this one.

Philip


build.sh
Description: Bourne shell script