Re: Additional debug info to aid cacheline analysis

2020-11-02 Thread Namhyung Kim
Hi Masami,

On Mon, Nov 2, 2020 at 5:27 PM Masami Hiramatsu  wrote:
>
> Hi,
>
> On Fri, 30 Oct 2020 11:10:04 +0100
> Peter Zijlstra  wrote:
>
> > On Fri, Oct 30, 2020 at 10:16:49AM +0100, Mark Wielaard wrote:
> > > Hi Namhyung,
> > >
> > > On Fri, Oct 30, 2020 at 02:26:19PM +0900, Namhyung Kim wrote:
> > > > On Thu, Oct 8, 2020 at 6:38 PM Mark Wielaard  wrote:
> > > > > GCC using -fvar-tracking and -fvar-tracking-assignments is pretty good
> > > > > at keeping track of where variables are held (in memory or registers)
> > > > > when in the program, even through various optimizations.
> > > > >
> > > > > -fvar-tracking-assignments is the default with -g -O2.
> > > > > Except for the upstream linux kernel code. Most distros enable it
> > > > > again, but you do want to enable it by hand when building from the
> > > > > upstream linux git repo.
> > > >
> > > > Please correct me if I'm wrong.  This seems to track local variables.
> > > > But I'm not sure it's enough for this purpose as we want to know
> > > > types of any memory references (not directly from a variable).
> > > >
> > > > Let's say we have a variable like below:
> > > >
> > > >   struct xxx a;
> > > >
> > > >   a.b->c->d++;
> > > >
> > > > And we have a sample where 'd' is updated, then how can we know
> > > > it's from the variable 'a'?  Maybe we don't need to know it, but we
> > > > should know it accesses the 'd' field in the struct 'c'.
> > > >
> > > > Probably we can analyze the asm code and figure out it's from 'a'
> > > > and accessing 'd' at the moment.  I'm curious if there's a way in
> > > > the DWARF to help this kind of work.
> > >
> > > DWARF does have that information, but it stores it in a way that is
> > > kind of opposite to how you want to access it. Given a variable and an
> > > address, you can easily get the location where that variable is
> > > stored. But if you want to map back from a given (memory) location and
> > > address to the variable, that is more work.
> >
> > The principal idea in this thread doesn't care about the address of the
> > variables. The idea was to get the data type and member information from
> > the instruction.
> >
> > So in the above example: a.b->c->d++; what we'll end up with is
> > something like:
> >
> >   inc 8(%rax)
> >
> > Where %rax contains c, and the offset of d in c is 8.
>
> For this simple case, it is possible.
>
> This offset information is stored in the DWARF as a data-structure type
> information. (perf-probe uses it to find how to get the given local var's
> fields)
>
> So if we do this off-line, I think it is possible if it is recorded with
> instruction-pointers. For each place, we can do
>
>  - decode instruction and get the access address.
>  - get var assignment of %rax at that IP.
>  - get type information of var and find the field from offset.
>
> However, the problem is that if the DWARF has only assignment of "a",
> we need to decode the function body. (and usually this happens)
>
> func() {
>  struct xxx a;
>  ...
>  a.b->c->d++;
> }
>
> In this case, only "a" is the local variable. So DWARF records assignment of
> "a", not "b" nor "c" (since those are not a name of variables, just a name
> of fields). GCC may generate something like
>
>  mov 16(%rsp),%rdx   // rdx = a.b
>  mov 8(%rdx),%rax    // rax = b->c
>  inc 8(%rax)         // c->d++

Right, it'd be really nice if the compiler could add information about the
(hidden) assignments to rdx and rax here.

Thanks
Namhyung


Re: Additional debug info to aid cacheline analysis

2020-11-02 Thread Masami Hiramatsu
Hi,

On Fri, 30 Oct 2020 11:10:04 +0100
Peter Zijlstra  wrote:

> On Fri, Oct 30, 2020 at 10:16:49AM +0100, Mark Wielaard wrote:
> > Hi Namhyung,
> > 
> > On Fri, Oct 30, 2020 at 02:26:19PM +0900, Namhyung Kim wrote:
> > > On Thu, Oct 8, 2020 at 6:38 PM Mark Wielaard  wrote:
> > > > GCC using -fvar-tracking and -fvar-tracking-assignments is pretty good
> > > > at keeping track of where variables are held (in memory or registers)
> > > > when in the program, even through various optimizations.
> > > >
> > > > -fvar-tracking-assignments is the default with -g -O2.
> > > > Except for the upstream linux kernel code. Most distros enable it
> > > > again, but you do want to enable it by hand when building from the
> > > > upstream linux git repo.
> > > 
> > > Please correct me if I'm wrong.  This seems to track local variables.
> > > But I'm not sure it's enough for this purpose as we want to know
> > > types of any memory references (not directly from a variable).
> > > 
> > > Let's say we have a variable like below:
> > > 
> > >   struct xxx a;
> > > 
> > >   a.b->c->d++;
> > > 
> > > And we have a sample where 'd' is updated, then how can we know
> > > it's from the variable 'a'?  Maybe we don't need to know it, but we
> > > should know it accesses the 'd' field in the struct 'c'.
> > > 
> > > Probably we can analyze the asm code and figure out it's from 'a'
> > > and accessing 'd' at the moment.  I'm curious if there's a way in
> > > the DWARF to help this kind of work.
> > 
> > DWARF does have that information, but it stores it in a way that is
> > kind of opposite to how you want to access it. Given a variable and an
> > address, you can easily get the location where that variable is
> > stored. But if you want to map back from a given (memory) location and
> > address to the variable, that is more work.
> 
> The principal idea in this thread doesn't care about the address of the
> variables. The idea was to get the data type and member information from
> the instruction.
> 
> So in the above example: a.b->c->d++; what we'll end up with is
> something like:
> 
>   inc 8(%rax)
> 
> Where %rax contains c, and the offset of d in c is 8.

For this simple case, it is possible.

This offset information is stored in the DWARF as a data-structure type
information. (perf-probe uses it to find how to get the given local var's
fields)

So if we do this off-line, I think it is possible if it is recorded with
instruction-pointers. For each place, we can do

 - decode instruction and get the access address. 
 - get var assignment of %rax at that IP.
 - get type information of var and find the field from offset.

However, the problem is that if the DWARF has only assignment of "a",
we need to decode the function body. (and usually this happens)

func() {
 struct xxx a;
 ...
 a.b->c->d++;
}

In this case, only "a" is the local variable. So DWARF records assignment of
"a", not "b" nor "c" (since those are not a name of variables, just a name
of fields). GCC may generate something like

 mov 16(%rsp),%rdx   // rdx = a.b
 mov 8(%rdx),%rax    // rax = b->c
 inc 8(%rax)         // c->d++

GCC only knows that "a" is at 0(%rsp); there are no other "assignments". Thus
we need to trace %rax backwards from the sampled IP until we reach a register
with a known assignment.
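
To make the backtracking step concrete, here is a minimal sketch in C. The
instruction model (struct insn, enum reg, backtrack()) is hypothetical and
far simpler than real x86 decoding: it only understands loads of the form
"mov off(%base),%dst" in straight-line code, and it ignores branches and
registers clobbered by other instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction model: every entry is
 * either a load "mov off(%base),%dst" or a memory op "inc off(%base)". */
enum reg { RSP, RAX, RDX, NREGS };

struct insn {
	int is_load;     /* 1: dst = *(base + off); 0: memop on base + off */
	enum reg dst;    /* only meaningful for loads */
	enum reg base;
	long off;
};

/*
 * Walk backwards from the sampled instruction at index 'hit' until the
 * base register is one that DWARF knows about ('known', e.g. %rsp for
 * the local "a"). The offsets of the dereference chain are stored
 * innermost-first in 'chain'. Returns the chain length, or -1 if no
 * defining load is found or 'chain' is too small.
 */
int backtrack(const struct insn *code, size_t hit, enum reg known,
	      long *chain, size_t max)
{
	size_t n = 0;
	size_t i = hit;
	enum reg want = code[hit].base;

	assert(max > 0);
	chain[n++] = code[hit].off;

	while (want != known) {
		/* find the closest earlier load that defined 'want' */
		for (;;) {
			if (i == 0)
				return -1;
			i--;
			if (code[i].is_load && code[i].dst == want)
				break;
		}
		if (n == max)
			return -1;
		chain[n++] = code[i].off;
		want = code[i].base;
	}
	return (int)n;
}
```

Fed with the three instructions above and known = RSP, this yields the
offset chain 8, 8, 16 (innermost first), which would then have to be matched
against the type information of "a" to recover the member names.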

Note that if there is a loop, we have to trace back through it too, which is harder:

func() {
 struct yyy a;
 int i;
 ...
 for (i = 0; i < 100; i++)
   a.b->c[i]++;
}

In this case, GCC will optimize "i" out and make an end-address.
(This is what GCC -O2 generated)

1190 :
{
1190:   f3 0f 1e fa             endbr64
struct yyy a = *_a;
1194:   48 8b 57 10             mov    0x10(%rdi),%rdx
for (i = 0; i < 100; i++)
1198:   48 8d 42 08             lea    0x8(%rdx),%rax
119c:   48 81 c2 98 01 00 00    add    $0x198,%rdx
11a3:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
a.b->c[i]++;
11a8:   83 00 01                addl   $0x1,(%rax)
for (i = 0; i < 100; i++)
11ab:   48 83 c0 04             add    $0x4,%rax
11af:   48 39 d0                cmp    %rdx,%rax
11b2:   75 f4                   jne    11a8
}
11b4:   c3                      retq

If we ignore the array support, this can be simplified as

1194:   48 8b 57 10             mov    0x10(%rdi),%rdx
1198:   48 8d 42 08             lea    0x8(%rdx),%rax
11a8:   83 00 01                addl   $0x1,(%rax)

and we may be able to decode it.
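
For readers less used to this kind of output, the following C sketch shows
roughly what the optimized loop does: the counter "i" is gone and only a
moving pointer and a precomputed end address remain. The struct layouts are
my guess from the offsets in the disassembly (b at offset 0x10, the c array
at offset 8, 4-byte ints); the names struct zzz, func_source and
func_optimized are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts, guessed from the offsets in the disassembly. */
struct zzz {
	char pad[8];     /* so that c starts at offset 8 */
	int c[100];
};

struct yyy {
	char pad[16];    /* so that b sits at offset 0x10 */
	struct zzz *b;
};

/* The loop as written in the source. */
void func_source(struct yyy *_a)
{
	struct yyy a;
	int i;

	assert(_a != NULL && _a->b != NULL);
	a = *_a;
	for (i = 0; i < 100; i++)
		a.b->c[i]++;
}

/* Roughly what the -O2 code does: no 'i', only a cursor and an end. */
void func_optimized(struct yyy *_a)
{
	int *p, *end;

	assert(_a != NULL && _a->b != NULL);
	p = _a->b->c;               /* lea 0x8(%rdx),%rax  */
	end = p + 100;              /* add $0x198,%rdx     */

	do {
		(*p)++;             /* addl $0x1,(%rax)    */
		p++;                /* add $0x4,%rax       */
	} while (p != end);         /* cmp %rdx,%rax ; jne */
}
```

The sampled instruction (the addl) only ever sees the cursor in %rax, so a
decoder has to go back through the lea and the first mov to learn that the
address lies inside the c array of the object that b points to.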

Thank you,

> So what we want to (easily) find for that instruction is c::d.
> 
> So given any instruction with a memop (either load or store) we want to
> find: type::member.
> 
> 


-- 
Masami Hiramatsu 


Re: Additional debug info to aid cacheline analysis

2020-10-30 Thread Peter Zijlstra
On Fri, Oct 30, 2020 at 10:16:49AM +0100, Mark Wielaard wrote:
> Hi Namhyung,
> 
> On Fri, Oct 30, 2020 at 02:26:19PM +0900, Namhyung Kim wrote:
> > On Thu, Oct 8, 2020 at 6:38 PM Mark Wielaard  wrote:
> > > GCC using -fvar-tracking and -fvar-tracking-assignments is pretty good
> > > at keeping track of where variables are held (in memory or registers)
> > > when in the program, even through various optimizations.
> > >
> > > -fvar-tracking-assignments is the default with -g -O2.
> > > Except for the upstream linux kernel code. Most distros enable it
> > > again, but you do want to enable it by hand when building from the
> > > upstream linux git repo.
> > 
> > Please correct me if I'm wrong.  This seems to track local variables.
> > But I'm not sure it's enough for this purpose as we want to know
> > types of any memory references (not directly from a variable).
> > 
> > Let's say we have a variable like below:
> > 
> >   struct xxx a;
> > 
> >   a.b->c->d++;
> > 
> > And we have a sample where 'd' is updated, then how can we know
> > it's from the variable 'a'?  Maybe we don't need to know it, but we
> > should know it accesses the 'd' field in the struct 'c'.
> > 
> > Probably we can analyze the asm code and figure out it's from 'a'
> > and accessing 'd' at the moment.  I'm curious if there's a way in
> > the DWARF to help this kind of work.
> 
> DWARF does have that information, but it stores it in a way that is
> kind of opposite to how you want to access it. Given a variable and an
> address, you can easily get the location where that variable is
> stored. But if you want to map back from a given (memory) location and
> address to the variable, that is more work.

The principal idea in this thread doesn't care about the address of the
variables. The idea was to get the data type and member information from
the instruction.

So in the above example: a.b->c->d++; what we'll end up with is
something like:

inc 8(%rax)

Where %rax contains c, and the offset of d in c is 8.

So what we want to (easily) find for that instruction is c::d.

So given any instruction with a memop (either load or store) we want to
find: type::member.
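
As an illustration of that lookup, here is a minimal sketch in C, assuming
the type of the pointer in %rax is already known. The struct layout and the
member table are hypothetical; a real tool would fill such a table from the
DWARF type information (DW_TAG_member entries with their
DW_AT_data_member_location) rather than from offsetof().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type standing in for 'c' in the a.b->c->d example. */
struct c {
	char pad[8];     /* places d at offset 8, as in "inc 8(%rax)" */
	int d;
};

/* One row per member, as a tool could extract it from debug info. */
struct member_info {
	const char *name;
	size_t offset;
	size_t size;
};

static const struct member_info c_members[] = {
	{ "pad", offsetof(struct c, pad), 8 },
	{ "d",   offsetof(struct c, d),   sizeof(int) },
};

/* Return the name of the member covering 'off', or NULL if none does. */
const char *member_at(const struct member_info *m, size_t n, size_t off)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < n; i++)
		if (off >= m[i].offset && off - m[i].offset < m[i].size)
			return m[i].name;
	return NULL;
}
```

With this table, member_at(c_members, 2, 8) resolves the displacement in
"inc 8(%rax)" to "d", i.e. c::d.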




Re: Additional debug info to aid cacheline analysis

2020-10-30 Thread Mark Wielaard
Hi Namhyung,

On Fri, Oct 30, 2020 at 02:26:19PM +0900, Namhyung Kim wrote:
> On Thu, Oct 8, 2020 at 6:38 PM Mark Wielaard  wrote:
> > GCC using -fvar-tracking and -fvar-tracking-assignments is pretty good
> > at keeping track of where variables are held (in memory or registers)
> > when in the program, even through various optimizations.
> >
> > -fvar-tracking-assignments is the default with -g -O2.
> > Except for the upstream linux kernel code. Most distros enable it
> > again, but you do want to enable it by hand when building from the
> > upstream linux git repo.
> 
> Please correct me if I'm wrong.  This seems to track local variables.
> But I'm not sure it's enough for this purpose as we want to know
> types of any memory references (not directly from a variable).
> 
> Let's say we have a variable like below:
> 
>   struct xxx a;
> 
>   a.b->c->d++;
> 
> And we have a sample where 'd' is updated, then how can we know
> it's from the variable 'a'?  Maybe we don't need to know it, but we
> should know it accesses the 'd' field in the struct 'c'.
> 
> Probably we can analyze the asm code and figure out it's from 'a'
> and accessing 'd' at the moment.  I'm curious if there's a way in
> the DWARF to help this kind of work.

DWARF does have that information, but it stores it in a way that is
kind of opposite to how you want to access it. Given a variable and an
address, you can easily get the location where that variable is
stored. But if you want to map back from a given (memory) location and
address to the variable, that is more work.

In theory what you could do is make a list of global variables from
the top-level DWARF CUs. Then take the debug aranges to map from the
program address to the DWARF CU that covers that address. Then for
that CU you would walk the CU DIE tree while keeping track of all
variables in scope till you find the function covering that
address. Then for each global variable and all variables in scope you
get the DWARF location description at the given address (for global
ones that is most likely always a static address, but for local ones
it depends on where exactly in the program you take the sample). That
plus the type information for each variable should then make it
possible to see which variable covers the given memory location.

But that is a lot of work to do for each sample.
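
The final step of that procedure is just a range check. A minimal sketch in
C, assuming the locations have already been resolved for the sampled
program address; the var_loc table and find_var() are hypothetical, and the
expensive part (walking the CU DIE tree and evaluating the location
descriptions) is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of resolving each in-scope variable's location
 * and type size at one particular program address. */
struct var_loc {
	const char *name;
	unsigned long addr;   /* resolved start address */
	unsigned long size;   /* size of the variable's type in bytes */
};

/* Return the variable whose storage covers 'addr', or NULL. */
const struct var_loc *find_var(const struct var_loc *vars, size_t n,
			       unsigned long addr)
{
	size_t i;

	assert(vars != NULL);
	for (i = 0; i < n; i++)
		if (addr >= vars[i].addr && addr - vars[i].addr < vars[i].size)
			return &vars[i];
	return NULL;
}
```

The table has to be rebuilt whenever the sampled program address moves to
a place where the locations differ, which is where the per-sample cost
comes from.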

Cheers,

Mark


Re: Additional debug info to aid cacheline analysis

2020-10-29 Thread Namhyung Kim
Hello,

On Thu, Oct 8, 2020 at 6:38 PM Mark Wielaard  wrote:
>
> Hi,
>
> On Thu, 2020-10-08 at 09:02 +0200, Peter Zijlstra wrote:
> > > Some time ago, I had my intern pursue the other 2 approaches for
> > > symbolization. The one I see as most promising is by using the DWARF
> > > information (no BPF needed). The good news is that I believe we do not
> > > need more information than what is already there. We just need the
> > > compiler to generate valid DWARF at most optimization levels, which I
> > > believe is not the case for LLVM based compilers but maybe okay for
> > > GCC.
> >
> > Right, I think GCC improved a lot on this front over the past few years.
> > Also added Andi and Masami, who have worked on this or related topics.
>
> For GCC Alexandre Oliva did a really thorough write up of all the
> various optimization and their effect on debugging/DWARF:
> https://www.fsfla.org/~lxoliva/writeups/gOlogy/gOlogy.html

Thanks for the link.  Looks very nice.

>
> GCC using -fvar-tracking and -fvar-tracking-assignments is pretty good
> at keeping track of where variables are held (in memory or registers)
> when in the program, even through various optimizations.
>
> -fvar-tracking-assignments is the default with -g -O2.
> Except for the upstream linux kernel code. Most distros enable it
> again, but you do want to enable it by hand when building from the
> upstream linux git repo.

Please correct me if I'm wrong.  This seems to track local variables.
But I'm not sure it's enough for this purpose as we want to know
types of any memory references (not directly from a variable).

Let's say we have a variable like below:

  struct xxx a;

  a.b->c->d++;

And we have a sample where 'd' is updated, then how can we know
it's from the variable 'a'?  Maybe we don't need to know it, but we
should know it accesses the 'd' field in the struct 'c'.

Probably we can analyze the asm code and figure out it's from 'a'
and accessing 'd' at the moment.  I'm curious if there's a way in
the DWARF to help this kind of work.

Thanks
Namhyung


Re: Additional debug info to aid cacheline analysis

2020-10-11 Thread Florian Weimer
* Mark Wielaard:

> On Sun, Oct 11, 2020 at 02:15:18PM +0200, Florian Weimer wrote:
>> * Mark Wielaard:
>> 
>> > Yes, that would work. I don't know what the lowest supported GCC
>> > version is, but technically it was definitely fixed in 4.10.0, 4.8.4
>> > and 4.9.2. And various distros would probably have backported the
>> > fix. But checking for 5.0+ would certainly give you a good version.
>> >
>> > How about the attached?
>> 
>> Would it be possible to test for the actual presence of the bug, using
>> -fcompare-debug?
>
> Yes, that was discussed in the original commit message, but it was decided
> that disabling it unconditionally was easier. See commit 2062afb4f.

I think the short test case was not yet available at the time of the
Linux commit.  But then it may not actually detect the bug in all
affected compilers.

Anyway, making this conditional on the GCC version is already a clear
improvement.


Re: Additional debug info to aid cacheline analysis

2020-10-11 Thread Mark Wielaard
On Sun, Oct 11, 2020 at 02:15:18PM +0200, Florian Weimer wrote:
> * Mark Wielaard:
> 
> > Yes, that would work. I don't know what the lowest supported GCC
> > version is, but technically it was definitely fixed in 4.10.0, 4.8.4
> > and 4.9.2. And various distros would probably have backported the
> > fix. But checking for 5.0+ would certainly give you a good version.
> >
> > How about the attached?
> 
> Would it be possible to test for the actual presence of the bug, using
> -fcompare-debug?

Yes, that was discussed in the original commit message, but it was decided
that disabling it unconditionally was easier. See commit 2062afb4f.

Cheers,

Mark


Re: Additional debug info to aid cacheline analysis

2020-10-11 Thread Florian Weimer
* Mark Wielaard:

> Yes, that would work. I don't know what the lowest supported GCC
> version is, but technically it was definitely fixed in 4.10.0, 4.8.4
> and 4.9.2. And various distros would probably have backported the
> fix. But checking for 5.0+ would certainly give you a good version.
>
> How about the attached?

Would it be possible to test for the actual presence of the bug, using
-fcompare-debug?

(But it seems to me that the treatment of this particular compiler bug
is an outlier: other equally tricky bugs do not receive this kind of
attention.)


Re: Additional debug info to aid cacheline analysis

2020-10-11 Thread Segher Boessenkool
Hi!

On Sat, Oct 10, 2020 at 10:58:36PM +0200, Mark Wielaard wrote:
> On Thu, Oct 08, 2020 at 02:23:00PM -0700, Andi Kleen wrote:
> > So I guess could disable it for 5.0+ only. 
> 
> Yes, that would work. I don't know what the lowest supported GCC
> version is, but technically it was definitely fixed in 4.10.0, 4.8.4
> and 4.9.2.

Fwiw, GCC 4.10 was renamed to GCC 5 before it was released (it was the
first release with the new version number scheme).  Only old development
versions (that no one should use) identify as 4.10.

> And various distros would probably have backported the
> fix. But checking for 5.0+ would certainly give you a good version.

Yes, esp. since some versions of 4.9 and 4.8 are still buggy.  No one
should use any version for which a newer bug-fix release has long been
available, but do you want to deal with bugs from people who do not?


Segher


Re: Additional debug info to aid cacheline analysis

2020-10-10 Thread Mark Wielaard
On Thu, Oct 08, 2020 at 02:23:00PM -0700, Andi Kleen wrote:
> > Basically you simply want to remove this line in the top-level
> > Makefile:
> > 
> > DEBUG_CFLAGS := $(call cc-option, -fno-var-tracking-assignments)
> 
> It looks like this was needed as a workaround for a gcc bug that was there
> from 4.5 to 4.9.
> 
> So I guess could disable it for 5.0+ only. 

Yes, that would work. I don't know what the lowest supported GCC
version is, but technically it was definitely fixed in 4.10.0, 4.8.4
and 4.9.2. And various distros would probably have backported the
fix. But checking for 5.0+ would certainly give you a good version.

How about the attached?

Cheers,

Mark

From 48628d3cf2d829a90cd6622355eada1b30cb10c1 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Sat, 10 Oct 2020 22:47:21 +0200
Subject: [PATCH] Only add -fno-var-tracking-assignments workaround for old GCC
 versions.

Some old GCC versions between 4.5.0 and 4.9.1 might miscompile code
with -fvar-tracking-assignments (which is enabled by default with -g -O2).
commit 2062afb4f added -fno-var-tracking-assignments unconditionally to
work around this. But newer versions of GCC no longer have this bug, so
only add it for versions of GCC before 5.0.

Signed-off-by: Mark Wielaard 
---
 Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index f84d7e4ca0be..4f4a9416a87a 100644
--- a/Makefile
+++ b/Makefile
@@ -813,7 +813,9 @@ KBUILD_CFLAGS	+= -ftrivial-auto-var-init=zero
 KBUILD_CFLAGS	+= -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
 endif
 
-DEBUG_CFLAGS	:= $(call cc-option, -fno-var-tracking-assignments)
+# Workaround https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801
+# for old versions of GCC.
+DEBUG_CFLAGS	:= $(call cc-ifversion, -lt, 0500, $(call cc-option, -fno-var-tracking-assignments))
 
 ifdef CONFIG_DEBUG_INFO
 ifdef CONFIG_DEBUG_INFO_SPLIT
-- 
2.18.4



Re: Additional debug info to aid cacheline analysis

2020-10-10 Thread Mark Wielaard
On Sat, Oct 10, 2020 at 10:58:36PM +0200, Mark Wielaard wrote:
> Yes, that would work. I don't know what the lowest supported GCC
> version is, but technically it was definitely fixed in 4.10.0, 4.8.4
> and 4.9.2. And various distros would probably have backported the
> fix. But checking for 5.0+ would certainly give you a good version.
> 
> How about the attached?

Looks like vger just throws away emails with patch attachments.  How
odd.  I'll try sending it as a reply to this message with git-send-email.

Cheers,

Mark



Re: Additional debug info to aid cacheline analysis

2020-10-08 Thread Andi Kleen
> Basically you simply want to remove this line in the top-level
> Makefile:
> 
> DEBUG_CFLAGS := $(call cc-option, -fno-var-tracking-assignments)

It looks like this was needed as a workaround for a gcc bug that was there
from 4.5 to 4.9.

So I guess could disable it for 5.0+ only. 

-Andi


Re: Additional debug info to aid cacheline analysis

2020-10-08 Thread Mark Wielaard
Hi,

On Thu, 2020-10-08 at 09:02 +0200, Peter Zijlstra wrote:
> > Some time ago, I had my intern pursue the other 2 approaches for
> > symbolization. The one I see as most promising is by using the DWARF
> > information (no BPF needed). The good news is that I believe we do not
> > need more information than what is already there. We just need the
> > compiler to generate valid DWARF at most optimization levels, which I
> > believe is not the case for LLVM based compilers but maybe okay for
> > GCC.
> 
> Right, I think GCC improved a lot on this front over the past few years.
> Also added Andi and Masami, who have worked on this or related topics.

For GCC Alexandre Oliva did a really thorough write up of all the
various optimization and their effect on debugging/DWARF:
https://www.fsfla.org/~lxoliva/writeups/gOlogy/gOlogy.html

GCC using -fvar-tracking and -fvar-tracking-assignments is pretty good
at keeping track of where variables are held (in memory or registers)
when in the program, even through various optimizations.

-fvar-tracking-assignments is the default with -g -O2.
Except for the upstream linux kernel code. Most distros enable it
again, but you do want to enable it by hand when building from the
upstream linux git repo.

Basically you simply want to remove this line in the top-level
Makefile:

DEBUG_CFLAGS := $(call cc-option, -fno-var-tracking-assignments)

Cheers,

Mark


Re: Additional debug info to aid cacheline analysis

2020-10-08 Thread Peter Zijlstra


My apologies for adding a typo to the linux-kernel address, corrected
now.

On Wed, Oct 07, 2020 at 10:58:00PM -0700, Stephane Eranian wrote:
> Hi Peter,
> 
> On Tue, Oct 6, 2020 at 6:17 AM Peter Zijlstra  wrote:
> >
> > Hi all,
> >
> > I've been trying to float this idea for a fair number of years, and I
> > think at least Stephane has been talking to tools people about it, but
> > I'm not sure what, if anything, ever happened with it, so let me post it
> > here :-)
> >
> >
> Thanks for bringing this back. This is a pet project of mine and I
> have been looking at it for the last 4 years intermittently now.
> Simply never got a chance to complete because preempted by other
> higher priority projects. I have developed an internal
> proof-of-concept  prototype using one of the 3 approaches I know.  My
> goal was to demonstrate that PMU statistical sampling of loads/stores
> and with data addresses would work as well as instrumentation. This is
> slightly different from hit/miss in the analysis but the process is
> the same.
> 
> As you point out, the difficulty is not so much in collecting the
> sample but rather in symbolizing data addresses from the heap.

Right, that's non-trivial. For static and per-cpu objects it should be
rather straightforward, but heap objects are going to be a pain.
You'd basically have to also log the alloc/free of every object along
with the data type used for it, which is not something we have readily
available at the allocator.
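
To sketch what such a log would buy us, here is a small C example. The
event format (struct heap_event) and heap_type_at() are hypothetical; the
kernel allocators do not record a type today, which is exactly the missing
piece.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log record: one entry per alloc or free, with the data
 * type attached to allocations. */
enum ev_kind { EV_ALLOC, EV_FREE };

struct heap_event {
	unsigned long time;
	enum ev_kind kind;
	unsigned long addr;
	unsigned long size;   /* unused for EV_FREE */
	const char *type;     /* unused for EV_FREE */
};

/*
 * Replay a time-sorted log up to time 't' and return the type of the
 * object that is live at 'addr' at that point, or NULL if there is none.
 */
const char *heap_type_at(const struct heap_event *log, size_t n,
			 unsigned long t, unsigned long addr)
{
	const char *type = NULL;
	unsigned long base = 0;
	size_t i;

	assert(log != NULL);
	for (i = 0; i < n && log[i].time <= t; i++) {
		const struct heap_event *e = &log[i];

		if (e->kind == EV_ALLOC &&
		    addr >= e->addr && addr - e->addr < e->size) {
			type = e->type;      /* object now covers addr */
			base = e->addr;
		} else if (e->kind == EV_FREE && type && e->addr == base) {
			type = NULL;         /* that object was freed */
		}
	}
	return type;
}
```

A sampled data address plus its timestamp would then resolve to a type, and
the offset within the object (addr - base) could be resolved to a member
the same way as for the static case.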

> Intel PEBS, IBM Marked Events work well to collect the data. AMD IBS
> works though you get a lot of irrelevant samples due to lack of
> hardware filtering. ARM SPE would work too.  Overall, all the major
> architectures will provide the sampling support needed.

That's for the data address, or also the eventing IP?

> Some time ago, I had my intern pursue the other 2 approaches for
> symbolization. The one I see as most promising is by using the DWARF
> information (no BPF needed). The good news is that I believe we do not
> need more information than what is already there. We just need the
> compiler to generate valid DWARF at most optimization levels, which I
> believe is not the case for LLVM based compilers but maybe okay for
> GCC.

Right, I think GCC improved a lot on this front over the past few years.
Also added Andi and Masami, who have worked on this or related topics.

> Once we have the DWARF logic in place then it is easier to improve
> perf report/annotate to do hit/miss or hot/cold, read/write analysis
> on each data type and fields within.
> 
> Once we have the code for perf, we are planning to contribute it upstream.
> 
> In the meantime, we need to lean on the compiler teams to ensure no
> data type information is lost with high optimizations levels.  My
> understanding from talking with some compiler folks is that this is
> not a trivial fix.

As you might have noticed, I sent this to the linux-toolchains list.
While you lean on your compiler folks, try and get them subscribed to
this list. It is meant to discuss toolchain issues as related to Linux.
Both GCC/binutils and LLVM should be represented here.