Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Julius Werner
Sounds like this could've been solved with a simple ALIGN(8) in the
ldscript, right? I don't know what made the compiler think that it
would have to align i386 pointers to 8 byte (which seems to be what
happened), but if it makes that decision then it should also conclude
that sizeof(struct boot_state_init_entry) == 0x20 and the array
traversal should still work. If those structures need to be writable
we could simply move them from .rodata to .data in the ldscript.

Aaron, do you have a definitive source for the no two symbols may be
the same rule? Because that's a pretty common pattern, I would be
surprised if it was incorrect (especially since there's things like
__attribute__((alias)) that explicitly allow you to do it even in C).
I can't find anything in a cursory glance at the standard... in fact,
6.5.9.6 (of some weird 2007 draft version I found, at least) says:

Two pointers compare equal if and only if [...] or one is a pointer
to one past the end of an array object and the other is a pointer to
the start of a different array object that happens to immediately
follow the first array object in the address space.

From that I'd conclude that if you did:

extern struct boot_state_init_entry _bs_init_begin[];
extern struct boot_state_init_entry _bs_init_end[];

for (cur = _bs_init_begin; cur  _bs_init_end; cur++) ...

the compiler shouldn't be able to guarantee that the first array
wasn't empty and therefore pointing to one after the end which is
the start of the next array in the first iteration.

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Julius Werner
 You are right that it would work, but back solving for which alignment
 the compiler decided is hard. It's definitely whack-a-mole in that
 case. It could have very well decided 16 too. Without any insight as
 to why that breaks down. Or you go the route of putting alignments on
 structures just for that behavior which I consider not the best
 approach.

Well, we only need to pick an alignment *larger* than what the
compiler decides, right? I'd hazard a guess that 8 would be enough for
a long time (GCC probably just got confused between 32 and 64 bit
modes or something... there's no other reason I could think of to
explain this).

 From that I'd conclude that if you did:

 extern struct boot_state_init_entry _bs_init_begin[];
 extern struct boot_state_init_entry _bs_init_end[];

 for (cur = _bs_init_begin; cur  _bs_init_end; cur++) ...

 the compiler shouldn't be able to guarantee that the first array
 wasn't empty and therefore pointing to one after the end which is
 the start of the next array in the first iteration.

 I think you are right on that point -- using '' instead of equality.
 But those aren't pointer types. Not sure if the spec things arrays
 like that are poitners or a separate type. Either way, hopefully there
 isn't some language in the spec that suggests comparisons like that
 can be optimized away.

The array degenerates to a pointer at that point AFAIK. You could also
write _bs_init_begin... for an array type the two should be pretty
much equivalent.

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Aaron Durbin
On Thu, Mar 19, 2015 at 5:29 PM, Julius Werner jwer...@chromium.org wrote:
 Sounds like this could've been solved with a simple ALIGN(8) in the
 ldscript, right? I don't know what made the compiler think that it
 would have to align i386 pointers to 8 byte (which seems to be what
 happened), but if it makes that decision then it should also conclude
 that sizeof(struct boot_state_init_entry) == 0x20 and the array
 traversal should still work. If those structures need to be writable
 we could simply move them from .rodata to .data in the ldscript.

You are right that it would work, but back solving for which alignment
the compiler decided is hard. It's definitely whack-a-mole in that
case. It could have very well decided 16 too. Without any insight as
to why that breaks down. Or you go the route of putting alignments on
structures just for that behavior which I consider not the best
approach.

They do need to be writable since our linked list is within them.


 Aaron, do you have a definitive source for the no two symbols may be
 the same rule? Because that's a pretty common pattern, I would be
 surprised if it was incorrect (especially since there's things like
 __attribute__((alias)) that explicitly allow you to do it even in C).
 I can't find anything in a cursory glance at the standard... in fact,
 6.5.9.6 (of some weird 2007 draft version I found, at least) says:

I don't  have a pointer. I only remember clang doing something because
of that behavior. I think Patrick may remember the details.


 Two pointers compare equal if and only if [...] or one is a pointer
 to one past the end of an array object and the other is a pointer to
 the start of a different array object that happens to immediately
 follow the first array object in the address space.

 From that I'd conclude that if you did:

 extern struct boot_state_init_entry _bs_init_begin[];
 extern struct boot_state_init_entry _bs_init_end[];

 for (cur = _bs_init_begin; cur  _bs_init_end; cur++) ...

 the compiler shouldn't be able to guarantee that the first array
 wasn't empty and therefore pointing to one after the end which is
 the start of the next array in the first iteration.

I think you are right on that point -- using '' instead of equality.
But those aren't pointer types. Not sure if the spec things arrays
like that are poitners or a separate type. Either way, hopefully there
isn't some language in the spec that suggests comparisons like that
can be optimized away.

-Aaron

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Timothy Pearson

On 03/19/2015 09:21 PM, Julius Werner wrote:

That said, I went back and looked at the assembly dump. It was using
0x14 as the size of the structure when sequencing through the entries.

001465dc R _bs_init_begin
001465e0 r cbmem_bscb
00146600 r gcov_bscb
0014663c R _bs_init_end

Each *entry* was aligned to 0x20.  Just aligning the symbol wouldn't
have fixed it. That means we'd need to mark the structure declaration
as aligned to a specific value and annotate it accordingly in the
linker script.  That's one route or just go w/ pointers to the
structures.


Yeah, that's what I was talking about in the first mail... I think
this means that the compiler would consider the sizeof() for each
structure to be 0x20. I'm pretty sure there's some kind of rule that
the sizeof() for any object must be evenly divisible by its minimum
alignment... that's how normal arrays work as well (e.g. if you had
struct mystruct { uint32_t a; uint8_t b; } and don't mark it
__packed__, then sizeof(struct mystruct) is 0x8 and an array of them
would have 3 bytes padding after each entry). So I'm pretty sure the
pointer++ mechanism to walk through it would've still worked if the
initial address had been right. (can we repro this? then we could
simply try it out instead of guessworking.)


Reproduction should be a simple matter of reverting 9ef9d859 on your 
local tree and building for the ASUS KFSN4-DRE; the breakage was 
extremely consistent and reproducible.  The only wildcard would be that 
gcc was building coreboot with specific optimisations for the AMD Fam10h 
processor; you might need to force the appropriate flags set 
(-mtune=amdfam10 ?) if building on some other CPU.


--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com

--
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Aaron Durbin via coreboot
On Thu, Mar 19, 2015 at 7:53 PM, Julius Werner jwer...@chromium.org wrote:
   145a8a:   83 c3 14add$0x14,%ebx

 Okay, sorry, I didn't know you looked that closely into this. That's
 quite unrefuteable.

 The only question that I still have is then WTF the compiler was
 thinking... this just sounds like a plain bug somewhere (but I agree
 that doesn't really help us much). We could still work around it with
 an explicit __attribute__((aligned, 4)) or __attribute__((packed)) if
 that works... I guess if the compiler does this for no apparent
 reason, then there's no real guarantee that a simple list of pointers
 won't also get screwed up like this?

Well... I was working the assumption an array of pointers couldn't
possibly get hosed. That's about all I got for that. ;)

As for decorating the structures it's just sort of a pain in that you
need to match alignment of the symbol in the linker script and the
structure declaration -- for every object that we place like this in
the linker scripts. Definitely doable though. Something to keep in
one's back pocket.

-Aaron

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Aaron Durbin via coreboot
On Thu, Mar 19, 2015 at 7:21 PM, Julius Werner jwer...@chromium.org wrote:
 That said, I went back and looked at the assembly dump. It was using
 0x14 as the size of the structure when sequencing through the entries.

 001465dc R _bs_init_begin
 001465e0 r cbmem_bscb
 00146600 r gcov_bscb
 0014663c R _bs_init_end

 Each *entry* was aligned to 0x20.  Just aligning the symbol wouldn't
 have fixed it. That means we'd need to mark the structure declaration
 as aligned to a specific value and annotate it accordingly in the
 linker script.  That's one route or just go w/ pointers to the
 structures.

 Yeah, that's what I was talking about in the first mail... I think
 this means that the compiler would consider the sizeof() for each
 structure to be 0x20. I'm pretty sure there's some kind of rule that
 the sizeof() for any object must be evenly divisible by its minimum
 alignment... that's how normal arrays work as well (e.g. if you had
 struct mystruct { uint32_t a; uint8_t b; } and don't mark it
 __packed__, then sizeof(struct mystruct) is 0x8 and an array of them
 would have 3 bytes padding after each entry). So I'm pretty sure the
 pointer++ mechanism to walk through it would've still worked if the
 initial address had been right. (can we repro this? then we could
 simply try it out instead of guessworking.)

I think you misunderstood what I wrote; I wasn't guessworking.

The bscb structure arrays were aligned to 0x20. But the structure size
to the compiler is 0x14. It's 0x14 in gdb and it's 0x14 in the
instructions calculating the next entry. That's how I diagnosed this
to begin with. From 4.92/ramstage.debug:

  145991:   bb dc 65 14 00  mov$0x1465dc,%ebx
...
  145a36:   81 fb 3c 66 14 00   cmp$0x14663c,%ebx
  145a3c:   74 5f   je 145a9d main+0x11c
...
  145a8a:   83 c3 14add$0x14,%ebx
...
 145a9b:   eb 99   jmp145a36 main+0xb5

So we know the first entry in the array is wrong based on
_bs_init_begin != cbmem_bscb. However, ignoring that the entry after
cbmem_bscb is wrong too because the 0x14 increment will not result in
resolving to gcov_bscb.

-Aaron

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Julius Werner
 That said, I went back and looked at the assembly dump. It was using
 0x14 as the size of the structure when sequencing through the entries.

 001465dc R _bs_init_begin
 001465e0 r cbmem_bscb
 00146600 r gcov_bscb
 0014663c R _bs_init_end

 Each *entry* was aligned to 0x20.  Just aligning the symbol wouldn't
 have fixed it. That means we'd need to mark the structure declaration
 as aligned to a specific value and annotate it accordingly in the
 linker script.  That's one route or just go w/ pointers to the
 structures.

Yeah, that's what I was talking about in the first mail... I think
this means that the compiler would consider the sizeof() for each
structure to be 0x20. I'm pretty sure there's some kind of rule that
the sizeof() for any object must be evenly divisible by its minimum
alignment... that's how normal arrays work as well (e.g. if you had
struct mystruct { uint32_t a; uint8_t b; } and don't mark it
__packed__, then sizeof(struct mystruct) is 0x8 and an array of them
would have 3 bytes padding after each entry). So I'm pretty sure the
pointer++ mechanism to walk through it would've still worked if the
initial address had been right. (can we repro this? then we could
simply try it out instead of guessworking.)

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Aaron Durbin
On Thu, Mar 19, 2015 at 3:54 PM, Julius Werner jwer...@chromium.org wrote:
 You are right that it would work, but back solving for which alignment
 the compiler decided is hard. It's definitely whack-a-mole in that
 case. It could have very well decided 16 too. Without any insight as
 to why that breaks down. Or you go the route of putting alignments on
 structures just for that behavior which I consider not the best
 approach.

 Well, we only need to pick an alignment *larger* than what the
 compiler decides, right? I'd hazard a guess that 8 would be enough for
 a long time (GCC probably just got confused between 32 and 64 bit
 modes or something... there's no other reason I could think of to
 explain this).

Yes, my concern was not knowing that upper limit. I looked into the
Linux kernel on how they do most of this. From what I could tell they
largely use pointers to the structures like I worked around it. When
embedding structures into sections they use ALIGN(8) as you suggested
(see include/asm-generic/vmlinux.lds.h for the fun).

That said, I went back and looked at the assembly dump. It was using
0x14 as the size of the structure when sequencing through the entries.

001465dc R _bs_init_begin
001465e0 r cbmem_bscb
00146600 r gcov_bscb
0014663c R _bs_init_end

Each *entry* was aligned to 0x20.  Just aligning the symbol wouldn't
have fixed it. That means we'd need to mark the structure declaration
as aligned to a specific value and annotate it accordingly in the
linker script.  That's one route or just go w/ pointers to the
structures.

I'm looking forward to your linker scripts through preprocessor
patches to land because we can leverage for aligning some linker
scripts across archs, etc. Right now the changes are cumbersome. The
last part is just me whining. :)


 From that I'd conclude that if you did:

 extern struct boot_state_init_entry _bs_init_begin[];
 extern struct boot_state_init_entry _bs_init_end[];

 for (cur = _bs_init_begin; cur  _bs_init_end; cur++) ...

 the compiler shouldn't be able to guarantee that the first array
 wasn't empty and therefore pointing to one after the end which is
 the start of the next array in the first iteration.

 I think you are right on that point -- using '' instead of equality.
 But those aren't pointer types. Not sure if the spec things arrays
 like that are poitners or a separate type. Either way, hopefully there
 isn't some language in the spec that suggests comparisons like that
 can be optimized away.

 The array degenerates to a pointer at that point AFAIK. You could also
 write _bs_init_begin... for an array type the two should be pretty
 much equivalent.

We can still try that w/o the NULL tombstones. It doesn't hurt to try, right?

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-19 Thread Julius Werner
   145a8a:   83 c3 14add$0x14,%ebx

Okay, sorry, I didn't know you looked that closely into this. That's
quite unrefuteable.

The only question that I still have is then WTF the compiler was
thinking... this just sounds like a plain bug somewhere (but I agree
that doesn't really help us much). We could still work around it with
an explicit __attribute__((aligned, 4)) or __attribute__((packed)) if
that works... I guess if the compiler does this for no apparent
reason, then there's no real guarantee that a simple list of pointers
won't also get screwed up like this?

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Aaron Durbin
On Mon, Mar 16, 2015 at 4:49 PM, Timothy Pearson
tpear...@raptorengineeringinc.com wrote:
 On 03/16/2015 04:44 PM, Aaron Durbin wrote:

 On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson
 tpear...@raptorengineeringinc.com  wrote:

 On 03/16/2015 09:23 AM, Aaron Durbin wrote:


 On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
 tpear...@raptorengineeringinc.com   wrote:


 All,

 Just a heads up as there is no bugtracker for this project.  GIT commit
 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
 breaks
 ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST
 code,
 but then goes into an infinite loop).  Downgrading the GCC version
 repairs
 the boot failure.

 Not sure if you want to revert that commit until someone can figure out
 what
 changed to cause the problem.



 Could post ramstage.elf from the two different builds somewhere? I'd
 like to take a peak at what is in there.



 Sure:
 https://raptorengineeringinc.com/coreboot/built.tar.bz2

 Other oddities:
 GCC 4.8.3:
 normal/romstage0x7ff80stage97345
 normal/ramstage0x97c40stage154869

 GCC 4.9.2:
 normal/romstage0x7ff80stage94773
 normal/ramstage0x97240stage173942

 Note in particular, judging from the file sizes, that something seems to
 have been relocated from romstage to ramstage by the new gcc version.


 I noticed you had CONFIG_COVERAGE selected in both the builds. Could
 you try not having that selected? I wonder if something changed in the
 compiler on that front. But... I think I found a bigger issue.


 That shouldn't be a problem.  For reference, should CONFIG_COVERAGE be on or
 off for board status report builds?


 $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
 00146fc4 r pch_intel_wifi
 00146fd0 R cpu_drivers
 00146fd0 R epci_drivers
 00146fd0 r model_10xxx
 00146fdc R _bs_init_begin
 00146fdc r cbmem_bscb
 00146fdc R ecpu_drivers
 00146ff0 r gcov_bscb
 0014702c R _bs_init_end
 0014702c R pnp_conf_mode_870155_aa
 00147034 R pnp_conf_mode_a0a0_aa
 0014703c R pnp_conf_mode_8787_aa
 00147044 R pnp_conf_mode__aa

 $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
 001465c4 r pch_intel_wifi
 001465d0 R cpu_drivers
 001465d0 R epci_drivers
 001465d0 r model_10xxx
 001465dc R _bs_init_begin
 001465dc R ecpu_drivers
 001465e0 r cbmem_bscb
 00146600 r gcov_bscb
 0014663c R _bs_init_end
 00146640 R pnp_conf_mode_870155_aa
 00146648 R pnp_conf_mode_a0a0_aa
 00146650 R pnp_conf_mode_8787_aa
 00146658 R pnp_conf_mode__a

 The boot state callbacks place the whole structure for each entry
 between _bs_init_begin and _bs_init_end. For both binaries the size of
 0x14.


 For the 4.8.3 compiled ramstage I see:
 (gdb) p/x 0x0014702c - 0x00146fdc
 $12 = 0x50
 For the 4.9.2 compiled ramstage I see:
 (gdb) p/x 0x014663c - 0x01465dc
 $14 = 0x60

 0x60 is not a multiple of 0x14 -- which is means things aren't cool.


 This makes perfect sense--whenever coreboot didn't hang outright it started
 infinitely spewing some message regarding a boot state callback already
 being complete.

 Looking at the symbols it appears the compiler is aligning those
 structures to 32-bytes for some reason...

 A quick hack is add ALIGN(32) to the linker script before
 _bs_init_begin: src/arch/x86/ramstage.ld


 So I wonder if this is unique to AMD Fam10h or if a whole lot of other
 boards broke with the gcc update.  I wouldn't have even caught this if I
 hadn't checked out a new coreboot tree instead of copying over the existing
 tree with the prebuilt crossgcc, so we might be looking at a ticking
 timebomb that will go off as people start upgrading their crossgcc
 versions...

It all sorta depends. But the issue that _bs_init_begin does not equal
the address of the first bscb structure is bad news all around.


 But I think we'll need to store pointers to the structures in order to
 properly handle the situation where the compiler is effectively making
 alignment/size decisions for some reason.


 I am not at all familiar with the code in question, so all I can do is offer
 to test.  Thanks for analysing the problem!

I might be able to whip up a patch, but it's harder than I first
thought because we were relying on arrays to be swept into those
regions. I'll have to think on this one or we'll just have to change
the API entirely for all the users.



 -Aaron



 --
 Timothy Pearson
 Raptor Engineering
 +1 (415) 727-8645
 http://www.raptorengineeringinc.com

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Aaron Durbin
On Mon, Mar 16, 2015 at 4:33 PM, Stefan Reinauer
stefan.reina...@coreboot.org wrote:
 * Aaron Durbin adur...@chromium.org [150316 22:44]:
 A quick hack is add ALIGN(32) to the linker script before
 _bs_init_begin: src/arch/x86/ramstage.ld

 But I think we'll need to store pointers to the structures in order to
 properly handle the situation where the compiler is effectively making
 alignment/size decisions for some reason.

 I suggest trying to enforce alignment / size instead of adding another
 layer of indirection.

I'm not sure that's possible w/o marking the struct packed just for
the sake of it. The other issue is that this section is sitting in RO
while also being written to. The other thing, which has been known for
awhile, is that we can't take advantage of symbols being equal for an
empty set since C says no two symbols can be the same. I'm sure the
language lawyers will correct me where I got that wrong.

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Aaron Durbin
On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
tpear...@raptorengineeringinc.com wrote:
 All,

 Just a heads up as there is no bugtracker for this project.  GIT commit
 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks
 ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code,
 but then goes into an infinite loop).  Downgrading the GCC version repairs
 the boot failure.

 Not sure if you want to revert that commit until someone can figure out what
 changed to cause the problem.

Could post ramstage.elf from the two different builds somewhere? I'd
like to take a peak at what is in there.


 --
 Timothy Pearson
 Raptor Engineering
 +1 (415) 727-8645
 http://www.raptorengineeringinc.com

 --
 coreboot mailing list: coreboot@coreboot.org
 http://www.coreboot.org/mailman/listinfo/coreboot

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Stefan Reinauer
* Aaron Durbin adur...@chromium.org [150316 22:44]:
 A quick hack is add ALIGN(32) to the linker script before
 _bs_init_begin: src/arch/x86/ramstage.ld
 
 But I think we'll need to store pointers to the structures in order to
 properly handle the situation where the compiler is effectively making
 alignment/size decisions for some reason.

I suggest trying to enforce alignment / size instead of adding another
layer of indirection.

Stefan


-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Aaron Durbin
On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson
tpear...@raptorengineeringinc.com wrote:
 On 03/16/2015 09:23 AM, Aaron Durbin wrote:

 On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
 tpear...@raptorengineeringinc.com  wrote:

 All,

 Just a heads up as there is no bugtracker for this project.  GIT commit
 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
 breaks
 ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code,
 but then goes into an infinite loop).  Downgrading the GCC version
 repairs
 the boot failure.

 Not sure if you want to revert that commit until someone can figure out
 what
 changed to cause the problem.


 Could post ramstage.elf from the two different builds somewhere? I'd
 like to take a peak at what is in there.


 Sure:
 https://raptorengineeringinc.com/coreboot/built.tar.bz2

 Other oddities:
 GCC 4.8.3:
 normal/romstage0x7ff80stage97345
 normal/ramstage0x97c40stage154869

 GCC 4.9.2:
 normal/romstage0x7ff80stage94773
 normal/ramstage0x97240stage173942

 Note in particular, judging from the file sizes, that something seems to
 have been relocated from romstage to ramstage by the new gcc version.


I noticed you had CONFIG_COVERAGE selected in both the builds. Could
you try not having that selected? I wonder if something changed in the
compiler on that front. But... I think I found a bigger issue.



$ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
00146fc4 r pch_intel_wifi
00146fd0 R cpu_drivers
00146fd0 R epci_drivers
00146fd0 r model_10xxx
00146fdc R _bs_init_begin
00146fdc r cbmem_bscb
00146fdc R ecpu_drivers
00146ff0 r gcov_bscb
0014702c R _bs_init_end
0014702c R pnp_conf_mode_870155_aa
00147034 R pnp_conf_mode_a0a0_aa
0014703c R pnp_conf_mode_8787_aa
00147044 R pnp_conf_mode__aa

$ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
001465c4 r pch_intel_wifi
001465d0 R cpu_drivers
001465d0 R epci_drivers
001465d0 r model_10xxx
001465dc R _bs_init_begin
001465dc R ecpu_drivers
001465e0 r cbmem_bscb
00146600 r gcov_bscb
0014663c R _bs_init_end
00146640 R pnp_conf_mode_870155_aa
00146648 R pnp_conf_mode_a0a0_aa
00146650 R pnp_conf_mode_8787_aa
00146658 R pnp_conf_mode__a

The boot state callbacks place the whole structure for each entry
between _bs_init_begin and _bs_init_end. For both binaries the size of
0x14.

For the 4.8.3 compiled ramstage I see:
(gdb) p/x 0x0014702c - 0x00146fdc
$12 = 0x50
For the 4.9.2 compiled ramstage I see:
(gdb) p/x 0x014663c - 0x01465dc
$14 = 0x60

0x60 is not a multiple of 0x14 -- which is means things aren't cool.

Looking at the symbols it appears the compiler is aligning those
structures to 32-bytes for some reason...

A quick hack is add ALIGN(32) to the linker script before
_bs_init_begin: src/arch/x86/ramstage.ld

But I think we'll need to store pointers to the structures in order to
properly handle the situation where the compiler is effectively making
alignment/size decisions for some reason.

-Aaron




 --
 Timothy Pearson
 Raptor Engineering
 +1 (415) 727-8645
 http://www.raptorengineeringinc.com

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Timothy Pearson

On 03/16/2015 04:44 PM, Aaron Durbin wrote:

On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson
tpear...@raptorengineeringinc.com  wrote:

On 03/16/2015 09:23 AM, Aaron Durbin wrote:


On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
tpear...@raptorengineeringinc.com   wrote:


All,

Just a heads up as there is no bugtracker for this project.  GIT commit
53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
breaks
ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code,
but then goes into an infinite loop).  Downgrading the GCC version
repairs
the boot failure.

Not sure if you want to revert that commit until someone can figure out
what
changed to cause the problem.



Could post ramstage.elf from the two different builds somewhere? I'd
like to take a peak at what is in there.



Sure:
https://raptorengineeringinc.com/coreboot/built.tar.bz2

Other oddities:
GCC 4.8.3:
normal/romstage0x7ff80stage97345
normal/ramstage0x97c40stage154869

GCC 4.9.2:
normal/romstage0x7ff80stage94773
normal/ramstage0x97240stage173942

Note in particular, judging from the file sizes, that something seems to
have been relocated from romstage to ramstage by the new gcc version.



I noticed you had CONFIG_COVERAGE selected in both the builds. Could
you try not having that selected? I wonder if something changed in the
compiler on that front. But... I think I found a bigger issue.


That shouldn't be a problem.  For reference, should CONFIG_COVERAGE be 
on or off for board status report builds?



$ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
00146fc4 r pch_intel_wifi
00146fd0 R cpu_drivers
00146fd0 R epci_drivers
00146fd0 r model_10xxx
00146fdc R _bs_init_begin
00146fdc r cbmem_bscb
00146fdc R ecpu_drivers
00146ff0 r gcov_bscb
0014702c R _bs_init_end
0014702c R pnp_conf_mode_870155_aa
00147034 R pnp_conf_mode_a0a0_aa
0014703c R pnp_conf_mode_8787_aa
00147044 R pnp_conf_mode__aa

$ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
001465c4 r pch_intel_wifi
001465d0 R cpu_drivers
001465d0 R epci_drivers
001465d0 r model_10xxx
001465dc R _bs_init_begin
001465dc R ecpu_drivers
001465e0 r cbmem_bscb
00146600 r gcov_bscb
0014663c R _bs_init_end
00146640 R pnp_conf_mode_870155_aa
00146648 R pnp_conf_mode_a0a0_aa
00146650 R pnp_conf_mode_8787_aa
00146658 R pnp_conf_mode__a

The boot state callbacks place the whole structure for each entry
between _bs_init_begin and _bs_init_end. For both binaries the size of
0x14.



For the 4.8.3 compiled ramstage I see:
(gdb) p/x 0x0014702c - 0x00146fdc
$12 = 0x50
For the 4.9.2 compiled ramstage I see:
(gdb) p/x 0x014663c - 0x01465dc
$14 = 0x60

0x60 is not a multiple of 0x14 -- which is means things aren't cool.


This makes perfect sense--whenever coreboot didn't hang outright it 
started infinitely spewing some message regarding a boot state callback 
already being complete.



Looking at the symbols it appears the compiler is aligning those
structures to 32-bytes for some reason...

A quick hack is add ALIGN(32) to the linker script before
_bs_init_begin: src/arch/x86/ramstage.ld


So I wonder if this is unique to AMD Fam10h or if a whole lot of other 
boards broke with the gcc update.  I wouldn't have even caught this if I 
hadn't checked out a new coreboot tree instead of copying over the 
existing tree with the prebuilt crossgcc, so we might be looking at a 
ticking timebomb that will go off as people start upgrading their 
crossgcc versions...



But I think we'll need to store pointers to the structures in order to
properly handle the situation where the compiler is effectively making
alignment/size decisions for some reason.


I am not at all familiar with the code in question, so all I can do is 
offer to test.  Thanks for analysing the problem!



-Aaron



--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com

--
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Timothy Pearson

On 03/16/2015 09:23 AM, Aaron Durbin wrote:

On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
tpear...@raptorengineeringinc.com  wrote:

All,

Just a heads up as there is no bugtracker for this project.  GIT commit
53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks
ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code,
but then goes into an infinite loop).  Downgrading the GCC version repairs
the boot failure.

Not sure if you want to revert that commit until someone can figure out what
changed to cause the problem.


Could post ramstage.elf from the two different builds somewhere? I'd
like to take a peak at what is in there.


Sure:
https://raptorengineeringinc.com/coreboot/built.tar.bz2

Other oddities:
GCC 4.8.3:
normal/romstage0x7ff80stage97345
normal/ramstage0x97c40stage154869

GCC 4.9.2:
normal/romstage0x7ff80stage94773
normal/ramstage0x97240stage173942

Note in particular, judging from the file sizes, that something seems to 
have been relocated from romstage to ramstage by the new gcc version.


--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com

--
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


[coreboot] GCC update broke AMD Fam10h boot

2015-03-15 Thread Timothy Pearson

All,

Just a heads up as there is no bugtracker for this project.  GIT commit 
53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, 
breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 
POST code, but then goes into an infinite loop).  Downgrading the GCC 
version repairs the boot failure.


Not sure if you want to revert that commit until someone can figure out 
what changed to cause the problem.


--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com

--
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-15 Thread Alexandru Gagniuc
On Sunday, March 15, 2015 02:04:51 PM Timothy Pearson wrote:
 All,
 
 Just a heads up as there is no bugtracker for this project.  GIT commit
 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
 breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39
 POST code, but then goes into an infinite loop).  Downgrading the GCC
 version repairs the boot failure.
 
 Not sure if you want to revert that commit until someone can figure out
 what changed to cause the problem.

Unless there's a clear case to be made that this is a compiler bug, I'd go 
with finding and fixing the bug in our code that causes that.

Alex

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot