Re: [coreboot] GCC update broke AMD Fam10h boot
Sounds like this could've been solved with a simple ALIGN(8) in the ldscript, right? I don't know what made the compiler think that it would have to align i386 pointers to 8 byte (which seems to be what happened), but if it makes that decision then it should also conclude that sizeof(struct boot_state_init_entry) == 0x20 and the array traversal should still work. If those structures need to be writable we could simply move them from .rodata to .data in the ldscript. Aaron, do you have a definitive source for the no two symbols may be the same rule? Because that's a pretty common pattern, I would be surprised if it was incorrect (especially since there's things like __attribute__((alias)) that explicitly allow you to do it even in C). I can't find anything in a cursory glance at the standard... in fact, 6.5.9.6 (of some weird 2007 draft version I found, at least) says: Two pointers compare equal if and only if [...] or one is a pointer to one past the end of an array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. From that I'd conclude that if you did: extern struct boot_state_init_entry _bs_init_begin[]; extern struct boot_state_init_entry _bs_init_end[]; for (cur = _bs_init_begin; cur _bs_init_end; cur++) ... the compiler shouldn't be able to guarantee that the first array wasn't empty and therefore pointing to one after the end which is the start of the next array in the first iteration. -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
You are right that it would work, but back solving for which alignment the compiler decided is hard. It's definitely whack-a-mole in that case. It could have very well decided 16 too. Without any insight as to why that breaks down. Or you go the route of putting alignments on structures just for that behavior which I consider not the best approach. Well, we only need to pick an alignment *larger* than what the compiler decides, right? I'd hazard a guess that 8 would be enough for a long time (GCC probably just got confused between 32 and 64 bit modes or something... there's no other reason I could think of to explain this). From that I'd conclude that if you did: extern struct boot_state_init_entry _bs_init_begin[]; extern struct boot_state_init_entry _bs_init_end[]; for (cur = _bs_init_begin; cur _bs_init_end; cur++) ... the compiler shouldn't be able to guarantee that the first array wasn't empty and therefore pointing to one after the end which is the start of the next array in the first iteration. I think you are right on that point -- using '' instead of equality. But those aren't pointer types. Not sure if the spec things arrays like that are poitners or a separate type. Either way, hopefully there isn't some language in the spec that suggests comparisons like that can be optimized away. The array degenerates to a pointer at that point AFAIK. You could also write _bs_init_begin... for an array type the two should be pretty much equivalent. -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Thu, Mar 19, 2015 at 5:29 PM, Julius Werner jwer...@chromium.org wrote: Sounds like this could've been solved with a simple ALIGN(8) in the ldscript, right? I don't know what made the compiler think that it would have to align i386 pointers to 8 byte (which seems to be what happened), but if it makes that decision then it should also conclude that sizeof(struct boot_state_init_entry) == 0x20 and the array traversal should still work. If those structures need to be writable we could simply move them from .rodata to .data in the ldscript. You are right that it would work, but back solving for which alignment the compiler decided is hard. It's definitely whack-a-mole in that case. It could have very well decided 16 too. Without any insight as to why that breaks down. Or you go the route of putting alignments on structures just for that behavior which I consider not the best approach. They do need to be writable since our linked list is within them. Aaron, do you have a definitive source for the no two symbols may be the same rule? Because that's a pretty common pattern, I would be surprised if it was incorrect (especially since there's things like __attribute__((alias)) that explicitly allow you to do it even in C). I can't find anything in a cursory glance at the standard... in fact, 6.5.9.6 (of some weird 2007 draft version I found, at least) says: I don't have a pointer. I only remember clang doing something because of that behavior. I think Patrick may remember the details. Two pointers compare equal if and only if [...] or one is a pointer to one past the end of an array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. From that I'd conclude that if you did: extern struct boot_state_init_entry _bs_init_begin[]; extern struct boot_state_init_entry _bs_init_end[]; for (cur = _bs_init_begin; cur _bs_init_end; cur++) ... the compiler shouldn't be able to guarantee that the first array wasn't empty and therefore pointing to one after the end which is the start of the next array in the first iteration. I think you are right on that point -- using '' instead of equality. But those aren't pointer types. Not sure if the spec things arrays like that are poitners or a separate type. Either way, hopefully there isn't some language in the spec that suggests comparisons like that can be optimized away. -Aaron -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On 03/19/2015 09:21 PM, Julius Werner wrote: That said, I went back and looked at the assembly dump. It was using 0x14 as the size of the structure when sequencing through the entries. 001465dc R _bs_init_begin 001465e0 r cbmem_bscb 00146600 r gcov_bscb 0014663c R _bs_init_end Each *entry* was aligned to 0x20. Just aligning the symbol wouldn't have fixed it. That means we'd need to mark the structure declaration as aligned to a specific value and annotate it accordingly in the linker script. That's one route or just go w/ pointers to the structures. Yeah, that's what I was talking about in the first mail... I think this means that the compiler would consider the sizeof() for each structure to be 0x20. I'm pretty sure there's some kind of rule that the sizeof() for any object must be evenly divisible by its minimum alignment... that's how normal arrays work as well (e.g. if you had struct mystruct { uint32_t a; uint8_t b; } and don't mark it __packed__, then sizeof(struct mystruct) is 0x8 and an array of them would have 3 bytes padding after each entry). So I'm pretty sure the pointer++ mechanism to walk through it would've still worked if the initial address had been right. (can we repro this? then we could simply try it out instead of guessworking.) Reproduction should be a simple matter of reverting 9ef9d859 on your local tree and building for the ASUS KFSN4-DRE; the breakage was extremely consistent and reproducible. The only wildcard would be that gcc was building coreboot with specific optimisations for the AMD Fam10h processor; you might need to force the appropriate flags set (-mtune=amdfam10 ?) if building on some other CPU. -- Timothy Pearson Raptor Engineering +1 (415) 727-8645 http://www.raptorengineeringinc.com -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Thu, Mar 19, 2015 at 7:53 PM, Julius Werner jwer...@chromium.org wrote: 145a8a: 83 c3 14add$0x14,%ebx Okay, sorry, I didn't know you looked that closely into this. That's quite unrefuteable. The only question that I still have is then WTF the compiler was thinking... this just sounds like a plain bug somewhere (but I agree that doesn't really help us much). We could still work around it with an explicit __attribute__((aligned, 4)) or __attribute__((packed)) if that works... I guess if the compiler does this for no apparent reason, then there's no real guarantee that a simple list of pointers won't also get screwed up like this? Well... I was working the assumption an array of pointers couldn't possibly get hosed. That's about all I got for that. ;) As for decorating the structures it's just sort of a pain in that you need to match alignment of the symbol in the linker script and the structure declaration -- for every object that we place like this in the linker scripts. Definitely doable though. Something to keep in one's back pocket. -Aaron -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Thu, Mar 19, 2015 at 7:21 PM, Julius Werner jwer...@chromium.org wrote: That said, I went back and looked at the assembly dump. It was using 0x14 as the size of the structure when sequencing through the entries. 001465dc R _bs_init_begin 001465e0 r cbmem_bscb 00146600 r gcov_bscb 0014663c R _bs_init_end Each *entry* was aligned to 0x20. Just aligning the symbol wouldn't have fixed it. That means we'd need to mark the structure declaration as aligned to a specific value and annotate it accordingly in the linker script. That's one route or just go w/ pointers to the structures. Yeah, that's what I was talking about in the first mail... I think this means that the compiler would consider the sizeof() for each structure to be 0x20. I'm pretty sure there's some kind of rule that the sizeof() for any object must be evenly divisible by its minimum alignment... that's how normal arrays work as well (e.g. if you had struct mystruct { uint32_t a; uint8_t b; } and don't mark it __packed__, then sizeof(struct mystruct) is 0x8 and an array of them would have 3 bytes padding after each entry). So I'm pretty sure the pointer++ mechanism to walk through it would've still worked if the initial address had been right. (can we repro this? then we could simply try it out instead of guessworking.) I think you misunderstood what I wrote; I wasn't guessworking. The bscb structure arrays were aligned to 0x20. But the structure size to the compiler is 0x14. It's 0x14 in gdb and it's 0x14 in the instructions calculating the next entry. That's how I diagnosed this to begin with. From 4.92/ramstage.debug: 145991: bb dc 65 14 00 mov$0x1465dc,%ebx ... 145a36: 81 fb 3c 66 14 00 cmp$0x14663c,%ebx 145a3c: 74 5f je 145a9d main+0x11c ... 145a8a: 83 c3 14add$0x14,%ebx ... 145a9b: eb 99 jmp145a36 main+0xb5 So we know the first entry in the array is wrong based on _bs_init_begin != cbmem_bscb. However, ignoring that the entry after cbmem_bscb is wrong too because the 0x14 increment will not result in resolving to gcov_bscb. -Aaron -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
That said, I went back and looked at the assembly dump. It was using 0x14 as the size of the structure when sequencing through the entries. 001465dc R _bs_init_begin 001465e0 r cbmem_bscb 00146600 r gcov_bscb 0014663c R _bs_init_end Each *entry* was aligned to 0x20. Just aligning the symbol wouldn't have fixed it. That means we'd need to mark the structure declaration as aligned to a specific value and annotate it accordingly in the linker script. That's one route or just go w/ pointers to the structures. Yeah, that's what I was talking about in the first mail... I think this means that the compiler would consider the sizeof() for each structure to be 0x20. I'm pretty sure there's some kind of rule that the sizeof() for any object must be evenly divisible by its minimum alignment... that's how normal arrays work as well (e.g. if you had struct mystruct { uint32_t a; uint8_t b; } and don't mark it __packed__, then sizeof(struct mystruct) is 0x8 and an array of them would have 3 bytes padding after each entry). So I'm pretty sure the pointer++ mechanism to walk through it would've still worked if the initial address had been right. (can we repro this? then we could simply try it out instead of guessworking.) -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Thu, Mar 19, 2015 at 3:54 PM, Julius Werner jwer...@chromium.org wrote: You are right that it would work, but back solving for which alignment the compiler decided is hard. It's definitely whack-a-mole in that case. It could have very well decided 16 too. Without any insight as to why that breaks down. Or you go the route of putting alignments on structures just for that behavior which I consider not the best approach. Well, we only need to pick an alignment *larger* than what the compiler decides, right? I'd hazard a guess that 8 would be enough for a long time (GCC probably just got confused between 32 and 64 bit modes or something... there's no other reason I could think of to explain this). Yes, my concern was not knowing that upper limit. I looked into the Linux kernel on how they do most of this. From what I could tell they largely use pointers to the structures like I worked around it. When embedding structures into sections they use ALIGN(8) as you suggested (see include/asm-generic/vmlinux.lds.h for the fun). That said, I went back and looked at the assembly dump. It was using 0x14 as the size of the structure when sequencing through the entries. 001465dc R _bs_init_begin 001465e0 r cbmem_bscb 00146600 r gcov_bscb 0014663c R _bs_init_end Each *entry* was aligned to 0x20. Just aligning the symbol wouldn't have fixed it. That means we'd need to mark the structure declaration as aligned to a specific value and annotate it accordingly in the linker script. That's one route or just go w/ pointers to the structures. I'm looking forward to your linker scripts through preprocessor patches to land because we can leverage for aligning some linker scripts across archs, etc. Right now the changes are cumbersome. The last part is just me whining. :) From that I'd conclude that if you did: extern struct boot_state_init_entry _bs_init_begin[]; extern struct boot_state_init_entry _bs_init_end[]; for (cur = _bs_init_begin; cur _bs_init_end; cur++) ... the compiler shouldn't be able to guarantee that the first array wasn't empty and therefore pointing to one after the end which is the start of the next array in the first iteration. I think you are right on that point -- using '' instead of equality. But those aren't pointer types. Not sure if the spec things arrays like that are poitners or a separate type. Either way, hopefully there isn't some language in the spec that suggests comparisons like that can be optimized away. The array degenerates to a pointer at that point AFAIK. You could also write _bs_init_begin... for an array type the two should be pretty much equivalent. We can still try that w/o the NULL tombstones. It doesn't hurt to try, right? -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
145a8a: 83 c3 14add$0x14,%ebx Okay, sorry, I didn't know you looked that closely into this. That's quite unrefuteable. The only question that I still have is then WTF the compiler was thinking... this just sounds like a plain bug somewhere (but I agree that doesn't really help us much). We could still work around it with an explicit __attribute__((aligned, 4)) or __attribute__((packed)) if that works... I guess if the compiler does this for no apparent reason, then there's no real guarantee that a simple list of pointers won't also get screwed up like this? -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Mon, Mar 16, 2015 at 4:49 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: On 03/16/2015 04:44 PM, Aaron Durbin wrote: On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: On 03/16/2015 09:23 AM, Aaron Durbin wrote: On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: All, Just a heads up as there is no bugtracker for this project. GIT commit 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code, but then goes into an infinite loop). Downgrading the GCC version repairs the boot failure. Not sure if you want to revert that commit until someone can figure out what changed to cause the problem. Could post ramstage.elf from the two different builds somewhere? I'd like to take a peak at what is in there. Sure: https://raptorengineeringinc.com/coreboot/built.tar.bz2 Other oddities: GCC 4.8.3: normal/romstage0x7ff80stage97345 normal/ramstage0x97c40stage154869 GCC 4.9.2: normal/romstage0x7ff80stage94773 normal/ramstage0x97240stage173942 Note in particular, judging from the file sizes, that something seems to have been relocated from romstage to ramstage by the new gcc version. I noticed you had CONFIG_COVERAGE selected in both the builds. Could you try not having that selected? I wonder if something changed in the compiler on that front. But... I think I found a bigger issue. That shouldn't be a problem. For reference, should CONFIG_COVERAGE be on or off for board status report builds? $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_ 00146fc4 r pch_intel_wifi 00146fd0 R cpu_drivers 00146fd0 R epci_drivers 00146fd0 r model_10xxx 00146fdc R _bs_init_begin 00146fdc r cbmem_bscb 00146fdc R ecpu_drivers 00146ff0 r gcov_bscb 0014702c R _bs_init_end 0014702c R pnp_conf_mode_870155_aa 00147034 R pnp_conf_mode_a0a0_aa 0014703c R pnp_conf_mode_8787_aa 00147044 R pnp_conf_mode__aa $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_ 001465c4 r pch_intel_wifi 001465d0 R cpu_drivers 001465d0 R epci_drivers 001465d0 r model_10xxx 001465dc R _bs_init_begin 001465dc R ecpu_drivers 001465e0 r cbmem_bscb 00146600 r gcov_bscb 0014663c R _bs_init_end 00146640 R pnp_conf_mode_870155_aa 00146648 R pnp_conf_mode_a0a0_aa 00146650 R pnp_conf_mode_8787_aa 00146658 R pnp_conf_mode__a The boot state callbacks place the whole structure for each entry between _bs_init_begin and _bs_init_end. For both binaries the size of 0x14. For the 4.8.3 compiled ramstage I see: (gdb) p/x 0x0014702c - 0x00146fdc $12 = 0x50 For the 4.9.2 compiled ramstage I see: (gdb) p/x 0x014663c - 0x01465dc $14 = 0x60 0x60 is not a multiple of 0x14 -- which is means things aren't cool. This makes perfect sense--whenever coreboot didn't hang outright it started infinitely spewing some message regarding a boot state callback already being complete. Looking at the symbols it appears the compiler is aligning those structures to 32-bytes for some reason... A quick hack is add ALIGN(32) to the linker script before _bs_init_begin: src/arch/x86/ramstage.ld So I wonder if this is unique to AMD Fam10h or if a whole lot of other boards broke with the gcc update. I wouldn't have even caught this if I hadn't checked out a new coreboot tree instead of copying over the existing tree with the prebuilt crossgcc, so we might be looking at a ticking timebomb that will go off as people start upgrading their crossgcc versions... It all sorta depends. But the issue that _bs_init_begin does not equal the address of the first bscb structure is bad news all around. But I think we'll need to store pointers to the structures in order to properly handle the situation where the compiler is effectively making alignment/size decisions for some reason. I am not at all familiar with the code in question, so all I can do is offer to test. Thanks for analysing the problem! I might be able to whip up a patch, but it's harder than I first thought because we were relying on arrays to be swept into those regions. I'll have to think on this one or we'll just have to change the API entirely for all the users. -Aaron -- Timothy Pearson Raptor Engineering +1 (415) 727-8645 http://www.raptorengineeringinc.com -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Mon, Mar 16, 2015 at 4:33 PM, Stefan Reinauer stefan.reina...@coreboot.org wrote: * Aaron Durbin adur...@chromium.org [150316 22:44]: A quick hack is add ALIGN(32) to the linker script before _bs_init_begin: src/arch/x86/ramstage.ld But I think we'll need to store pointers to the structures in order to properly handle the situation where the compiler is effectively making alignment/size decisions for some reason. I suggest trying to enforce alignment / size instead of adding another layer of indirection. I'm not sure that's possible w/o marking the struct packed just for the sake of it. The other issue is that this section is sitting in RO while also being written to. The other thing, which has been known for awhile, is that we can't take advantage of symbols being equal for an empty set since C says no two symbols can be the same. I'm sure the language lawyers will correct me where I got that wrong. -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: All, Just a heads up as there is no bugtracker for this project. GIT commit 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code, but then goes into an infinite loop). Downgrading the GCC version repairs the boot failure. Not sure if you want to revert that commit until someone can figure out what changed to cause the problem. Could post ramstage.elf from the two different builds somewhere? I'd like to take a peak at what is in there. -- Timothy Pearson Raptor Engineering +1 (415) 727-8645 http://www.raptorengineeringinc.com -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
* Aaron Durbin adur...@chromium.org [150316 22:44]: A quick hack is add ALIGN(32) to the linker script before _bs_init_begin: src/arch/x86/ramstage.ld But I think we'll need to store pointers to the structures in order to properly handle the situation where the compiler is effectively making alignment/size decisions for some reason. I suggest trying to enforce alignment / size instead of adding another layer of indirection. Stefan -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: On 03/16/2015 09:23 AM, Aaron Durbin wrote: On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: All, Just a heads up as there is no bugtracker for this project. GIT commit 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code, but then goes into an infinite loop). Downgrading the GCC version repairs the boot failure. Not sure if you want to revert that commit until someone can figure out what changed to cause the problem. Could post ramstage.elf from the two different builds somewhere? I'd like to take a peak at what is in there. Sure: https://raptorengineeringinc.com/coreboot/built.tar.bz2 Other oddities: GCC 4.8.3: normal/romstage0x7ff80stage97345 normal/ramstage0x97c40stage154869 GCC 4.9.2: normal/romstage0x7ff80stage94773 normal/ramstage0x97240stage173942 Note in particular, judging from the file sizes, that something seems to have been relocated from romstage to ramstage by the new gcc version. I noticed you had CONFIG_COVERAGE selected in both the builds. Could you try not having that selected? I wonder if something changed in the compiler on that front. But... I think I found a bigger issue. $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_ 00146fc4 r pch_intel_wifi 00146fd0 R cpu_drivers 00146fd0 R epci_drivers 00146fd0 r model_10xxx 00146fdc R _bs_init_begin 00146fdc r cbmem_bscb 00146fdc R ecpu_drivers 00146ff0 r gcov_bscb 0014702c R _bs_init_end 0014702c R pnp_conf_mode_870155_aa 00147034 R pnp_conf_mode_a0a0_aa 0014703c R pnp_conf_mode_8787_aa 00147044 R pnp_conf_mode__aa $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_ 001465c4 r pch_intel_wifi 001465d0 R cpu_drivers 001465d0 R epci_drivers 001465d0 r model_10xxx 001465dc R _bs_init_begin 001465dc R ecpu_drivers 001465e0 r cbmem_bscb 00146600 r gcov_bscb 0014663c R _bs_init_end 00146640 R pnp_conf_mode_870155_aa 00146648 R pnp_conf_mode_a0a0_aa 00146650 R pnp_conf_mode_8787_aa 00146658 R pnp_conf_mode__a The boot state callbacks place the whole structure for each entry between _bs_init_begin and _bs_init_end. For both binaries the size of 0x14. For the 4.8.3 compiled ramstage I see: (gdb) p/x 0x0014702c - 0x00146fdc $12 = 0x50 For the 4.9.2 compiled ramstage I see: (gdb) p/x 0x014663c - 0x01465dc $14 = 0x60 0x60 is not a multiple of 0x14 -- which is means things aren't cool. Looking at the symbols it appears the compiler is aligning those structures to 32-bytes for some reason... A quick hack is add ALIGN(32) to the linker script before _bs_init_begin: src/arch/x86/ramstage.ld But I think we'll need to store pointers to the structures in order to properly handle the situation where the compiler is effectively making alignment/size decisions for some reason. -Aaron -- Timothy Pearson Raptor Engineering +1 (415) 727-8645 http://www.raptorengineeringinc.com -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On 03/16/2015 04:44 PM, Aaron Durbin wrote: On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: On 03/16/2015 09:23 AM, Aaron Durbin wrote: On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: All, Just a heads up as there is no bugtracker for this project. GIT commit 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code, but then goes into an infinite loop). Downgrading the GCC version repairs the boot failure. Not sure if you want to revert that commit until someone can figure out what changed to cause the problem. Could post ramstage.elf from the two different builds somewhere? I'd like to take a peak at what is in there. Sure: https://raptorengineeringinc.com/coreboot/built.tar.bz2 Other oddities: GCC 4.8.3: normal/romstage0x7ff80stage97345 normal/ramstage0x97c40stage154869 GCC 4.9.2: normal/romstage0x7ff80stage94773 normal/ramstage0x97240stage173942 Note in particular, judging from the file sizes, that something seems to have been relocated from romstage to ramstage by the new gcc version. I noticed you had CONFIG_COVERAGE selected in both the builds. Could you try not having that selected? I wonder if something changed in the compiler on that front. But... I think I found a bigger issue. That shouldn't be a problem. For reference, should CONFIG_COVERAGE be on or off for board status report builds? $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_ 00146fc4 r pch_intel_wifi 00146fd0 R cpu_drivers 00146fd0 R epci_drivers 00146fd0 r model_10xxx 00146fdc R _bs_init_begin 00146fdc r cbmem_bscb 00146fdc R ecpu_drivers 00146ff0 r gcov_bscb 0014702c R _bs_init_end 0014702c R pnp_conf_mode_870155_aa 00147034 R pnp_conf_mode_a0a0_aa 0014703c R pnp_conf_mode_8787_aa 00147044 R pnp_conf_mode__aa $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_ 001465c4 r pch_intel_wifi 001465d0 R cpu_drivers 001465d0 R epci_drivers 001465d0 r model_10xxx 001465dc R _bs_init_begin 001465dc R ecpu_drivers 001465e0 r cbmem_bscb 00146600 r gcov_bscb 0014663c R _bs_init_end 00146640 R pnp_conf_mode_870155_aa 00146648 R pnp_conf_mode_a0a0_aa 00146650 R pnp_conf_mode_8787_aa 00146658 R pnp_conf_mode__a The boot state callbacks place the whole structure for each entry between _bs_init_begin and _bs_init_end. For both binaries the size of 0x14. For the 4.8.3 compiled ramstage I see: (gdb) p/x 0x0014702c - 0x00146fdc $12 = 0x50 For the 4.9.2 compiled ramstage I see: (gdb) p/x 0x014663c - 0x01465dc $14 = 0x60 0x60 is not a multiple of 0x14 -- which is means things aren't cool. This makes perfect sense--whenever coreboot didn't hang outright it started infinitely spewing some message regarding a boot state callback already being complete. Looking at the symbols it appears the compiler is aligning those structures to 32-bytes for some reason... A quick hack is add ALIGN(32) to the linker script before _bs_init_begin: src/arch/x86/ramstage.ld So I wonder if this is unique to AMD Fam10h or if a whole lot of other boards broke with the gcc update. I wouldn't have even caught this if I hadn't checked out a new coreboot tree instead of copying over the existing tree with the prebuilt crossgcc, so we might be looking at a ticking timebomb that will go off as people start upgrading their crossgcc versions... But I think we'll need to store pointers to the structures in order to properly handle the situation where the compiler is effectively making alignment/size decisions for some reason. I am not at all familiar with the code in question, so all I can do is offer to test. Thanks for analysing the problem! -Aaron -- Timothy Pearson Raptor Engineering +1 (415) 727-8645 http://www.raptorengineeringinc.com -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On 03/16/2015 09:23 AM, Aaron Durbin wrote: On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson tpear...@raptorengineeringinc.com wrote: All, Just a heads up as there is no bugtracker for this project. GIT commit 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code, but then goes into an infinite loop). Downgrading the GCC version repairs the boot failure. Not sure if you want to revert that commit until someone can figure out what changed to cause the problem. Could post ramstage.elf from the two different builds somewhere? I'd like to take a peak at what is in there. Sure: https://raptorengineeringinc.com/coreboot/built.tar.bz2 Other oddities: GCC 4.8.3: normal/romstage0x7ff80stage97345 normal/ramstage0x97c40stage154869 GCC 4.9.2: normal/romstage0x7ff80stage94773 normal/ramstage0x97240stage173942 Note in particular, judging from the file sizes, that something seems to have been relocated from romstage to ramstage by the new gcc version. -- Timothy Pearson Raptor Engineering +1 (415) 727-8645 http://www.raptorengineeringinc.com -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
[coreboot] GCC update broke AMD Fam10h boot
All, Just a heads up as there is no bugtracker for this project. GIT commit 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code, but then goes into an infinite loop). Downgrading the GCC version repairs the boot failure. Not sure if you want to revert that commit until someone can figure out what changed to cause the problem. -- Timothy Pearson Raptor Engineering +1 (415) 727-8645 http://www.raptorengineeringinc.com -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
On Sunday, March 15, 2015 02:04:51 PM Timothy Pearson wrote: All, Just a heads up as there is no bugtracker for this project. GIT commit 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code, but then goes into an infinite loop). Downgrading the GCC version repairs the boot failure. Not sure if you want to revert that commit until someone can figure out what changed to cause the problem. Unless there's a clear case to be made that this is a compiler bug, I'd go with finding and fixing the bug in our code that causes that. Alex -- coreboot mailing list: coreboot@coreboot.org http://www.coreboot.org/mailman/listinfo/coreboot