Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 15 July 2016 at 12:37, Laszlo Ersek wrote: > On 07/15/16 12:15, Ard Biesheuvel wrote: >> On 15 July 2016 at 11:51, Laszlo Ersek wrote: > >>> What I'd be most interested in is >>> enabling optimization (saving space) for -b DEBUG -- I don't do source >>> level debugging, but I want the debug messages and the ASSERT()s. Saving >>> space with those compiled in would be superb -- I guess that's where the >>> savings would really shine. >>> >> >> Doing the same build for DEBUG gives me >> >> -O0 >> >> SECFV [20%Full] 212992 total, 44464 used, 168528 free >> FVMAIN_COMPACT [75%Full] 1753088 total, 1326592 used, 426496 free >> DXEFV [67%Full] 10485760 total, 7077696 used, 3408064 free >> PEIFV [28%Full] 917504 total, 257200 used, 660304 free >> >> -O1 >> >> SECFV [16%Full] 212992 total, 35248 used, 177744 free >> FVMAIN_COMPACT [76%Full] 1753088 total, 1339592 used, 413496 free >> DXEFV [56%Full] 10485760 total, 5933120 used, 4552640 free >> PEIFV [23%Full] 917504 total, 216240 used, 701264 free >> >> -O2 >> >> SECFV [17%Full] 212992 total, 36688 used, 176304 free >> FVMAIN_COMPACT [83%Full] 1753088 total, 1469904 used, 283184 free >> DXEFV [58%Full] 10485760 total, 6157440 used, 4328320 free >> PEIFV [24%Full] 917504 total, 227408 used, 690096 free >> >> which paints roughly the same picture: much smaller binaries, but O2 >> is worse than O0/O1 after compression. > > Thank you for checking this -- in absolute terms, -O1 shaves about 1.1MB > off DXEFV. Not bad! > >> Another thing to take into account is that DXE code size translates >> directly into RuntimeServicesCode memory footprint of >> DXE_RUNTIME_DRIVER modules, > > Good point! > >> although I am not sure if that has ever >> been a source of concern. > > I've never thought of it -- my gut feeling is that the AcpiNVS stuff is > larger. (I could easily be biased though: the AcpiNVS allocations are > all manual and require thought, while RuntimeServicesCode allocs are all > automatic. I might be projecting "mental effort required" to "allocation > size" :)) > Actually, it appears -Os does work, as long as you also pass the -maccumulate-outgoing-args switch. I am not sure what I did wrong before, but all the builds below boot fine. I get the following results: RELEASE_GCC44 SECFV [6%Full] 212992 total, 14320 used, 198672 free FVMAIN_COMPACT [56%Full] 1753088 total, 997184 used, 755904 free DXEFV [33%Full] 10485760 total, 3508336 used, 6977424 free PEIFV [11%Full] 917504 total, 109896 used, 807608 free RELEASE_GCC46 SECFV [6%Full] 212992 total, 13936 used, 199056 free FVMAIN_COMPACT [56%Full] 1753088 total, 987632 used, 765456 free DXEFV [33%Full] 10485760 total, 3468304 used, 7017456 free PEIFV [11%Full] 917504 total, 107784 used, 809720 free RELEASE_GCC47 SECFV [6%Full] 212992 total, 13808 used, 199184 free FVMAIN_COMPACT [56%Full] 1753088 total, 988072 used, 765016 free DXEFV [32%Full] 10485760 total, 3449936 used, 7035824 free PEIFV [11%Full] 917504 total, 106984 used, 810520 free RELEASE_GCC48 SECFV [6%Full] 212992 total, 13744 used, 199248 free FVMAIN_COMPACT [56%Full] 1753088 total, 992800 used, 760288 free DXEFV [32%Full] 10485760 total, 3453456 used, 7032304 free PEIFV [11%Full] 917504 total, 107464 used, 810040 free DEBUG_GCC44 SECFV [14%Full] 212992 total, 31632 used, 181360 free FVMAIN_COMPACT [74%Full] 1753088 total, 1300904 used, 452184 free DXEFV [50%Full] 10485760 total, 5272464 used, 5213296 free PEIFV [25%Full] 917504 total, 231496 used, 686008 free DEBUG_GCC46 SECFV [14%Full] 212992 total, 31728 used, 181264 free FVMAIN_COMPACT [73%Full] 1753088 total, 1291672 used, 461416 free DXEFV [50%Full] 10485760 total, 5263888 used, 5221872 free PEIFV [25%Full] 917504 total, 231656 used, 685848 free DEBUG_GCC47 SECFV [14%Full] 212992 total, 31440 used, 181552 free FVMAIN_COMPACT [73%Full] 1753088 total, 1290584 used, 462504 free DXEFV [49%Full] 10485760 total, 5234224 used, 5251536 free PEIFV [25%Full] 917504 total, 230152 used, 687352 free DEBUG_GCC48 SECFV [14%Full] 212992 total, 31280 used, 181712 free FVMAIN_COMPACT [73%Full] 1753088 total, 1296208 used, 456880 free DXEFV [49%Full] 10485760 total, 5229424 used, 5256336 free PEIFV [25%Full] 917504 total, 229512 used, 687992 free I am going to respin the series to use Os rather than O2. Thanks, Ard. ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 07/15/16 12:15, Ard Biesheuvel wrote: > On 15 July 2016 at 11:51, Laszlo Ersek wrote: >> What I'd be most interested in is >> enabling optimization (saving space) for -b DEBUG -- I don't do source >> level debugging, but I want the debug messages and the ASSERT()s. Saving >> space with those compiled in would be superb -- I guess that's where the >> savings would really shine. >> > > Doing the same build for DEBUG gives me > > -O0 > > SECFV [20%Full] 212992 total, 44464 used, 168528 free > FVMAIN_COMPACT [75%Full] 1753088 total, 1326592 used, 426496 free > DXEFV [67%Full] 10485760 total, 7077696 used, 3408064 free > PEIFV [28%Full] 917504 total, 257200 used, 660304 free > > -O1 > > SECFV [16%Full] 212992 total, 35248 used, 177744 free > FVMAIN_COMPACT [76%Full] 1753088 total, 1339592 used, 413496 free > DXEFV [56%Full] 10485760 total, 5933120 used, 4552640 free > PEIFV [23%Full] 917504 total, 216240 used, 701264 free > > -O2 > > SECFV [17%Full] 212992 total, 36688 used, 176304 free > FVMAIN_COMPACT [83%Full] 1753088 total, 1469904 used, 283184 free > DXEFV [58%Full] 10485760 total, 6157440 used, 4328320 free > PEIFV [24%Full] 917504 total, 227408 used, 690096 free > > which paints roughly the same picture: much smaller binaries, but O2 > is worse than O0/O1 after compression. Thank you for checking this -- in absolute terms, -O1 shaves about 1.1MB off DXEFV. Not bad! > Another thing to take into account is that DXE code size translates > directly into RuntimeServicesCode memory footprint of > DXE_RUNTIME_DRIVER modules, Good point! > although I am not sure if that has ever > been a source of concern. I've never thought of it -- my gut feeling is that the AcpiNVS stuff is larger. (I could easily be biased though: the AcpiNVS allocations are all manual and require thought, while RuntimeServicesCode allocs are all automatic. I might be projecting "mental effort required" to "allocation size" :)) Thanks Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 15 July 2016 at 11:51, Laszlo Ersek wrote: > On 07/15/16 11:25, Ard Biesheuvel wrote: >> On 15 July 2016 at 11:06, Laszlo Ersek wrote: > >>> And, to our collective relief, this means that there should be no >>> problem with this series. :) I think I can add my Tested-by: >>> >>> Tested-by: Laszlo Ersek >>> >> >> Thanks a lot! >> >> I do have another question, though. Looking at your logs, I found the >> following results for the sizes >> >> GCC48 before: >> >> SECFV [11%Full] 212992 total, 25392 used, 187600 free >> FVMAIN_COMPACT [63%Full] 1753088 total, 1113976 used, 639112 free >> DXEFV [56%Full] 10485760 total, 5970624 used, 4515136 free >> PEIFV [21%Full] 917504 total, 200616 used, 716888 free >> >> GCC48 after: >> >> SECFV [7%Full] 212992 total, 15728 used, 197264 free >> FVMAIN_COMPACT [65%Full] 1753088 total, 1145808 used, 607280 free >> DXEFV [39%Full] 10485760 total, 4148384 used, 6337376 free >> PEIFV [14%Full] 917504 total, 131208 used, 786296 free >> >> >> GCC49 before: >> >> SECFV [12%Full] 212992 total, 25616 used, 187376 free >> FVMAIN_COMPACT [64%Full] 1753088 total, 1122760 used, 630328 free >> DXEFV [58%Full] 10485760 total, 6106048 used, 4379712 free >> PEIFV [22%Full] 917504 total, 205992 used, 711512 free >> >> GCC49 after: >> >> SECFV [7%Full] 212992 total, 15824 used, 197168 free >> FVMAIN_COMPACT [66%Full] 1753088 total, 1166952 used, 586136 free >> DXEFV [40%Full] 10485760 total, 4225728 used, 6260032 free >> PEIFV [14%Full] 917504 total, 133672 used, 783832 free >> >> So DXEFV is substantially smaller, but FVMAIN_COMPACT ends up slightly >> bigger. Which one do we care about mostly? I would assume >> FVMAIN_COMPACT, since that translates into flash footprint directly. > > Correct. > > DXEFV is less relevant, but not irrelevant. We've been increasing its > allotted size in the FDF file(s) over time, cramming more and more DXE > phase stuff into OVMF. Its size does have an impact on guest memory > availability, when the SMM driver stack is built in, and S3 is enabled > on the QEMU command line. (Under these circumstances DXEFV's location is > covered as AcpiNVS. Not optimal, but that's what we have.) > > So, in practice, shrinking DXEFV is useful as well. > > FVMAIN_COMPACT is what ultimately determines the flash chip's size. > We've never had to change that (after a historical bump from 1MB to 2MB, > speaking unified flash terms -- for OVMF_CODE.fd, the bump means > 1MB-128KB --> 2MB-128KB). If we have to increase OVMF_CODE.fd, that will > be visible on the host filesystem, and might present compatibility > problems for existing VMs after their next boot. (I hope not, but I've > never actually tested this scenario.) > > Mike mentioned elsewhere in this thread that he measured the utility of > new build flags on the change in compressed size. > > I guess it just tells us that TianoCompress performs incredibly well :) > >> So I did a quick test, comparing O0 / O1 / O2 on GCC48, and it seems >> the sweet spot is O1 (due to lack of support for Os in older GCCs) >> >> -O0 >> >> SECFV [11%Full] 212992 total, 23536 used, 189456 free >> FVMAIN_COMPACT [58%Full] 1753088 total, 1016816 used, 736272 free >> DXEFV [47%Full] 10485760 total, 4978656 used, 5507104 free >> PEIFV [14%Full] 917504 total, 133328 used, 784176 free >> >> -O1 >> >> SECFV [6%Full] 212992 total, 14768 used, 198224 free >> FVMAIN_COMPACT [58%Full] 1753088 total, 1017856 used, 735232 free >> DXEFV [37%Full] 10485760 total, 3907648 used, 6578112 free >> PEIFV [10%Full] 917504 total, 95696 used, 821808 free >> >> -O2 >> >> SECFV [7%Full] 212992 total, 15728 used, 197264 free >> FVMAIN_COMPACT [63%Full] 1753088 total, 144 used, 641944 free >> DXEFV [38%Full] 10485760 total, 4029984 used, 6455776 free >> PEIFV [10%Full] 917504 total, 100400 used, 817104 free > > Huh, very interesting. -O1 vs. -O2 produce almost identically sized > PEIFV and DXEFV (both optimization settings beating -O0 of course), but > -O2 does something that makes the compression deteriorate, even to the > point where FVMAIN_COMPACT ends up larger than with -O0! > > More agressive unrolling / inlining with -O2 perhaps? > > And, the effect of -O1 seems to be: practically unchanged FVMAIN_COMPACT > size, relative to -O0 (it's negligibly larger than with -O0 I guess), > and significantly smaller DXEFV and PEIFV usage. > > -O1 seems useful. Assuming that the "negligible" size increase in > FVMAIN_COMPACT, relative to -O0, remains negligible in the long term. :) > >> Anyway, thanks again for your time and effort in testing this, much >> appreciated. > > No, thank you for doing this; I've always been intrigued by the > optimization possibilities for OVMF. What I'd be most interested in is > enabling optimization (saving space) for -b DEBUG -- I don't do source > level debugging, but I want the debug messages and the ASSERT()s. Saving > space with those compiled in would be superb -- I guess that's where the > savings would really shine. > Doing the sam
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 14/07/2016 16:57, Ard Biesheuvel wrote: >> > On patch 5, I don't see any change for IA32 arch. is there no mode for >> > IA32 arch? Here, small and pic must be enabled together, right? Otherwise, >> > the assumption is to load driver below 2G address. Have you collected size >> > data before and after this change? >> > > To be honest, I know very little about the IA32 ABI, but I don't think > it has different code models, does it? Since the ELF relocations are > stripped by the PE/COFF conversion, I don't think it is necessary to > take into account that X64 and IA32 modules may interact with each > other when running on a X64 platform. IIUC, IA32 cannot use memory above 4GB because it doesn't use virtual addresses. Also, IA32 doesn't have PC-relative addressing at all. Paolo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 07/15/16 11:25, Ard Biesheuvel wrote: > On 15 July 2016 at 11:06, Laszlo Ersek wrote: >> And, to our collective relief, this means that there should be no >> problem with this series. :) I think I can add my Tested-by: >> >> Tested-by: Laszlo Ersek >> > > Thanks a lot! > > I do have another question, though. Looking at your logs, I found the > following results for the sizes > > GCC48 before: > > SECFV [11%Full] 212992 total, 25392 used, 187600 free > FVMAIN_COMPACT [63%Full] 1753088 total, 1113976 used, 639112 free > DXEFV [56%Full] 10485760 total, 5970624 used, 4515136 free > PEIFV [21%Full] 917504 total, 200616 used, 716888 free > > GCC48 after: > > SECFV [7%Full] 212992 total, 15728 used, 197264 free > FVMAIN_COMPACT [65%Full] 1753088 total, 1145808 used, 607280 free > DXEFV [39%Full] 10485760 total, 4148384 used, 6337376 free > PEIFV [14%Full] 917504 total, 131208 used, 786296 free > > > GCC49 before: > > SECFV [12%Full] 212992 total, 25616 used, 187376 free > FVMAIN_COMPACT [64%Full] 1753088 total, 1122760 used, 630328 free > DXEFV [58%Full] 10485760 total, 6106048 used, 4379712 free > PEIFV [22%Full] 917504 total, 205992 used, 711512 free > > GCC49 after: > > SECFV [7%Full] 212992 total, 15824 used, 197168 free > FVMAIN_COMPACT [66%Full] 1753088 total, 1166952 used, 586136 free > DXEFV [40%Full] 10485760 total, 4225728 used, 6260032 free > PEIFV [14%Full] 917504 total, 133672 used, 783832 free > > So DXEFV is substantially smaller, but FVMAIN_COMPACT ends up slightly > bigger. Which one do we care about mostly? I would assume > FVMAIN_COMPACT, since that translates into flash footprint directly. Correct. DXEFV is less relevant, but not irrelevant. We've been increasing its allotted size in the FDF file(s) over time, cramming more and more DXE phase stuff into OVMF. Its size does have an impact on guest memory availability, when the SMM driver stack is built in, and S3 is enabled on the QEMU command line. (Under these circumstances DXEFV's location is covered as AcpiNVS. Not optimal, but that's what we have.) So, in practice, shrinking DXEFV is useful as well. FVMAIN_COMPACT is what ultimately determines the flash chip's size. We've never had to change that (after a historical bump from 1MB to 2MB, speaking unified flash terms -- for OVMF_CODE.fd, the bump means 1MB-128KB --> 2MB-128KB). If we have to increase OVMF_CODE.fd, that will be visible on the host filesystem, and might present compatibility problems for existing VMs after their next boot. (I hope not, but I've never actually tested this scenario.) Mike mentioned elsewhere in this thread that he measured the utility of new build flags on the change in compressed size. I guess it just tells us that TianoCompress performs incredibly well :) > So I did a quick test, comparing O0 / O1 / O2 on GCC48, and it seems > the sweet spot is O1 (due to lack of support for Os in older GCCs) > > -O0 > > SECFV [11%Full] 212992 total, 23536 used, 189456 free > FVMAIN_COMPACT [58%Full] 1753088 total, 1016816 used, 736272 free > DXEFV [47%Full] 10485760 total, 4978656 used, 5507104 free > PEIFV [14%Full] 917504 total, 133328 used, 784176 free > > -O1 > > SECFV [6%Full] 212992 total, 14768 used, 198224 free > FVMAIN_COMPACT [58%Full] 1753088 total, 1017856 used, 735232 free > DXEFV [37%Full] 10485760 total, 3907648 used, 6578112 free > PEIFV [10%Full] 917504 total, 95696 used, 821808 free > > -O2 > > SECFV [7%Full] 212992 total, 15728 used, 197264 free > FVMAIN_COMPACT [63%Full] 1753088 total, 144 used, 641944 free > DXEFV [38%Full] 10485760 total, 4029984 used, 6455776 free > PEIFV [10%Full] 917504 total, 100400 used, 817104 free Huh, very interesting. -O1 vs. -O2 produce almost identically sized PEIFV and DXEFV (both optimization settings beating -O0 of course), but -O2 does something that makes the compression deteriorate, even to the point where FVMAIN_COMPACT ends up larger than with -O0! More agressive unrolling / inlining with -O2 perhaps? And, the effect of -O1 seems to be: practically unchanged FVMAIN_COMPACT size, relative to -O0 (it's negligibly larger than with -O0 I guess), and significantly smaller DXEFV and PEIFV usage. -O1 seems useful. Assuming that the "negligible" size increase in FVMAIN_COMPACT, relative to -O0, remains negligible in the long term. :) > Anyway, thanks again for your time and effort in testing this, much > appreciated. No, thank you for doing this; I've always been intrigued by the optimization possibilities for OVMF. What I'd be most interested in is enabling optimization (saving space) for -b DEBUG -- I don't do source level debugging, but I want the debug messages and the ASSERT()s. Saving space with those compiled in would be superb -- I guess that's where the savings would really shine. Thanks! Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 15 July 2016 at 11:06, Laszlo Ersek wrote: > On 07/15/16 10:58, Ard Biesheuvel wrote: >> On 15 July 2016 at 10:48, Laszlo Ersek wrote: >>> On 07/15/16 10:40, Ard Biesheuvel wrote: On 15 July 2016 at 08:33, Laszlo Ersek wrote: > On 07/15/16 08:07, Ard Biesheuvel wrote: >> On 15 July 2016 at 01:27, Laszlo Ersek wrote: > >>> First, the build tests. I built OVMF 84 times, with the following >>> settings: >>> >>> * Dimension 1: whether your and Steven's patches were applied or not. >> >> I take it this means this series only? > > Ah, yes, sorry about the ambiguity -- with "your and Steven's patches" I > meant the five patches in this series -- one patch from Steven, four > patches from you. > >>> * However, then I built my program -- find it attached -- with -b DEBUG >>> and -b RELEASE too, and the -b RELEASE binary is broken. Here's the >>> difference in output: >>> >>> FS2:\> DebugEnrollDefaultKeys.efi >>> info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 >>> VendorKeys=0 >>> info: SetupMode=0 SecureBoot=1 SecureBootEnable=1 CustomMode=0 >>> VendorKeys=0 >>> info: success >>> >>> versus >>> >>> FS2:\> OptEnrollDefaultKeys.efi >>> info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 >>> VendorKeys=0 >>> error: EnrollListOfX509Certs("db", >>> D719B2CB-3D3A-4596-A3BC-DAD00E67656F): Invalid Parameter >>> >>> I don't know why this happens, but it definitely relates to varargs -- >>> the program uses them liberally. >>> >> >> If this code ultimately uses __builtin_ms_va_lists with -O2 and fails, >> it makes sense to report it to the GCC maintainers (assuming we can >> create a test case). I noticed that this code iterates over the same >> VA_LIST twice, I wonder how well that was tested ... > > You make a good point about using VA_LIST twice possibly tickling gcc > the wrong way. Plus, there's one (VOID)VA_ARG() macro invocation in my > code -- in order to step over some arguments --, which might not be all > that usual as well. > > In any case, the way I use varargs seems to be standards conformant. > > With regard to reporting this to gcc developers: I won't try that. I'm > discouraged by two facts: > > * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50818 -- open since > October 2011. You'll find some recent comments from Steven and David in > it :) > > * http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 -- > shortly after I ran into the -Os corruption issue with GCC48, I created > and posted this small reproducer. Creating the reproducer wasn't > trivial. In parallel I sent the same reproducer to one of my (indirect) > colleagues at Red Hat, who had been a veteran in upstream GCC > development and maintenance. I also asked him how/where I should report > the bug. I got no answer from him. > > If you'd like to report a GCC bug, please go ahead, but I won't waste my > time :) > It seems that it is your testcase that is broken: --- a/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c +++ b/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c @@ -663,6 +663,8 @@ EnrollListOfX509Certs ( UINT8*Data; UINT8*Position; + Status = EFI_SUCCESS; + // // compute total size first, for UINT32 range check, and allocation // With that change, everything works fine for me >>> >>> I agree that proving my code wrong would be the best outcome here, but >>> can you please explain to me on what path we return from >>> EnrollListOfX509Certs() without setting Status? >>> >> >> Well, it seems that in the success case, we are supposed to end up at line >> 692 >> >> if (EFI_ERROR (Status)) { >> goto Out; >> } >> >> and expect the condition to be FALSE. If we have successfully tallied >> the size of at least one cert, 'DataSize == sizeof *SingleHeader' will >> be FALSE, and so Status will never have been assigned a value. >> Apparently, the optimizer sees some kind of undefined behavior here >> which it can exploit to bail unconditionally (or Status happens to be >> 0 under -O0) >> > > You are perfectly right. Thank you for catching this! > > And, to our collective relief, this means that there should be no > problem with this series. :) I think I can add my Tested-by: > > Tested-by: Laszlo Ersek > Thanks a lot! I do have another question, though. Looking at your logs, I found the following results for the sizes GCC48 before: SECFV [11%Full] 212992 total, 25392 used, 187600 free FVMAIN_COMPACT [63%Full] 1753088 total, 1113976 used, 639112 free DXEFV [56%Full] 10485760 total, 5970624 used, 4515136 free PEIFV [21%Full] 917504 total, 200616 used, 716888 free GCC48 after: SECFV [7%Full] 212992 total, 15728 used
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 07/15/16 10:58, Ard Biesheuvel wrote: > On 15 July 2016 at 10:48, Laszlo Ersek wrote: >> On 07/15/16 10:40, Ard Biesheuvel wrote: >>> On 15 July 2016 at 08:33, Laszlo Ersek wrote: On 07/15/16 08:07, Ard Biesheuvel wrote: > On 15 July 2016 at 01:27, Laszlo Ersek wrote: >> First, the build tests. I built OVMF 84 times, with the following >> settings: >> >> * Dimension 1: whether your and Steven's patches were applied or not. > > I take it this means this series only? Ah, yes, sorry about the ambiguity -- with "your and Steven's patches" I meant the five patches in this series -- one patch from Steven, four patches from you. >> * However, then I built my program -- find it attached -- with -b DEBUG >> and -b RELEASE too, and the -b RELEASE binary is broken. Here's the >> difference in output: >> >> FS2:\> DebugEnrollDefaultKeys.efi >> info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 >> VendorKeys=0 >> info: SetupMode=0 SecureBoot=1 SecureBootEnable=1 CustomMode=0 >> VendorKeys=0 >> info: success >> >> versus >> >> FS2:\> OptEnrollDefaultKeys.efi >> info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 >> VendorKeys=0 >> error: EnrollListOfX509Certs("db", >> D719B2CB-3D3A-4596-A3BC-DAD00E67656F): Invalid Parameter >> >> I don't know why this happens, but it definitely relates to varargs -- >> the program uses them liberally. >> > > If this code ultimately uses __builtin_ms_va_lists with -O2 and fails, > it makes sense to report it to the GCC maintainers (assuming we can > create a test case). I noticed that this code iterates over the same > VA_LIST twice, I wonder how well that was tested ... You make a good point about using VA_LIST twice possibly tickling gcc the wrong way. Plus, there's one (VOID)VA_ARG() macro invocation in my code -- in order to step over some arguments --, which might not be all that usual as well. In any case, the way I use varargs seems to be standards conformant. With regard to reporting this to gcc developers: I won't try that. I'm discouraged by two facts: * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50818 -- open since October 2011. You'll find some recent comments from Steven and David in it :) * http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 -- shortly after I ran into the -Os corruption issue with GCC48, I created and posted this small reproducer. Creating the reproducer wasn't trivial. In parallel I sent the same reproducer to one of my (indirect) colleagues at Red Hat, who had been a veteran in upstream GCC development and maintenance. I also asked him how/where I should report the bug. I got no answer from him. If you'd like to report a GCC bug, please go ahead, but I won't waste my time :) >>> >>> It seems that it is your testcase that is broken: >>> >>> --- a/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c >>> +++ b/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c >>> @@ -663,6 +663,8 @@ EnrollListOfX509Certs ( >>>UINT8*Data; >>>UINT8*Position; >>> >>> + Status = EFI_SUCCESS; >>> + >>>// >>>// compute total size first, for UINT32 range check, and allocation >>>// >>> >>> With that change, everything works fine for me >> >> I agree that proving my code wrong would be the best outcome here, but >> can you please explain to me on what path we return from >> EnrollListOfX509Certs() without setting Status? >> > > Well, it seems that in the success case, we are supposed to end up at line 692 > > if (EFI_ERROR (Status)) { > goto Out; > } > > and expect the condition to be FALSE. If we have successfully tallied > the size of at least one cert, 'DataSize == sizeof *SingleHeader' will > be FALSE, and so Status will never have been assigned a value. > Apparently, the optimizer sees some kind of undefined behavior here > which it can exploit to bail unconditionally (or Status happens to be > 0 under -O0) > You are perfectly right. Thank you for catching this! And, to our collective relief, this means that there should be no problem with this series. :) I think I can add my Tested-by: Tested-by: Laszlo Ersek Cheers! Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 15 July 2016 at 10:48, Laszlo Ersek wrote: > On 07/15/16 10:40, Ard Biesheuvel wrote: >> On 15 July 2016 at 08:33, Laszlo Ersek wrote: >>> On 07/15/16 08:07, Ard Biesheuvel wrote: On 15 July 2016 at 01:27, Laszlo Ersek wrote: >>> > First, the build tests. I built OVMF 84 times, with the following > settings: > > * Dimension 1: whether your and Steven's patches were applied or not. I take it this means this series only? >>> >>> Ah, yes, sorry about the ambiguity -- with "your and Steven's patches" I >>> meant the five patches in this series -- one patch from Steven, four >>> patches from you. >>> > * However, then I built my program -- find it attached -- with -b DEBUG > and -b RELEASE too, and the -b RELEASE binary is broken. Here's the > difference in output: > > FS2:\> DebugEnrollDefaultKeys.efi > info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 > VendorKeys=0 > info: SetupMode=0 SecureBoot=1 SecureBootEnable=1 CustomMode=0 > VendorKeys=0 > info: success > > versus > > FS2:\> OptEnrollDefaultKeys.efi > info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 > VendorKeys=0 > error: EnrollListOfX509Certs("db", > D719B2CB-3D3A-4596-A3BC-DAD00E67656F): Invalid Parameter > > I don't know why this happens, but it definitely relates to varargs -- > the program uses them liberally. > If this code ultimately uses __builtin_ms_va_lists with -O2 and fails, it makes sense to report it to the GCC maintainers (assuming we can create a test case). I noticed that this code iterates over the same VA_LIST twice, I wonder how well that was tested ... >>> >>> You make a good point about using VA_LIST twice possibly tickling gcc >>> the wrong way. Plus, there's one (VOID)VA_ARG() macro invocation in my >>> code -- in order to step over some arguments --, which might not be all >>> that usual as well. >>> >>> In any case, the way I use varargs seems to be standards conformant. >>> >>> With regard to reporting this to gcc developers: I won't try that. I'm >>> discouraged by two facts: >>> >>> * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50818 -- open since >>> October 2011. You'll find some recent comments from Steven and David in >>> it :) >>> >>> * http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 -- >>> shortly after I ran into the -Os corruption issue with GCC48, I created >>> and posted this small reproducer. Creating the reproducer wasn't >>> trivial. In parallel I sent the same reproducer to one of my (indirect) >>> colleagues at Red Hat, who had been a veteran in upstream GCC >>> development and maintenance. I also asked him how/where I should report >>> the bug. I got no answer from him. >>> >>> If you'd like to report a GCC bug, please go ahead, but I won't waste my >>> time :) >>> >> >> It seems that it is your testcase that is broken: >> >> --- a/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c >> +++ b/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c >> @@ -663,6 +663,8 @@ EnrollListOfX509Certs ( >>UINT8*Data; >>UINT8*Position; >> >> + Status = EFI_SUCCESS; >> + >>// >>// compute total size first, for UINT32 range check, and allocation >>// >> >> With that change, everything works fine for me > > I agree that proving my code wrong would be the best outcome here, but > can you please explain to me on what path we return from > EnrollListOfX509Certs() without setting Status? > Well, it seems that in the success case, we are supposed to end up at line 692 if (EFI_ERROR (Status)) { goto Out; } and expect the condition to be FALSE. If we have successfully tallied the size of at least one cert, 'DataSize == sizeof *SingleHeader' will be FALSE, and so Status will never have been assigned a value. Apparently, the optimizer sees some kind of undefined behavior here which it can exploit to bail unconditionally (or Status happens to be 0 under -O0) ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 07/15/16 10:40, Ard Biesheuvel wrote: > On 15 July 2016 at 08:33, Laszlo Ersek wrote: >> On 07/15/16 08:07, Ard Biesheuvel wrote: >>> On 15 July 2016 at 01:27, Laszlo Ersek wrote: >> First, the build tests. I built OVMF 84 times, with the following settings: * Dimension 1: whether your and Steven's patches were applied or not. >>> >>> I take it this means this series only? >> >> Ah, yes, sorry about the ambiguity -- with "your and Steven's patches" I >> meant the five patches in this series -- one patch from Steven, four >> patches from you. >> * However, then I built my program -- find it attached -- with -b DEBUG and -b RELEASE too, and the -b RELEASE binary is broken. Here's the difference in output: FS2:\> DebugEnrollDefaultKeys.efi info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 VendorKeys=0 info: SetupMode=0 SecureBoot=1 SecureBootEnable=1 CustomMode=0 VendorKeys=0 info: success versus FS2:\> OptEnrollDefaultKeys.efi info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 VendorKeys=0 error: EnrollListOfX509Certs("db", D719B2CB-3D3A-4596-A3BC-DAD00E67656F): Invalid Parameter I don't know why this happens, but it definitely relates to varargs -- the program uses them liberally. >>> >>> If this code ultimately uses __builtin_ms_va_lists with -O2 and fails, >>> it makes sense to report it to the GCC maintainers (assuming we can >>> create a test case). I noticed that this code iterates over the same >>> VA_LIST twice, I wonder how well that was tested ... >> >> You make a good point about using VA_LIST twice possibly tickling gcc >> the wrong way. Plus, there's one (VOID)VA_ARG() macro invocation in my >> code -- in order to step over some arguments --, which might not be all >> that usual as well. >> >> In any case, the way I use varargs seems to be standards conformant. >> >> With regard to reporting this to gcc developers: I won't try that. I'm >> discouraged by two facts: >> >> * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50818 -- open since >> October 2011. You'll find some recent comments from Steven and David in >> it :) >> >> * http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 -- >> shortly after I ran into the -Os corruption issue with GCC48, I created >> and posted this small reproducer. Creating the reproducer wasn't >> trivial. In parallel I sent the same reproducer to one of my (indirect) >> colleagues at Red Hat, who had been a veteran in upstream GCC >> development and maintenance. I also asked him how/where I should report >> the bug. I got no answer from him. >> >> If you'd like to report a GCC bug, please go ahead, but I won't waste my >> time :) >> > > It seems that it is your testcase that is broken: > > --- a/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c > +++ b/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c > @@ -663,6 +663,8 @@ EnrollListOfX509Certs ( >UINT8*Data; >UINT8*Position; > > + Status = EFI_SUCCESS; > + >// >// compute total size first, for UINT32 range check, and allocation >// > > With that change, everything works fine for me I agree that proving my code wrong would be the best outcome here, but can you please explain to me on what path we return from EnrollListOfX509Certs() without setting Status? Thanks Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 15 July 2016 at 08:33, Laszlo Ersek wrote: > On 07/15/16 08:07, Ard Biesheuvel wrote: >> On 15 July 2016 at 01:27, Laszlo Ersek wrote: > >>> First, the build tests. I built OVMF 84 times, with the following settings: >>> >>> * Dimension 1: whether your and Steven's patches were applied or not. >> >> I take it this means this series only? > > Ah, yes, sorry about the ambiguity -- with "your and Steven's patches" I > meant the five patches in this series -- one patch from Steven, four > patches from you. > >>> * However, then I built my program -- find it attached -- with -b DEBUG >>> and -b RELEASE too, and the -b RELEASE binary is broken. Here's the >>> difference in output: >>> >>> FS2:\> DebugEnrollDefaultKeys.efi >>> info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 VendorKeys=0 >>> info: SetupMode=0 SecureBoot=1 SecureBootEnable=1 CustomMode=0 VendorKeys=0 >>> info: success >>> >>> versus >>> >>> FS2:\> OptEnrollDefaultKeys.efi >>> info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 VendorKeys=0 >>> error: EnrollListOfX509Certs("db", >>> D719B2CB-3D3A-4596-A3BC-DAD00E67656F): Invalid Parameter >>> >>> I don't know why this happens, but it definitely relates to varargs -- >>> the program uses them liberally. >>> >> >> If this code ultimately uses __builtin_ms_va_lists with -O2 and fails, >> it makes sense to report it to the GCC maintainers (assuming we can >> create a test case). I noticed that this code iterates over the same >> VA_LIST twice, I wonder how well that was tested ... > > You make a good point about using VA_LIST twice possibly tickling gcc > the wrong way. Plus, there's one (VOID)VA_ARG() macro invocation in my > code -- in order to step over some arguments --, which might not be all > that usual as well. > > In any case, the way I use varargs seems to be standards conformant. > > With regard to reporting this to gcc developers: I won't try that. I'm > discouraged by two facts: > > * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50818 -- open since > October 2011. You'll find some recent comments from Steven and David in > it :) > > * http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 -- > shortly after I ran into the -Os corruption issue with GCC48, I created > and posted this small reproducer. Creating the reproducer wasn't > trivial. In parallel I sent the same reproducer to one of my (indirect) > colleagues at Red Hat, who had been a veteran in upstream GCC > development and maintenance. I also asked him how/where I should report > the bug. I got no answer from him. > > If you'd like to report a GCC bug, please go ahead, but I won't waste my > time :) > It seems that it is your testcase that is broken: --- a/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c +++ b/OvmfPkg/EnrollDefaultKeys/EnrollDefaultKeys.c @@ -663,6 +663,8 @@ EnrollListOfX509Certs ( UINT8*Data; UINT8*Position; + Status = EFI_SUCCESS; + // // compute total size first, for UINT32 range check, and allocation // With that change, everything works fine for me Thanks, Ard. ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
Ard: The error message shows the error happens in the first loop in VA_START and VA_END. Could we update the logic to use single va list and verify it again? If the single va list passes, it can prove your suspect. If the single va list still fail, it may be other issue. I think we need to know which usage will cause break with your patches. Thanks Liming > -Original Message- > From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org] > Sent: Friday, July 15, 2016 2:07 PM > To: Laszlo Ersek > Cc: Shi, Steven ; Zhu, Yonghong > ; Gao, Liming ; Kinney, > Michael D ; Justen, Jordan L > ; af...@apple.com; edk2-devel@lists.01.org > > Subject: Re: [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64 > > On 15 July 2016 at 01:27, Laszlo Ersek wrote: > > On 07/14/16 15:16, Ard Biesheuvel wrote: > >> This series is not an attempt to steal Steven's thunder. I was merely > >> inspired by some of the changes he is proposing for GCC 5 and Clang > >> support, which I noticed would be applicable to older versions of GCC > >> as well. > >> > >> The first patch fixes the issue that __builtin_unreachable() is not > >> implemented by GCC 4.4 or earlier. > >> > >> Patch #2 is Steven's patch to switch to the flavor of VA_LIST that is > >> explicitly modeled after the MS implementation. This by itself is an > >> improvement, since the open coded implementation that performs > arithmetic > >> on the address of explicit arguments to obtain the variadic arguments is > >> fragile and difficult to maintain, and should be best avoided. > >> > >> Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to > GCC44. > >> I tested this change with both Ovmf and EmulatorPkg built in various ways > >> and with various versions, with the caveat that I did not always use a > matching > >> binutils (i.e., of the same era). Since the issues this series deal with > >> are > >> all code generation issues, I think this is reasonable, but more testing > would > >> be appreciated. > >> > >> Patch #4 applies the visibility 'hidden' GCC pragma globally. Please refer > >> to > >> the commit log for the motivation. > >> > >> Patch #5 switches GCC/X64 to the PIC small code model, which results in > smaller > >> code. > >> > >> Ard Biesheuvel (4): > >> MdePkg: avoid __builtin_unreachable() on GCC v4.4 > >> BaseTools/tools_def: enable O2 optimization for GCC X64 builds > >> MdePkg X64: force 'hidden' visibility when building with -fpic > >> BaseTools/tools_def: switch GCC/X64 to the PIC small model > >> > >> Shi, Steven (1): > >> MdePkg: Enable new MS VA intrinsics for GNUC x86 64bits build > >> > >> BaseTools/Conf/tools_def.template | 18 +++- > >> MdePkg/Include/Base.h | 29 +++- > >> MdePkg/Include/X64/ProcessorBind.h | 10 +++ > >> 3 files changed, 49 insertions(+), 8 deletions(-) > >> mode change 100644 => 100755 MdePkg/Include/Base.h > >> > > > > So, I did some tests. > > > > Thanks a lot for the awesome work, and for the amount of effort you > have put in. I could never have tested this as methodical and with the > same coverage as you have. > > > First, the build tests. I built OVMF 84 times, with the following settings: > > > > * Dimension 1: whether your and Steven's patches were applied or not. > > I take it this means this series only? > > > The basis was always: > > - upstream 04147690b59b, > > - plus my personal patches that I constantly rebase, > > - plus my pending "feat_ctrl_issue97_v3" patches, > > - plus my work-in-progress build fixes for GCCxx and VS2015x86 warnings, > > - plus OpenSSL 1.0.2g. > > > > These patches don't interfere with your and Steven's work. So on top of > > this basis, your and Steven's patches were either applied (rebased) or > > not. This is a multiplier of 2. > > > > * Dimension 2: the GCC toolchain used. Ranging from GCC44 through > GCC49, > > plus gcc-6.1 at the end (using GCC49 settings). Multiplier of 7. > > > > * Dimension 3: DEBUG versus RELEASE build target. Factor of 2. > > > > * Dimension 4: the OVMF platform. I used the following command lines: > > > > - build -a IA32-p OvmfPkg/OvmfPkgIa32.dsc-D SMM_REQUIRE \ > > -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE > > - build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc -D SMM_REQUIRE \ > > -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE > > - build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \ > > -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE > > > > As you can see, Ia32 and Ia32X64 are built with the SMM driver stack; > > X64 is built without it. (This is because edk2 S3 doesn't support > > X64+SMM at the moment, so I either have to build X64 without SMM, or > > disable S3 support on the QEMU command line. I find the latter more > > inconvenient, plus Ia32X64 is just fine for 64-bit OS + SMM + S3.) > > Anyhow, this is a factor of 3. > > > > For these builds, I collected the log files and the report files. Values > > in dimension 3 are not separated into separate
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 07/15/16 08:07, Ard Biesheuvel wrote: > On 15 July 2016 at 01:27, Laszlo Ersek wrote: >> First, the build tests. I built OVMF 84 times, with the following settings: >> >> * Dimension 1: whether your and Steven's patches were applied or not. > > I take it this means this series only? Ah, yes, sorry about the ambiguity -- with "your and Steven's patches" I meant the five patches in this series -- one patch from Steven, four patches from you. >> * However, then I built my program -- find it attached -- with -b DEBUG >> and -b RELEASE too, and the -b RELEASE binary is broken. Here's the >> difference in output: >> >> FS2:\> DebugEnrollDefaultKeys.efi >> info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 VendorKeys=0 >> info: SetupMode=0 SecureBoot=1 SecureBootEnable=1 CustomMode=0 VendorKeys=0 >> info: success >> >> versus >> >> FS2:\> OptEnrollDefaultKeys.efi >> info: SetupMode=1 SecureBoot=0 SecureBootEnable=1 CustomMode=1 VendorKeys=0 >> error: EnrollListOfX509Certs("db", >> D719B2CB-3D3A-4596-A3BC-DAD00E67656F): Invalid Parameter >> >> I don't know why this happens, but it definitely relates to varargs -- >> the program uses them liberally. >> > > If this code ultimately uses __builtin_ms_va_lists with -O2 and fails, > it makes sense to report it to the GCC maintainers (assuming we can > create a test case). I noticed that this code iterates over the same > VA_LIST twice, I wonder how well that was tested ... You make a good point about using VA_LIST twice possibly tickling gcc the wrong way. Plus, there's one (VOID)VA_ARG() macro invocation in my code -- in order to step over some arguments --, which might not be all that usual as well. In any case, the way I use varargs seems to be standards conformant. With regard to reporting this to gcc developers: I won't try that. I'm discouraged by two facts: * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50818 -- open since October 2011. You'll find some recent comments from Steven and David in it :) * http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 -- shortly after I ran into the -Os corruption issue with GCC48, I created and posted this small reproducer. Creating the reproducer wasn't trivial. In parallel I sent the same reproducer to one of my (indirect) colleagues at Red Hat, who had been a veteran in upstream GCC development and maintenance. I also asked him how/where I should report the bug. I got no answer from him. If you'd like to report a GCC bug, please go ahead, but I won't waste my time :) Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 15 July 2016 at 01:27, Laszlo Ersek wrote: > On 07/14/16 15:16, Ard Biesheuvel wrote: >> This series is not an attempt to steal Steven's thunder. I was merely >> inspired by some of the changes he is proposing for GCC 5 and Clang >> support, which I noticed would be applicable to older versions of GCC >> as well. >> >> The first patch fixes the issue that __builtin_unreachable() is not >> implemented by GCC 4.4 or earlier. >> >> Patch #2 is Steven's patch to switch to the flavor of VA_LIST that is >> explicitly modeled after the MS implementation. This by itself is an >> improvement, since the open coded implementation that performs arithmetic >> on the address of explicit arguments to obtain the variadic arguments is >> fragile and difficult to maintain, and should be best avoided. >> >> Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to GCC44. >> I tested this change with both Ovmf and EmulatorPkg built in various ways >> and with various versions, with the caveat that I did not always use a >> matching >> binutils (i.e., of the same era). Since the issues this series deal with are >> all code generation issues, I think this is reasonable, but more testing >> would >> be appreciated. >> >> Patch #4 applies the visibility 'hidden' GCC pragma globally. Please refer to >> the commit log for the motivation. >> >> Patch #5 switches GCC/X64 to the PIC small code model, which results in >> smaller >> code. >> >> Ard Biesheuvel (4): >> MdePkg: avoid __builtin_unreachable() on GCC v4.4 >> BaseTools/tools_def: enable O2 optimization for GCC X64 builds >> MdePkg X64: force 'hidden' visibility when building with -fpic >> BaseTools/tools_def: switch GCC/X64 to the PIC small model >> >> Shi, Steven (1): >> MdePkg: Enable new MS VA intrinsics for GNUC x86 64bits build >> >> BaseTools/Conf/tools_def.template | 18 +++- >> MdePkg/Include/Base.h | 29 +++- >> MdePkg/Include/X64/ProcessorBind.h | 10 +++ >> 3 files changed, 49 insertions(+), 8 deletions(-) >> mode change 100644 => 100755 MdePkg/Include/Base.h >> > > So, I did some tests. > Thanks a lot for the awesome work, and for the amount of effort you have put in. I could never have tested this as methodical and with the same coverage as you have. > First, the build tests. I built OVMF 84 times, with the following settings: > > * Dimension 1: whether your and Steven's patches were applied or not. I take it this means this series only? > The basis was always: > - upstream 04147690b59b, > - plus my personal patches that I constantly rebase, > - plus my pending "feat_ctrl_issue97_v3" patches, > - plus my work-in-progress build fixes for GCCxx and VS2015x86 warnings, > - plus OpenSSL 1.0.2g. > > These patches don't interfere with your and Steven's work. So on top of > this basis, your and Steven's patches were either applied (rebased) or > not. This is a multiplier of 2. > > * Dimension 2: the GCC toolchain used. Ranging from GCC44 through GCC49, > plus gcc-6.1 at the end (using GCC49 settings). Multiplier of 7. > > * Dimension 3: DEBUG versus RELEASE build target. Factor of 2. > > * Dimension 4: the OVMF platform. I used the following command lines: > > - build -a IA32-p OvmfPkg/OvmfPkgIa32.dsc-D SMM_REQUIRE \ > -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE > - build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc -D SMM_REQUIRE \ > -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE > - build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \ > -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE > > As you can see, Ia32 and Ia32X64 are built with the SMM driver stack; > X64 is built without it. (This is because edk2 S3 doesn't support > X64+SMM at the moment, so I either have to build X64 without SMM, or > disable S3 support on the QEMU command line. I find the latter more > inconvenient, plus Ia32X64 is just fine for 64-bit OS + SMM + S3.) > Anyhow, this is a factor of 3. > > For these builds, I collected the log files and the report files. Values > in dimension 3 are not separated into separate files, because you can > pass -b DEBUG -b RELEASE at once, and the build utility will do the > right thing -- it will build both targets in succession. However, the > log file and the report file each will contain both DEBUG and RELEASE > output. > > For the builds I also collected the OVMF_CODE.fd firmware binaries. > > The good news is that all of the builds succeeded. You can draw size > comparisons from the build log files: see SECFV, PEIFV and DXEFV for > uncompressed sizes, and FVMAIN_COMPACT for the compressed size of > PEIFV+DXEFV together. (Thanks Mike for the reminder!) > > I'll send you the link to the results archive off-list. (I tend not to > post stuff publicly that I cannot review in detail, purely out of > privacy concerns, and this archive is just too big for that. For example > I dislike disclosing full pathnames from my system: in commit messages I > always edit them out.) > > Clearly if
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 15 July 2016 at 03:28, Gao, Liming wrote: > I see Steven patch uses -Os after enables new MS VA intrinsics. So, -Os > should work. But, I am not sure whether -Os is better than -O2 for edk2. > -Os did not work for me with GCC49 or earlier. It complains about a missing -maccumulate-outgoing-args, but adding that option results in a broken binary. >> -Original Message- >> From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of >> Laszlo Ersek >> Sent: Friday, July 15, 2016 6:20 AM >> To: Bruce Cran ; Ard Biesheuvel >> ; Shi, Steven ; Zhu, >> Yonghong ; Gao, Liming ; >> Kinney, Michael D ; Justen, Jordan L >> ; af...@apple.com; edk2-de...@ml01.01.org >> Subject: Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64 >> >> On 07/14/16 23:57, Bruce Cran wrote: >> > On 7/14/2016 7:16 AM, Ard Biesheuvel wrote: >> > >> >> Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to >> >> GCC44. >> > >> > I wonder if -Os might be a better default optimization? Or perhaps >> > there's plenty of flash on devices nowadays and performance is more >> > important than size? >> > >> > At least for our driver, I see a 33% size reduction with -O2 and 44% >> > with -Os for the driver .efi file. Perhaps more importantly, the .rom >> > file (i.e. efirom output) has no reduction with -O2 and a 14% reduction >> > with -Os. >> > >> >> Beware of -Os with GCC48: >> >> http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 >> >> Thanks >> Laszlo >> ___ >> edk2-devel mailing list >> edk2-devel@lists.01.org >> https://lists.01.org/mailman/listinfo/edk2-devel ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
I see Steven patch uses -Os after enables new MS VA intrinsics. So, -Os should work. But, I am not sure whether -Os is better than -O2 for edk2. > -Original Message- > From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of > Laszlo Ersek > Sent: Friday, July 15, 2016 6:20 AM > To: Bruce Cran ; Ard Biesheuvel > ; Shi, Steven ; Zhu, > Yonghong ; Gao, Liming ; > Kinney, Michael D ; Justen, Jordan L > ; af...@apple.com; edk2-de...@ml01.01.org > Subject: Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64 > > On 07/14/16 23:57, Bruce Cran wrote: > > On 7/14/2016 7:16 AM, Ard Biesheuvel wrote: > > > >> Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to > >> GCC44. > > > > I wonder if -Os might be a better default optimization? Or perhaps > > there's plenty of flash on devices nowadays and performance is more > > important than size? > > > > At least for our driver, I see a 33% size reduction with -O2 and 44% > > with -Os for the driver .efi file. Perhaps more importantly, the .rom > > file (i.e. efirom output) has no reduction with -O2 and a 14% reduction > > with -Os. > > > > Beware of -Os with GCC48: > > http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 > > Thanks > Laszlo > ___ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 07/14/16 15:16, Ard Biesheuvel wrote: > This series is not an attempt to steal Steven's thunder. I was merely > inspired by some of the changes he is proposing for GCC 5 and Clang > support, which I noticed would be applicable to older versions of GCC > as well. > > The first patch fixes the issue that __builtin_unreachable() is not > implemented by GCC 4.4 or earlier. > > Patch #2 is Steven's patch to switch to the flavor of VA_LIST that is > explicitly modeled after the MS implementation. This by itself is an > improvement, since the open coded implementation that performs arithmetic > on the address of explicit arguments to obtain the variadic arguments is > fragile and difficult to maintain, and should be best avoided. > > Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to GCC44. > I tested this change with both Ovmf and EmulatorPkg built in various ways > and with various versions, with the caveat that I did not always use a > matching > binutils (i.e., of the same era). Since the issues this series deal with are > all code generation issues, I think this is reasonable, but more testing would > be appreciated. > > Patch #4 applies the visibility 'hidden' GCC pragma globally. Please refer to > the commit log for the motivation. > > Patch #5 switches GCC/X64 to the PIC small code model, which results in > smaller > code. > > Ard Biesheuvel (4): > MdePkg: avoid __builtin_unreachable() on GCC v4.4 > BaseTools/tools_def: enable O2 optimization for GCC X64 builds > MdePkg X64: force 'hidden' visibility when building with -fpic > BaseTools/tools_def: switch GCC/X64 to the PIC small model > > Shi, Steven (1): > MdePkg: Enable new MS VA intrinsics for GNUC x86 64bits build > > BaseTools/Conf/tools_def.template | 18 +++- > MdePkg/Include/Base.h | 29 +++- > MdePkg/Include/X64/ProcessorBind.h | 10 +++ > 3 files changed, 49 insertions(+), 8 deletions(-) > mode change 100644 => 100755 MdePkg/Include/Base.h > So, I did some tests. First, the build tests. I built OVMF 84 times, with the following settings: * Dimension 1: whether your and Steven's patches were applied or not. The basis was always: - upstream 04147690b59b, - plus my personal patches that I constantly rebase, - plus my pending "feat_ctrl_issue97_v3" patches, - plus my work-in-progress build fixes for GCCxx and VS2015x86 warnings, - plus OpenSSL 1.0.2g. These patches don't interfere with your and Steven's work. So on top of this basis, your and Steven's patches were either applied (rebased) or not. This is a multiplier of 2. * Dimension 2: the GCC toolchain used. Ranging from GCC44 through GCC49, plus gcc-6.1 at the end (using GCC49 settings). Multiplier of 7. * Dimension 3: DEBUG versus RELEASE build target. Factor of 2. * Dimension 4: the OVMF platform. I used the following command lines: - build -a IA32-p OvmfPkg/OvmfPkgIa32.dsc-D SMM_REQUIRE \ -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE - build -a IA32 -a X64 -p OvmfPkg/OvmfPkgIa32X64.dsc -D SMM_REQUIRE \ -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE - build -a X64 -p OvmfPkg/OvmfPkgX64.dsc \ -D SECURE_BOOT_ENABLE -D HTTP_BOOT_ENABLE As you can see, Ia32 and Ia32X64 are built with the SMM driver stack; X64 is built without it. (This is because edk2 S3 doesn't support X64+SMM at the moment, so I either have to build X64 without SMM, or disable S3 support on the QEMU command line. I find the latter more inconvenient, plus Ia32X64 is just fine for 64-bit OS + SMM + S3.) Anyhow, this is a factor of 3. For these builds, I collected the log files and the report files. Values in dimension 3 are not separated into separate files, because you can pass -b DEBUG -b RELEASE at once, and the build utility will do the right thing -- it will build both targets in succession. However, the log file and the report file each will contain both DEBUG and RELEASE output. For the builds I also collected the OVMF_CODE.fd firmware binaries. The good news is that all of the builds succeeded. You can draw size comparisons from the build log files: see SECFV, PEIFV and DXEFV for uncompressed sizes, and FVMAIN_COMPACT for the compressed size of PEIFV+DXEFV together. (Thanks Mike for the reminder!) I'll send you the link to the results archive off-list. (I tend not to post stuff publicly that I cannot review in detail, purely out of privacy concerns, and this archive is just too big for that. For example I dislike disclosing full pathnames from my system: in commit messages I always edit them out.) Clearly if you distill any numbers from the archive, you are going to review those in detail (and hopefully you won't disclose the entire archive), so it should suffice to constrain my privacy concerns to you personally. :) -o- Second, the run test. I executed two test scenarios here. For these, the following dimensions were fixed: - Dimension 1: your and Steven's patches were always applied. - Dimension
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
Hi Bruce, I think size is most important for the default settings. This is why it it important to measure the size changes for each set of flag options. Size measurements are a bit complex due to the interaction with compressors. I have found that measuring the used space in an FV with compression enabled is the best way to determine if a compiler/linker flag has a positive or negative influence on the size. For efirom, I recommend enabling UEFI compression (-ec option). If a specific module needs a flag a for performance optimization, that can be done in a section of DSC or [BuildOptions] section of INF. Mike > -Original Message- > From: Bruce Cran [mailto:br...@cran.org.uk] > Sent: Thursday, July 14, 2016 2:57 PM > To: Ard Biesheuvel ; Shi, Steven > ; > Zhu, Yonghong ; Gao, Liming ; > Kinney, > Michael D ; Justen, Jordan L > ; > ler...@redhat.com; af...@apple.com; edk2-devel@lists.01.org > Subject: Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64 > > On 7/14/2016 7:16 AM, Ard Biesheuvel wrote: > > > Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to GCC44. > > I wonder if -Os might be a better default optimization? Or perhaps > there's plenty of flash on devices nowadays and performance is more > important than size? > > At least for our driver, I see a 33% size reduction with -O2 and 44% > with -Os for the driver .efi file. Perhaps more importantly, the .rom > file (i.e. efirom output) has no reduction with -O2 and a 14% reduction > with -Os. > > -- > Bruce ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 7/14/2016 4:20 PM, Laszlo Ersek wrote: On 07/14/16 23:57, Bruce Cran wrote: I wonder if -Os might be a better default optimization? Or perhaps there's plenty of flash on devices nowadays and performance is more important than size? Beware of -Os with GCC48: http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 Oh, good point. Also, at this point any optimization is a _huge_ improvement! :) -- Bruce ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 07/14/16 23:57, Bruce Cran wrote: > On 7/14/2016 7:16 AM, Ard Biesheuvel wrote: > >> Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to >> GCC44. > > I wonder if -Os might be a better default optimization? Or perhaps > there's plenty of flash on devices nowadays and performance is more > important than size? > > At least for our driver, I see a 33% size reduction with -O2 and 44% > with -Os for the driver .efi file. Perhaps more importantly, the .rom > file (i.e. efirom output) has no reduction with -O2 and a 14% reduction > with -Os. > Beware of -Os with GCC48: http://thread.gmane.org/gmane.comp.bios.tianocore.devel/10963 Thanks Laszlo ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 7/14/2016 7:16 AM, Ard Biesheuvel wrote: Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to GCC44. I wonder if -Os might be a better default optimization? Or perhaps there's plenty of flash on devices nowadays and performance is more important than size? At least for our driver, I see a 33% size reduction with -O2 and 44% with -Os for the driver .efi file. Perhaps more importantly, the .rom file (i.e. efirom output) has no reduction with -O2 and a 14% reduction with -Os. -- Bruce ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 14 July 2016 at 16:42, Gao, Liming wrote: > Ard: > On patch 4, the visibility 'hidden' GCC pragma globally. Does it work > together with other mode, such as large? Or it only works with small and pic? > I want to clarify this change has no negative impact. The #if checks for 'defined(__pic__)', which is only true if -fpic is being passed on the command line. It works fine with other code models, it only affects the compiler's decision whether to reference symbols indirectly via the GOT. > On patch 5, I don't see any change for IA32 arch. is there no mode for IA32 > arch? Here, small and pic must be enabled together, right? Otherwise, the > assumption is to load driver below 2G address. Have you collected size data > before and after this change? > To be honest, I know very little about the IA32 ABI, but I don't think it has different code models, does it? Since the ELF relocations are stripped by the PE/COFF conversion, I don't think it is necessary to take into account that X64 and IA32 modules may interact with each other when running on a X64 platform. As far as the size is concerned, I get the following results building DxeCore.efi with GCC47 (under OvmfPkgX64.dsc) Before: Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 0] NULL 0 0 0 [ 1] .text PROGBITS 0240 00c0 0001e3d0 AX 0 0 32 [ 2] .rela.textRELA 00029528 f360 0018 I 6 1 8 [ 3] .data PROGBITS 0001e620 0001e4a0 3448 WA 0 0 32 [ 4] .rela.dataRELA 0003 1158 0018 I 6 3 8 After: Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 0] NULL 0 0 0 [ 1] .text PROGBITS 0240 00c0 0001a5a0 AX 0 0 32 [ 2] .rela.textRELA 00025d48 00010218 0018 I 6 1 8 [ 3] .data PROGBITS 0001a7e0 0001a660 3668 WA 0 0 32 [ 4] .rela.dataRELA 00035f60 1788 0018 I 6 3 8 So the text section shrinks by 16 KB (13%) in this case, and I got similar results for other modules. -- Ard. ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
Ard: On patch 4, the visibility 'hidden' GCC pragma globally. Does it work together with other mode, such as large? Or it only works with small and pic? I want to clarify this change has no negative impact. On patch 5, I don't see any change for IA32 arch. is there no mode for IA32 arch? Here, small and pic must be enabled together, right? Otherwise, the assumption is to load driver below 2G address. Have you collected size data before and after this change? Thanks Liming -Original Message- From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of Ard Biesheuvel Sent: Thursday, July 14, 2016 9:16 PM To: Shi, Steven ; Zhu, Yonghong ; Gao, Liming ; Kinney, Michael D ; Justen, Jordan L ; ler...@redhat.com; af...@apple.com; edk2-devel@lists.01.org Cc: Ard Biesheuvel Subject: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64 This series is not an attempt to steal Steven's thunder. I was merely inspired by some of the changes he is proposing for GCC 5 and Clang support, which I noticed would be applicable to older versions of GCC as well. The first patch fixes the issue that __builtin_unreachable() is not implemented by GCC 4.4 or earlier. Patch #2 is Steven's patch to switch to the flavor of VA_LIST that is explicitly modeled after the MS implementation. This by itself is an improvement, since the open coded implementation that performs arithmetic on the address of explicit arguments to obtain the variadic arguments is fragile and difficult to maintain, and should be best avoided. Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to GCC44. I tested this change with both Ovmf and EmulatorPkg built in various ways and with various versions, with the caveat that I did not always use a matching binutils (i.e., of the same era). Since the issues this series deal with are all code generation issues, I think this is reasonable, but more testing would be appreciated. Patch #4 applies the visibility 'hidden' GCC pragma globally. Please refer to the commit log for the motivation. Patch #5 switches GCC/X64 to the PIC small code model, which results in smaller code. Ard Biesheuvel (4): MdePkg: avoid __builtin_unreachable() on GCC v4.4 BaseTools/tools_def: enable O2 optimization for GCC X64 builds MdePkg X64: force 'hidden' visibility when building with -fpic BaseTools/tools_def: switch GCC/X64 to the PIC small model Shi, Steven (1): MdePkg: Enable new MS VA intrinsics for GNUC x86 64bits build BaseTools/Conf/tools_def.template | 18 +++- MdePkg/Include/Base.h | 29 +++- MdePkg/Include/X64/ProcessorBind.h | 10 +++ 3 files changed, 49 insertions(+), 8 deletions(-) mode change 100644 => 100755 MdePkg/Include/Base.h -- 2.7.4 ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
On 14 July 2016 at 15:16, Ard Biesheuvel wrote: > This series is not an attempt to steal Steven's thunder. I was merely > inspired by some of the changes he is proposing for GCC 5 and Clang > support, which I noticed would be applicable to older versions of GCC > as well. > > The first patch fixes the issue that __builtin_unreachable() is not > implemented by GCC 4.4 or earlier. > > Patch #2 is Steven's patch to switch to the flavor of VA_LIST that is > explicitly modeled after the MS implementation. This by itself is an > improvement, since the open coded implementation that performs arithmetic > on the address of explicit arguments to obtain the variadic arguments is > fragile and difficult to maintain, and should be best avoided. > > Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to GCC44. > I tested this change with both Ovmf and EmulatorPkg built in various ways > and with various versions, with the caveat that I did not always use a > matching > binutils (i.e., of the same era). Since the issues this series deal with are > all code generation issues, I think this is reasonable, but more testing would > be appreciated. > > Patch #4 applies the visibility 'hidden' GCC pragma globally. Please refer to > the commit log for the motivation. > > Patch #5 switches GCC/X64 to the PIC small code model, which results in > smaller > code. > Code can be found here https://git.linaro.org/people/ard.biesheuvel/uefi-next.git/shortlog/refs/heads/gcc-x64-opt ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
[edk2] [PATCH 0/5] MdePkg BaseTools: GCC optimization for X64
This series is not an attempt to steal Steven's thunder. I was merely inspired by some of the changes he is proposing for GCC 5 and Clang support, which I noticed would be applicable to older versions of GCC as well. The first patch fixes the issue that __builtin_unreachable() is not implemented by GCC 4.4 or earlier. Patch #2 is Steven's patch to switch to the flavor of VA_LIST that is explicitly modeled after the MS implementation. This by itself is an improvement, since the open coded implementation that performs arithmetic on the address of explicit arguments to obtain the variadic arguments is fragile and difficult to maintain, and should be best avoided. Patch #3 enabled -O2 optimization for X64/RELEASE all the way back to GCC44. I tested this change with both Ovmf and EmulatorPkg built in various ways and with various versions, with the caveat that I did not always use a matching binutils (i.e., of the same era). Since the issues this series deal with are all code generation issues, I think this is reasonable, but more testing would be appreciated. Patch #4 applies the visibility 'hidden' GCC pragma globally. Please refer to the commit log for the motivation. Patch #5 switches GCC/X64 to the PIC small code model, which results in smaller code. Ard Biesheuvel (4): MdePkg: avoid __builtin_unreachable() on GCC v4.4 BaseTools/tools_def: enable O2 optimization for GCC X64 builds MdePkg X64: force 'hidden' visibility when building with -fpic BaseTools/tools_def: switch GCC/X64 to the PIC small model Shi, Steven (1): MdePkg: Enable new MS VA intrinsics for GNUC x86 64bits build BaseTools/Conf/tools_def.template | 18 +++- MdePkg/Include/Base.h | 29 +++- MdePkg/Include/X64/ProcessorBind.h | 10 +++ 3 files changed, 49 insertions(+), 8 deletions(-) mode change 100644 => 100755 MdePkg/Include/Base.h -- 2.7.4 ___ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel