Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Sat, 24 Dec 2011, Alexander Best wrote: On Sat Dec 24 11, Bruce Evans wrote: On Sat, 24 Dec 2011, Alexander Best wrote: On Sat Dec 24 11, Bruce Evans wrote: On Fri, 23 Dec 2011, Alexander Best wrote: ... the gcc(1) man page states the following: This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2. the comment in sys/conf/kern.mk however sorta suggests that the default alignment of 4 bytes might improve performance. The default stack alignment is 16 bytes, which unimproves performance. maybe the part of the comment in sys/conf/kern.mk, which mentions that a stack alignment of 16 bytes might improve micro benchmark results should be removed. this would prevent people (like me) from thinking, using a stack alignment of 4 bytes is a compromise between size and efficiently. it isn't! currently a stack alignment of 16 bytes has no advantages towards one with 4 bytes on i386. I think the comment is clear enough. It it mentions all the tradeoffs. It is only slightly cryptic in saying that these are tradeoffs and that the configuration is our best guess at the best tradeoff -- it just says while for both. It goes without saying that we don't use our worst guess. Anyone wanting to change this should run benchmarks and beware that micro-benchmarks are especially useless. The changed comment is not so good since it no longer mentions micro-bencharmarks or says while. if micro benchmark results aren't of any use, why should the claim that the default stack alignment of 16 bytes might produce better outcome stay? 
Because: - the actual claim is the opposite of that (it is that the default 16-byte alignments is probably a loss overall) - the claim that the default 16-byte alignment may benefit micro-benchmarks is true, even without the weaselish miswording of might in it. There is always at least 1 micro-benchmark that will benefit from almost any change, and here we expect a benefit in many microbenchmarks that don't bust the caches. Except, 16-byte alignment isn't supported (*) in the kernel, so we actually expect a loss from many microbenchmarks that don't bust the caches. - the second claim warns inexperienced benchmarkers not to claim that the default is better because it is better in microbenchmarks. it doesn't seem as if anybody has micro benchmarked 16 bytes vs. 4 bytes stack alignment, until now. so the micro benchmark statement in the comment seems to be pure speculation. No, it is obviously true. even worse...it indicates that by removing the -mpreferred-stack-boundary=2 flag, one can gain a performance boost by sacrifying a few more bytes of kernel (and module) size. No, it is part of the sentence explaining why removing the -mpreferred-stack-boundary=2 flag will probably regain the overall loss that is avoided by using the flag. this suggests that the behavior -mpreferred-stack-boundary=2 vs. not specyfing it, losely equals the semantics of -Os vs. -O2. No, -Os guarantees slower execution by forcing optimization to prefer space savings over time savings in more ways. Except, -Os is completely broken in -current (in the kernel), and gives very large negative space savings (about 50%). It last worked with gcc-3. Its brokenness with gcc-4 is related to kern.pre.mk still specifying -finline-limit flags that are more suitable for gcc-3 (gcc has _many_ flags for giving more delicate control over inlining, and better defaults for them) and excessive inlining in gcc-4 given by -funit-at-a-time -finline-functions-called-once. 
These apparently cause gcc's inliner to go insane with -Os. When I tried to fix this by reducing inlining, I couldn't find any threshold that fixed -Os without breaking inlining of functions that are declared inline.

(*) A primary part of the lack of support for 16-byte stack alignment in the kernel is that there is no special stack alignment for the main kernel entry point, namely syscall(). From i386/exception.s:

% 	SUPERALIGN_TEXT
% IDTVEC(int0x80_syscall)

At this point, the stack has 5 words on it (it was 16-byte aligned before that).

% 	pushl	$2			/* sizeof int 0x80 */
% 	subl	$4,%esp			/* skip over tf_trapno */
% 	pushal
% 	pushl	%ds
% 	pushl	%es
% 	pushl	%fs
% 	SET_KERNEL_SREGS
% 	cld
% 	FAKE_MCOUNT(TF_EIP(%esp))
% 	pushl	%esp

We push 14 more words. This gives perfect misalignment to the worst odd word boundary (perfect if only word boundaries are allowed). gcc wants the stack to be aligned to a 4*n word boundary before function calls, but here we have a 4*n+3 word boundary. (4*n+3 is worse than 4*n+1 since 2 more words instead of 4 will cross the next 16-byte boundary).

% 	call	syscall

Using the default
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Sat Dec 24 11, Bruce Evans wrote: On Fri, 23 Dec 2011, Alexander Best wrote: is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer? i built GENERIC (including modules) with and without that flag. the results are: The same as it has always been. It avoids some bloat. 1654496 bytes with the flag set vs. 1654952 bytes with the flag unset I don't believe this. GENERIC is enormously bloated, so it has size more like 16MB than 1.6MB. Even a savings of 4K instead of 456 bytes i'm sorry. i used du(1) to get those numbers, so i believe those numbers represent the ammount of 512-byte blocks. if i'm correct GENERIC is even more bloated than you feared and almost reaches 1GB: 807,859375 megabytes with flag set vs. 808,0820313 megabytes without the flag set is hard to believe. I get a savings of 9K (text) in a 5MB kernel. Changing the default target arch from i386 to pentium-undocumented has reduced the text space savings a little, since the default for passing args is now to preallocate stack space for them and store to this, instead of to push them; this preallocation results in more functions needing to allocate some stack space explicitly, and when some is allocated explicitly, the text space cost for this doesn't depend on the size of the allocation. Anyway, the savings are mostly from from avoiding cache misses from sparse allocation on stacks. Also, FreeBSD-i386 hasn't been programmed to support aligned stacks: - KSTACK_PAGES on i386 is 2, while on amd64 it is 4. Using more stack might push something over the edge - not much care is taken to align the initial stack or to keep the stack aligned in calls from asm code. E.g., any alignment for mi_startup() (and thus proc0?) is accidental. This may result in perfect alignment or perfect misalignment. Hopefully, more care is taken with thread startup. For gcc, the alignment is done bogusly in main() in userland, but there is no main() in the kernel. 
The alignment doesn't matter much (provided the perfect misalignment is still to a multiple of 4), but when it matters, the random misalignment that results from not trying to do it at all is better than perfect misalignment from getting it wrong. With 4-byte alignment, the only cases that it helps are with 64-bit variables. the gcc(1) man page states the following: This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2. the comment in sys/conf/kern.mk however sorta suggests that the default alignment of 4 bytes might improve performance. The default stack alignment is 16 bytes, which unimproves performance. clang handles stack alignment correctly (only does it when it is needed) so it doesn't need a -mpreferred-stack-boundary option and doesn't always break without alignment in main(). Well, at least it used to, IIRC. Testing it now shows that it does the necessary andl of the stack pointer for __aligned(32), but for __aligned(16) it now assumes that the stack is aligned by the caller. So it now needs -mpreferred-stack-boundary=2, but doesn't have it. OTOH, clang doesn't do the andl in main() like gcc does (unless you put a dummy __aligned(32) there), but requires crt to pass an aligned stack. Bruce ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Sat Dec 24 11, Bruce Evans wrote: On Fri, 23 Dec 2011, Alexander Best wrote: is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer? i built GENERIC (including modules) with and without that flag. the results are: The same as it has always been. It avoids some bloat. 1654496 bytes with the flag set vs. 1654952 bytes with the flag unset I don't believe this. GENERIC is enormously bloated, so it has size more like 16MB than 1.6MB. Even a savings of 4K instead of 456 bytes is hard to believe. I get a savings of 9K (text) in a 5MB kernel. Changing the default target arch from i386 to pentium-undocumented has reduced the text space savings a little, since the default for passing args is now to preallocate stack space for them and store to this, instead of to push them; this preallocation results in more functions needing to allocate some stack space explicitly, and when some is allocated explicitly, the text space cost for this doesn't depend on the size of the allocation. Anyway, the savings are mostly from from avoiding cache misses from sparse allocation on stacks. Also, FreeBSD-i386 hasn't been programmed to support aligned stacks: - KSTACK_PAGES on i386 is 2, while on amd64 it is 4. Using more stack might push something over the edge - not much care is taken to align the initial stack or to keep the stack aligned in calls from asm code. E.g., any alignment for mi_startup() (and thus proc0?) is accidental. This may result in perfect alignment or perfect misalignment. Hopefully, more care is taken with thread startup. For gcc, the alignment is done bogusly in main() in userland, but there is no main() in the kernel. The alignment doesn't matter much (provided the perfect misalignment is still to a multiple of 4), but when it matters, the random misalignment that results from not trying to do it at all is better than perfect misalignment from getting it wrong. With 4-byte alignment, the only cases that it helps are with 64-bit variables. 
>> the gcc(1) man page states the following:
>>
>> This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
>>
>> the comment in sys/conf/kern.mk however sorta suggests that the default alignment of 4 bytes might improve performance.
>
> The default stack alignment is 16 bytes, which unimproves performance.

maybe the part of the comment in sys/conf/kern.mk, which mentions that a stack alignment of 16 bytes might improve micro benchmark results should be removed. this would prevent people (like me) from thinking, using a stack alignment of 4 bytes is a compromise between size and efficiency. it isn't! currently a stack alignment of 16 bytes has no advantages over one with 4 bytes on i386.

so specifying -mpreferred-stack-boundary=2 on i386 is absolutely mandatory.

please see the attached patch, which also introduces a line break in order to describe the stack alignment issue in a paragraph of its own.

cheers.
alex

> clang handles stack alignment correctly (only does it when it is needed) so it doesn't need a -mpreferred-stack-boundary option and doesn't always break without alignment in main(). Well, at least it used to, IIRC. Testing it now shows that it does the necessary andl of the stack pointer for __aligned(32), but for __aligned(16) it now assumes that the stack is aligned by the caller. So it now needs -mpreferred-stack-boundary=2, but doesn't have it. OTOH, clang doesn't do the andl in main() like gcc does (unless you put a dummy __aligned(32) there), but requires crt to pass an aligned stack.
>
> Bruce

Index: /usr/src/sys/conf/kern.mk
===
--- /usr/src/sys/conf/kern.mk (revision 228845)
+++ /usr/src/sys/conf/kern.mk (working copy)
@@ -30,12 +30,12 @@
 # On i386, do not align the stack to 16-byte boundaries. Otherwise GCC 2.95
 # and above adds code to the entry and exit point of every function to align the
 # stack to 16-byte boundaries -- thus wasting approximately 12 bytes of stack
-# per function call. While the 16-byte alignment may benefit micro benchmarks,
-# it is probably an overall loss as it makes the code bigger (less efficient
-# use of code cache tag lines) and uses more stack (less efficient use of data
-# cache tag lines). Explicitly prohibit the use of FPU, SSE and other SIMD
-# operations inside the kernel itself. These operations are exclusively
-# reserved for user applications.
+# per function call. This makes the code bigger (less efficient use of code
+# cache tag lines) and uses more stack (less efficient use of data cache tag
+# lines).
+# Explicitly prohibit the use of FPU, SSE and other SIMD operations inside the
+# kernel
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On 24.12.2011 00:56, Alexander Best wrote:

> hi there,
>
> is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer? i built GENERIC (including modules) with and without that flag. the results are:
>
> 1654496 bytes with the flag set
> vs.
> 1654952 bytes with the flag unset
>
> the gcc(1) man page states the following:
>
> This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.

What do the numbers above have to do with *stack* alignment or size (which is a run-time figure, and cannot be statically determined if any variable-depth recursion takes place)?

What are those 16... numbers, anyways? How did you obtain them?
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On 24.12.2011 at 12:06, Bruce Evans wrote:

> On Fri, 23 Dec 2011, Adrian Chadd wrote:
>
>> Well, the whole kernel is bloated at the moment, sorry.
>>
>> I've been trying to build the _bare minimum_ required to bootstrap -HEAD on these embedded boards and I can't get the kernel down below 5 megabytes - ie, one with FFS (with options disabled), MIPS, INET (no INET6), net80211, ath (which admittedly is big, but I need it no matter what, right?) comes in at:
>>
>> -r-xr-xr-x 1 root wheel 5307021 Nov 29 19:14 kernel.LSSR71
>>
>> And with INET6, on another board (and this includes MSDOS and the relevant geom modules):
>>
>> -r-xr-xr-x 1 root wheel 5916759 Nov 28 12:00 kernel.RSPRO
>>
>> .. honestly, that's what should be addressed. That's honestly a bit ridiculous.
>
> It's disgusting, but what problems does it cause apart from minor slowness from cache misses?

The flash chip on these devices only has 8MB; some of the really cheap ones only have 4MB (yes MB, not GB). And many have only 32MB RAM. It would be nice to have space for actual applications :-)

Stefan

--
Stefan Bethke s...@lassitu.de Fon +49 151 14070811
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Sat Dec 24 11, Bruce Evans wrote:

> On Sat, 24 Dec 2011, Alexander Best wrote:
>
>> On Sat Dec 24 11, Bruce Evans wrote:
>>> On Fri, 23 Dec 2011, Alexander Best wrote:
>>>
>>>> is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer? i built GENERIC (including modules) with and without that flag. the results are:
>>>
>>> The same as it has always been. It avoids some bloat.
>>>
>>>> 1654496 bytes with the flag set
>>>> vs.
>>>> 1654952 bytes with the flag unset
>>>
>>> I don't believe this. GENERIC is enormously bloated, so it has size more like 16MB than 1.6MB. Even a savings of 4K instead of 456 bytes
>>
>> i'm sorry. i used du(1) to get those numbers, so i believe those numbers represent the amount of 512-byte blocks. if i'm correct GENERIC is even more bloated than you feared and almost reaches 1GB:
>>
>> 807,859375 megabytes with flag set
>> vs.
>> 808,0820313 megabytes without the flag set
>
> That's certainly bloated. It counts all object files and modules, and probably everything is compiled with -g. I only counted kernel text size.

yeah, but for demonstrating the different size between the build with -mpreferred-stack-boundary=2 set and -mpreferred-stack-boundary=2 unset, it doesn't really matter how big the directories are and if object files are included. the difference in size is 1 megabyte. so setting -mpreferred-stack-boundary=2 doesn't aid in reducing the kernel (or modules) size, but merely to improve stack performance/efficiency.

cheers.
alex

> Bruce
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Sat Dec 24 11, Bruce Evans wrote: On Fri, 23 Dec 2011, Adrian Chadd wrote: Well, the whole kernel is bloated at the moment, sorry. I've been trying to build the _bare minimum_ required to bootstrap -HEAD on these embedded boards and I can't get the kernel down below 5 megabytes - ie, one with FFS (with options disabled), MIPS, INET (no INET6), net80211, ath (which admittedly is big, but I need it no matter what, right?) comes in at: -r-xr-xr-x 1 root wheel 5307021 Nov 29 19:14 kernel.LSSR71 And with INET6, on another board (and this includes MSDOS and the relevant geom modules): -r-xr-xr-x 1 root wheel 5916759 Nov 28 12:00 kernel.RSPRO .. honestly, that's what should be addressed. That's honestly a bit ridiculous. It's disgusting, but what problems does it cause apart from minor slowness from cache misses? I used to monitor the size of a minimal i386 kernel: % machine i386 % cpu I686_CPU % ident MIN % options SCHED_4BSD In FreeBSD-5-CURRENT between 5.1R and 5.2R, this had size: text data bss dec hex filename 931241 86524 62356 1080121 107b39 /sysc/i386/compile/min/kernel A minimal kernel is not useful, but maybe you can add some i/o to it without bloating it too much. This almost builds in -current too. I had to add the following: - NO_MODULES to de-bloat the compile time - MK_CTF=no to build -current on FreeBSD.9. The kernel .mk files are still broken (depend on nonstandard/new features in sys.mk). strange. the build(7) man page claims that: WITH_CTF If defined, the build process will run the DTrace CTF conversion tools on built objects. Please note that this WITH_ option is handled differently than all other WITH_ options (there is no WITHOUT_CTF, or correspond- ing MK_CTF in the build system). ... so setting MK_CTF to anything shouldn't have (according to the man page). cheers. alex - comment out a line in if.c that refers to Vloif. if.c is standard but the loop device is optional. 
A few more changes to remove non-minimalities that are not defaults made little difference: % machine i386 % cpu I686_CPU % ident MIN % options SCHED_4BSD % % # XXX kill default misconfigurations. % makeoptions NO_MODULES=yes % makeoptions COPTFLAGS=-O -pipe % % # XXX from here on is to try to kill everything in DEFAULTS. % % # nodevice isa # needed for DELAY... % # nooptions ISAPNP # needed ... % % nodevicenpx % % nodevicemem % nodeviceio % % nodeviceuart_ns8250 % % nooptions GEOM_PART_BSD % nooptions GEOM_PART_EBR % nooptions GEOM_PART_EBR_COMPAT % nooptions GEOM_PART_MBR % % # nooptions NATIVE # needed ... % # nodevice atpic # needed ... % % nooptions NEW_PCIB % % nooptions VFS_ALLOW_NONMPSAFE text data bss dec hex filename 1663902110632 136892 1911426 1d2a82 kernel (This was about 100K larger with -O2 and all DEFAULTS). The bloat since FreeBSD-5 is only 70%. Here are some sizes for my standard kernel (on i386). The newer versions have about the same number of features since they don't support so many old isa devices or so many NICs: text data bss dec hex filename 1483269106972 172524 1762765 1ae5cd FreeBSD-3/kernel 1917408157472 194228 2269108 229fb4 FreeBSD-4/kernel 2604498198948 237720 3041166 2e678e FreeBSD-5.1.5/kernel 2833842206856 242936 3283634 321ab2 FreeBSD-5.1.5/kernel-with-acpi 2887573192456 288696 3368725 336715 FreeBSD-5.1.5/kernel with my changes, -O2 and usb added relative to the above 2582782195756 298936 3077474 2ef562 previous, with some excessive inlining avoided, and without -O2, and with ipfilter 1998276159436 137748 2295460 2306a4 kernel.4 a more up to date and less hacked on FreeBSD-4 4365549262656 209588 4837793 49d1a1 kernel.7 4406155266496 496532 5169183 4ee01f kernel.7.invariants 3953248242464 207252 4402964 432f14 kernel.7.noacpi 4418063268288 240084 4926435 4b2be3 kernel.7.smp various fairly stock FreeBSD-7R kernels 3669544262848 249712 4182104 3fd058 kernel.c 4174317258240 540144 4972701 4be09d kernel.c.invariants 3964455250656 
249808 4464919 442117 kernel.c.noacpi 3213928
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Sat Dec 24 11, Bruce Evans wrote: On Sat, 24 Dec 2011, Alexander Best wrote: On Sat Dec 24 11, Bruce Evans wrote: On Fri, 23 Dec 2011, Alexander Best wrote: ... the gcc(1) man page states the following: This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2. the comment in sys/conf/kern.mk however sorta suggests that the default alignment of 4 bytes might improve performance. The default stack alignment is 16 bytes, which unimproves performance. maybe the part of the comment in sys/conf/kern.mk, which mentions that a stack alignment of 16 bytes might improve micro benchmark results should be removed. this would prevent people (like me) from thinking, using a stack alignment of 4 bytes is a compromise between size and efficiently. it isn't! currently a stack alignment of 16 bytes has no advantages towards one with 4 bytes on i386. I think the comment is clear enough. It it mentions all the tradeoffs. It is only slightly cryptic in saying that these are tradeoffs and that the configuration is our best guess at the best tradeoff -- it just says while for both. It goes without saying that we don't use our worst guess. Anyone wanting to change this should run benchmarks and beware that micro-benchmarks are especially useless. The changed comment is not so good since it no longer mentions micro-bencharmarks or says while. if micro benchmark results aren't of any use, why should the claim that the default stack alignment of 16 bytes might produce better outcome stay? it doesn't seem as if anybody has micro benchmarked 16 bytes vs. 4 bytes stack alignment, until now. so the micro benchmark statement in the comment seems to be pure speculation. 
even worse...it indicates that by removing the -mpreferred-stack-boundary=2 flag, one can gain a performance boost by sacrificing a few more bytes of kernel (and module) size.

this suggests that the behavior -mpreferred-stack-boundary=2 vs. not specifying it, loosely equals the semantics of -Os vs. -O2.

i don't see how a 4 byte stack alignment for the kernel has any tradeoffs against the default 16 byte alignment. so if there are no tradeoffs, the comment shouldn't imply that there are.

cheers.
alex

>> so specifying -mpreferred-stack-boundary=2 on i386 is absolutely mandatory.
>
> Not mandatory; just an optimization.
>
>> please see the attached patch, which also introduces a line break in order to describe the stack alignment issue in a paragraph of its own.
>
> There should also be an empty line for a paragraph break.
>
> % +# Explicitly prohibit the use of FPU, SSE and other SIMD operations inside the
> % +# kernel itself. These operations are exclusively reserved for user
> % +# applications.
>
> This part was actually wronger:
> - these operations are not really reserved, but were just not supported in the kernel
> - they have been supported in the kernel for some time, although anything wanting to use the compiler to generate them would have to do something to kill the options added here. Kernel code using them must inform the kernel that it is doing so, using fpu_kern*(9undoc), and this is only valid in some contexts (more or less for kernel-only threads) so we still prevent compilers from using them routinely. The makefile is not the right place to describe any of this.
>
> Bruce
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Fri, 23 Dec 2011, Alexander Best wrote: is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer? i built GENERIC (including modules) with and without that flag. the results are: The same as it has always been. It avoids some bloat. 1654496 bytes with the flag set vs. 1654952 bytes with the flag unset I don't believe this. GENERIC is enormously bloated, so it has size more like 16MB than 1.6MB. Even a savings of 4K instead of 456 bytes is hard to believe. I get a savings of 9K (text) in a 5MB kernel. Changing the default target arch from i386 to pentium-undocumented has reduced the text space savings a little, since the default for passing args is now to preallocate stack space for them and store to this, instead of to push them; this preallocation results in more functions needing to allocate some stack space explicitly, and when some is allocated explicitly, the text space cost for this doesn't depend on the size of the allocation. Anyway, the savings are mostly from from avoiding cache misses from sparse allocation on stacks. Also, FreeBSD-i386 hasn't been programmed to support aligned stacks: - KSTACK_PAGES on i386 is 2, while on amd64 it is 4. Using more stack might push something over the edge - not much care is taken to align the initial stack or to keep the stack aligned in calls from asm code. E.g., any alignment for mi_startup() (and thus proc0?) is accidental. This may result in perfect alignment or perfect misalignment. Hopefully, more care is taken with thread startup. For gcc, the alignment is done bogusly in main() in userland, but there is no main() in the kernel. The alignment doesn't matter much (provided the perfect misalignment is still to a multiple of 4), but when it matters, the random misalignment that results from not trying to do it at all is better than perfect misalignment from getting it wrong. With 4-byte alignment, the only cases that it helps are with 64-bit variables. 
> the gcc(1) man page states the following:
>
> This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
>
> the comment in sys/conf/kern.mk however sorta suggests that the default alignment of 4 bytes might improve performance.

The default stack alignment is 16 bytes, which unimproves performance.

clang handles stack alignment correctly (only does it when it is needed) so it doesn't need a -mpreferred-stack-boundary option and doesn't always break without alignment in main(). Well, at least it used to, IIRC. Testing it now shows that it does the necessary andl of the stack pointer for __aligned(32), but for __aligned(16) it now assumes that the stack is aligned by the caller. So it now needs -mpreferred-stack-boundary=2, but doesn't have it. OTOH, clang doesn't do the andl in main() like gcc does (unless you put a dummy __aligned(32) there), but requires crt to pass an aligned stack.

Bruce
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Fri, 23 Dec 2011, Adrian Chadd wrote:

> Well, the whole kernel is bloated at the moment, sorry.
>
> I've been trying to build the _bare minimum_ required to bootstrap -HEAD on these embedded boards and I can't get the kernel down below 5 megabytes - ie, one with FFS (with options disabled), MIPS, INET (no INET6), net80211, ath (which admittedly is big, but I need it no matter what, right?) comes in at:
>
> -r-xr-xr-x 1 root wheel 5307021 Nov 29 19:14 kernel.LSSR71
>
> And with INET6, on another board (and this includes MSDOS and the relevant geom modules):
>
> -r-xr-xr-x 1 root wheel 5916759 Nov 28 12:00 kernel.RSPRO
>
> .. honestly, that's what should be addressed. That's honestly a bit ridiculous.

It's disgusting, but what problems does it cause apart from minor slowness from cache misses?

I used to monitor the size of a minimal i386 kernel:

% machine i386
% cpu I686_CPU
% ident MIN
% options SCHED_4BSD

In FreeBSD-5-CURRENT between 5.1R and 5.2R, this had size:

   text   data     bss     dec    hex filename
 931241  86524   62356 1080121 107b39 /sysc/i386/compile/min/kernel

A minimal kernel is not useful, but maybe you can add some i/o to it without bloating it too much.

This almost builds in -current too. I had to add the following:
- NO_MODULES to de-bloat the compile time
- MK_CTF=no to build -current on FreeBSD-9. The kernel .mk files are still broken (depend on nonstandard/new features in sys.mk).
- comment out a line in if.c that refers to Vloif. if.c is standard but the loop device is optional.

A few more changes to remove non-minimalities that are not defaults made little difference:

% machine i386
% cpu I686_CPU
% ident MIN
% options SCHED_4BSD
%
% # XXX kill default misconfigurations.
% makeoptions NO_MODULES=yes
% makeoptions COPTFLAGS=-O -pipe
%
% # XXX from here on is to try to kill everything in DEFAULTS.
%
% # nodevice isa # needed for DELAY...
% # nooptions ISAPNP # needed ...
%
% nodevice npx
%
% nodevice mem
% nodevice io
%
% nodevice uart_ns8250
%
% nooptions GEOM_PART_BSD
% nooptions GEOM_PART_EBR
% nooptions GEOM_PART_EBR_COMPAT
% nooptions GEOM_PART_MBR
%
% # nooptions NATIVE # needed ...
% # nodevice atpic # needed ...
%
% nooptions NEW_PCIB
%
% nooptions VFS_ALLOW_NONMPSAFE

   text   data     bss     dec    hex filename
1663902 110632  136892 1911426 1d2a82 kernel

(This was about 100K larger with -O2 and all DEFAULTS). The bloat since FreeBSD-5 is only 70%.

Here are some sizes for my standard kernel (on i386). The newer versions have about the same number of features since they don't support so many old isa devices or so many NICs:

   text   data     bss     dec    hex filename
1483269 106972  172524 1762765 1ae5cd FreeBSD-3/kernel
1917408 157472  194228 2269108 229fb4 FreeBSD-4/kernel
2604498 198948  237720 3041166 2e678e FreeBSD-5.1.5/kernel
2833842 206856  242936 3283634 321ab2 FreeBSD-5.1.5/kernel-with-acpi
2887573 192456  288696 3368725 336715 FreeBSD-5.1.5/kernel
    with my changes, -O2 and usb added relative to the above
2582782 195756  298936 3077474 2ef562 previous, with some excessive inlining avoided, and without -O2, and with ipfilter
1998276 159436  137748 2295460 2306a4 kernel.4
    a more up to date and less hacked on FreeBSD-4
4365549 262656  209588 4837793 49d1a1 kernel.7
4406155 266496  496532 5169183 4ee01f kernel.7.invariants
3953248 242464  207252 4402964 432f14 kernel.7.noacpi
4418063 268288  240084 4926435 4b2be3 kernel.7.smp
    various fairly stock FreeBSD-7R kernels
3669544 262848  249712 4182104 3fd058 kernel.c
4174317 258240  540144 4972701 4be09d kernel.c.invariants
3964455 250656  249808 4464919 442117 kernel.c.noacpi
3213928 240160  240596 3694684 38605c kernel.c.noacpi-ule
4285040 268288  286160 4839488 49d840 kernel.c.smp
    current before FreeBSD-8R, not all built at the same time or with the same options. The 20% bloat between kernel.c.noacpi-ule and kernel.c.noacpi is mainly from not killing the default of -O2.
4742714 315008  401692 5459414 534dd6 kernel.8
4816900 319200 1813916 6950016 6a0c80 kernel.8.invariants
4490209 304832  395260 5190301 4f329d kernel.8.noacpi
4795475 323680  475420 5594575 555dcf kernel.8.smp
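As a rough cross-check of the growth figures quoted above, the following sketch recomputes them from the "dec" column of the size(1) output. The helper name `growth_percent` and the constant names are made up for illustration, and only totals that appear in the tables are used. The percentages in the message are round estimates, so the computed values need not match them exactly.

```python
def growth_percent(old_size: int, new_size: int) -> float:
    """Return how much larger new_size is than old_size, in percent."""
    return (new_size - old_size) / old_size * 100.0


# "dec" totals (text + data + bss) copied from the size(1) output above.
MIN_KERNEL_5X = 1080121        # minimal kernel, FreeBSD-5-CURRENT
MIN_KERNEL_CURRENT = 1911426   # minimal kernel, -current
KERNEL_C_NOACPI_ULE = 3694684  # kernel.c.noacpi-ule
KERNEL_C_NOACPI = 4464919      # kernel.c.noacpi

if __name__ == "__main__":
    minimal = growth_percent(MIN_KERNEL_5X, MIN_KERNEL_CURRENT)
    noacpi = growth_percent(KERNEL_C_NOACPI_ULE, KERNEL_C_NOACPI)
    print(f"minimal kernel growth: {minimal:.1f}%")
    print(f"noacpi vs. noacpi-ule: {noacpi:.1f}%")
```

The same helper can be pointed at the "text" column instead of "dec" if only code size is of interest.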
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Sat, 24 Dec 2011, Alexander Best wrote:

> On Sat Dec 24 11, Bruce Evans wrote:
>> On Fri, 23 Dec 2011, Alexander Best wrote:
>>> is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer? i built GENERIC (including modules) with and without that flag. the results are:
>>
>> The same as it has always been. It avoids some bloat.
>>
>>> 1654496 bytes with the flag set vs. 1654952 bytes with the flag unset
>>
>> I don't believe this. GENERIC is enormously bloated, so it has size more like 16MB than 1.6MB. Even a savings of 4K instead of 456 bytes
>
> i'm sorry. i used du(1) to get those numbers, so i believe those numbers represent the amount of 512-byte blocks. if i'm correct GENERIC is even more bloated than you feared and almost reaches 1GB:
>
> 807,859375 megabytes with flag set vs. 808,0820313 megabytes without the flag set

That's certainly bloated. It counts all object files and modules, and probably everything is compiled with -g. I only counted kernel text size.

Bruce
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
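The unit mix-up in this exchange can be checked with a short calculation. This sketch assumes, as the message itself does, that du(1) reported 512-byte blocks. The function and variable names are illustrative only.

```python
BLOCK_SIZE = 512  # bytes per du(1) block, as assumed in the message above


def blocks_to_mib(blocks: int) -> float:
    """Convert a count of 512-byte blocks to mebibytes (2**20 bytes)."""
    return blocks * BLOCK_SIZE / 2**20


with_flag = 1654496     # blocks, built with -mpreferred-stack-boundary=2
without_flag = 1654952  # blocks, built without the flag

if __name__ == "__main__":
    print(blocks_to_mib(with_flag))
    print(blocks_to_mib(without_flag))
    # Difference between the two builds, in bytes.
    print((without_flag - with_flag) * BLOCK_SIZE)
```

Under that assumption the two builds differ by 456 blocks, which is 233472 bytes spread over all object files and modules rather than 456 bytes of kernel text.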
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
On Sat, 24 Dec 2011, Alexander Best wrote:

> On Sat Dec 24 11, Bruce Evans wrote:
>> This almost builds in -current too. I had to add the following:
>> - NO_MODULES to de-bloat the compile time
>> - MK_CTF=no to build -current on FreeBSD-9. The kernel .mk files are still broken (depend on nonstandard/new features in sys.mk).
>
> strange. the build(7) man page claims that:
>
> WITH_CTF If defined, the build process will run the DTrace CTF conversion tools on built objects. Please note that this WITH_ option is handled differently than all other WITH_ options (there is no WITHOUT_CTF, or corresponding MK_CTF in the build system).
>
> ... so setting MK_CTF to anything shouldn't have any effect (according to the man page).

MK_CTF is an implementation detail. It is normally set in bsd.own.mk (not in sys.mk like I said -- this gives another, much larger bug (*)). But when /usr/share/mk is old, it doesn't know anything about MK_CTF. (For example, in FreeBSD-9, sys.mk sets NO_CTF to 1 if WITH_CTF is not defined. This corresponds to bsd.own.mk in -current setting MK_CTF to no if WITH_CTF is not defined. Go back to an older version of FreeBSD and /usr/share/mk/* won't know anything about any CTF variable.) So when you try to build a current kernel under an old version of FreeBSD, MK_CTF is used uninitialized and the build fails. (Of course, you build kernels normally and don't use the bloated buildkernel method.)

The bug is in the following files:

kern.post.mk:.if ${MK_CTF} != no
kern.pre.mk:.if ${MK_CTF} != no
kmod.mk:.if defined(MK_CTF) && ${MK_CTF} != no

except for the last one where it has been fixed.

(*) Well, not completely broken, but just annoyingly unportable. Consider the following makefile:

%%%
foo: foo.c
%%%

Invoking this under FreeBSD-9 gives:

%%%
cc -O2 -pipe foo.c -o foo
[ -z ctfconvert -o -n 1 ] || (echo ctfconvert -L VERSION foo && ctfconvert -L VERSION foo)
%%%

This is the old ctf method. It is ugly but is fairly portable.

Invoking this under FreeBSD-9 but with -mpath-to-current-mk-directory gives

%%%
cc -O2 -pipe foo.c -o foo
${CTFCONVERT_CMD} expands to empty string
%%%

This is because:
- the rule in sys.mk says ${CTFCONVERT_CMD}
- CTFCONVERT_CMD is normally defined in bsd.own.mk. But bsd.own.mk is only included by BSD makefiles. It is never included by portable makefiles. So ${CTFCONVERT_CMD} is used uninitialized.
- for some reason, using variables uninitialized is not fatal in this context, although it is for the comparisons of ${MK_CTF} above.
- ${CTFCONVERT_CMD} is replaced by the empty string. Old versions of make warn about the use of an empty string as a shell command.
- the code that is supposed to prevent the previous warning is in bsd.own.mk, where it is not reached for portable makefiles. It is:

% .if ${MK_CTF} != no
% CTFCONVERT_CMD= ${CTFCONVERT} ${CTFFLAGS} ${.TARGET}

This uses the full ctfconvert if WITH_CTF.

% .elif ${MAKE_VERSION} >= 520300
% CTFCONVERT_CMD=

make(1) has been modified to not complain about the empty string. The version test detects which versions of make don't complain.

% .else
% CTFCONVERT_CMD= @:

The default is to generate this non-empty string and an extra shell command to execute it, for old versions of make.

% .endif

But none of this works for portable makefiles, since it is not reached.

Bruce
[rfc] removing -mpreferred-stack-boundary=2 flag for i386?
hi there,

is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer? i built GENERIC (including modules) with and without that flag. the results are:

1654496 bytes with the flag set
vs.
1654952 bytes with the flag unset

the gcc(1) man page states the following:

This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.

the comment in sys/conf/kern.mk however sorta suggests that the default alignment of 4 bytes might improve performance.

cheers.
alex
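For reference, the argument to -mpreferred-stack-boundary is an exponent: the stack is kept aligned to 2 raised to that number of bytes, so 2 means 4 bytes and gcc's default of 4 means 16 bytes. The sketch below illustrates why a larger boundary can cost stack space, under the simplifying assumption that each frame is just rounded up to the boundary. Real frame layout depends on the compiler and the function, and the helper names are hypothetical.

```python
def boundary_bytes(n: int) -> int:
    """Alignment in bytes for -mpreferred-stack-boundary=n (2 raised to n)."""
    return 2 ** n


def padded_size(frame_bytes: int, n: int) -> int:
    """Round frame_bytes up to the next multiple of the alignment.

    Simplified model only: it ignores everything else a compiler
    does when laying out a stack frame.
    """
    align = boundary_bytes(n)
    return (frame_bytes + align - 1) // align * align


if __name__ == "__main__":
    # Columns: raw frame size, size at boundary=2, size at boundary=4.
    for frame in (4, 20, 36):
        print(frame, padded_size(frame, 2), padded_size(frame, 4))
```

In this model a 20-byte frame stays at 20 bytes with a 4-byte boundary but grows to 32 bytes with a 16-byte boundary.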
Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
Well, the whole kernel is bloated at the moment, sorry.

I've been trying to build the _bare minimum_ required to bootstrap -HEAD on these embedded boards and I can't get the kernel down below 5 megabytes - ie, one with FFS (with options disabled), MIPS, INET (no INET6), net80211, ath (which admittedly is big, but I need it no matter what, right?) comes in at:

-r-xr-xr-x 1 root wheel 5307021 Nov 29 19:14 kernel.LSSR71

And with INET6, on another board (and this includes MSDOS and the relevant geom modules):

-r-xr-xr-x 1 root wheel 5916759 Nov 28 12:00 kernel.RSPRO

.. honestly, that's what should be addressed. That's honestly a bit ridiculous.

2c,

Adrian