Re: New option -flimit-function-alignment
Hi Jeff, >> gcc/ >> * common.opt (flimit-function-alignment): New. >> * doc/invoke.texi (-flimit-function-alignment): Document. >> * emit-rtl.h (struct rtl_data): Add max_insn_address field. >> * final.c (shorten_branches): Set it. >> * varasm.c (assemble_start_function): Limit alignment if >> requested. >> >> gcc/testsuite/ >> * gcc.target/i386/align-limit.c: New test. > OK. Sorry for the long delay. unfortunately, this broke Solaris 12/SPARC bootstrap with /bin/as: /vol/gcc/src/hg/trunk/local/gcc/varasm.c: In function 'void assemble_start_function(tree, const char*)': /vol/gcc/src/hg/trunk/local/gcc/varasm.c:1794:11: error: unused variable 'align_log' [-Werror=unused-variable] int align_log = align_functions_log; ^ The following fixes it and allowed bootstrap to complete successfully. I'm going to commit it as obvious. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2016-11-24 Rainer Orth* varasm.c (assemble_start_function): Wrap align_log definition in ASM_OUTPUT_MAX_SKIP_ALIGN. diff --git a/gcc/varasm.c b/gcc/varasm.c --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -1791,7 +1791,9 @@ assemble_start_function (tree decl, cons && align_functions_log > align && optimize_function_for_speed_p (cfun)) { +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN int align_log = align_functions_log; +#endif int max_skip = align_functions - 1; if (flag_limit_function_alignment && crtl->max_insn_address > 0 && max_skip >= crtl->max_insn_address)
Re: New option -flimit-function-alignment
On 10/14/2016 12:28 PM, Bernd Schmidt wrote: On 10/12/2016 09:27 PM, Denys Vlasenko wrote: Yes, something like "if max_skip >= func_size, temporarily lower max_skip to func_size-1" (because otherwise we can create padding bigger-or-equal to the entire function in size, which is stupid - it's better to just put the function in that space). This would be a nice. That would be this patch. Bootstrapped and tested on x86_64-linux, ok? Bernd limit-align-v2b.diff gcc/ * common.opt (flimit-function-alignment): New. * doc/invoke.texi (-flimit-function-alignment): Document. * emit-rtl.h (struct rtl_data): Add max_insn_address field. * final.c (shorten_branches): Set it. * varasm.c (assemble_start_function): Limit alignment if requested. gcc/testsuite/ * gcc.target/i386/align-limit.c: New test. OK. Sorry for the long delay. jeff
Re: New option -flimit-function-alignment
On 10/14/2016 08:28 PM, Bernd Schmidt wrote: On 10/12/2016 09:27 PM, Denys Vlasenko wrote: Yes, something like "if max_skip >= func_size, temporarily lower max_skip to func_size-1" (because otherwise we can create padding bigger-or-equal to the entire function in size, which is stupid - it's better to just put the function in that space). This would be a nice. That would be this patch. Bootstrapped and tested on x86_64-linux, ok? Ping. https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01187.html Bernd
Re: New option -flimit-function-alignment
On 10/12/2016 09:27 PM, Denys Vlasenko wrote: Yes, something like "if max_skip >= func_size, temporarily lower max_skip to func_size-1" (because otherwise we can create padding bigger-or-equal to the entire function in size, which is stupid - it's better to just put the function in that space). This would be a nice. That would be this patch. Bootstrapped and tested on x86_64-linux, ok? Bernd gcc/ * common.opt (flimit-function-alignment): New. * doc/invoke.texi (-flimit-function-alignment): Document. * emit-rtl.h (struct rtl_data): Add max_insn_address field. * final.c (shorten_branches): Set it. * varasm.c (assemble_start_function): Limit alignment if requested. gcc/testsuite/ * gcc.target/i386/align-limit.c: New test. Index: gcc/common.opt === --- gcc/common.opt (revision 240861) +++ gcc/common.opt (working copy) @@ -906,6 +906,9 @@ Align the start of functions. falign-functions= Common RejectNegative Joined UInteger Var(align_functions) +flimit-function-alignment +Common Report Var(flag_limit_function_alignment) Optimization Init(0) + falign-jumps Common Report Var(align_jumps,0) Optimization UInteger Align labels which are only reached by jumping. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 240861) +++ gcc/doc/invoke.texi (working copy) @@ -368,7 +368,7 @@ Objective-C and Objective-C++ Dialects}. -fno-ira-share-spill-slots @gol -fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute @gol -fivopts -fkeep-inline-functions -fkeep-static-functions @gol --fkeep-static-consts -flive-range-shrinkage @gol +-fkeep-static-consts -flimit-function-alignment -flive-range-shrinkage @gol -floop-block -floop-interchange -floop-strip-mine @gol -floop-unroll-and-jam -floop-nest-optimize @gol -floop-parallelize-all -flra-remat -flto -flto-compression-level @gol @@ -8262,6 +8262,12 @@ If @var{n} is not specified or is zero, Enabled at levels @option{-O2}, @option{-O3}. +@item -flimit-function-alignment +If this option is enabled, the compiler tries to avoid unnecessarily +overaligning functions. It attempts to instruct the assembler to align +by the amount specified by @option{-falign-functions}, but not to +skip more bytes than the size of the function. + @item -falign-labels @itemx -falign-labels=@var{n} @opindex falign-labels Index: gcc/emit-rtl.h === --- gcc/emit-rtl.h (revision 240861) +++ gcc/emit-rtl.h (working copy) @@ -284,6 +284,9 @@ struct GTY(()) rtl_data { to eliminable regs (like the frame pointer) are set if an asm sets them. */ HARD_REG_SET asm_clobbers; + + /* The highest address seen during shorten_branches. */ + int max_insn_address; }; #define return_label (crtl->x_return_label) Index: gcc/final.c === --- gcc/final.c (revision 240861) +++ gcc/final.c (working copy) @@ -1462,7 +1462,7 @@ shorten_branches (rtx_insn *first) if (!increasing) break; } - + crtl->max_insn_address = insn_current_address; free (varying_length); } Index: gcc/testsuite/gcc.target/i386/align-limit.c === --- gcc/testsuite/gcc.target/i386/align-limit.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/align-limit.c (working copy) @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -falign-functions=64 -flimit-function-alignment" } */ +/* { dg-final { scan-assembler ".p2align 6,,1" } } */ +/* { dg-final { scan-assembler-not ".p2align 6,,63" } } */ + +void +test_func (void) +{ +} Index: gcc/varasm.c === --- gcc/varasm.c (revision 240861) +++ gcc/varasm.c (working copy) @@ -1791,9 +1791,14 @@ assemble_start_function (tree decl, cons && align_functions_log > align && optimize_function_for_speed_p (cfun)) { + int align_log = align_functions_log; + int max_skip = align_functions - 1; + if (flag_limit_function_alignment && crtl->max_insn_address > 0 + && max_skip >= crtl->max_insn_address) + max_skip = crtl->max_insn_address - 1; + #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN - ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, - align_functions_log, align_functions - 1); + ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip); #else ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log); #endif
Re: New option -flimit-function-alignment
On 10/11/2016 10:14 PM, Bernd Schmidt wrote: On 10/11/2016 04:23 PM, Denys Vlasenko wrote: This is better than current behavior, but this is not what I want. 15-byte function does not need to be aligned to 16 bytes on a machine with 128-byte L1I cachelines. It needs to be aligned to 128 bytes if there are less than 15 bytes remaining; if there are MORE than 15 bytes remaining, why align at all? It's a waste of space. So that sounds like you'd want to modify the max_skip value based on the size of the function - which would be a fairly minor change. Yes, something like "if max_skip >= func_size, temporarily lower max_skip to func_size-1" (because otherwise we can create padding bigger-or-equal to the entire function in size, which is stupid - it's better to just put the function in that space). This would be a nice.
Re: New option -flimit-function-alignment
On 10/11/2016 04:23 PM, Denys Vlasenko wrote: This is better than current behavior, but this is not what I want. 15-byte function does not need to be aligned to 16 bytes on a machine with 128-byte L1I cachelines. It needs to be aligned to 128 bytes if there are less than 15 bytes remaining; if there are MORE than 15 bytes remaining, why align at all? It's a waste of space. So that sounds like you'd want to modify the max_skip value based on the size of the function - which would be a fairly minor change. We could call that -falign-functions-max-skip-size or something like that. Bernd
Re: New option -flimit-function-alignment
On 10/11/2016 04:11 PM, Bernd Schmidt wrote: Denys has submitted some patches to add more capabilities to the -falign-* options, but these still have some issues, and the original ideas seems to have been to allow for large alignments without over-aligning small functions. The following patch implements that idea by taking into account the function size as computed during shorten_branches: let's say we use "-falign-functions=128 -flimit-function-alignment", then a 15-byte function will be 16-byte aligned rather than 128-byte aligned. Bootstrapped and tested on x86_64-linux. Denys, does this solve your problem? This is better than current behavior, but this is not what I want. 15-byte function does not need to be aligned to 16 bytes on a machine with 128-byte L1I cachelines. It needs to be aligned to 128 bytes if there are less than 15 bytes remaining; if there are MORE than 15 bytes remaining, why align at all? It's a waste of space.
New option -flimit-function-alignment
Denys has submitted some patches to add more capabilities to the -falign-* options, but these still have some issues, and the original ideas seems to have been to allow for large alignments without over-aligning small functions. The following patch implements that idea by taking into account the function size as computed during shorten_branches: let's say we use "-falign-functions=128 -flimit-function-alignment", then a 15-byte function will be 16-byte aligned rather than 128-byte aligned. Bootstrapped and tested on x86_64-linux. Denys, does this solve your problem? Anyone else, ok to commit? Bernd gcc/ * common.opt (flimit-function-alignment): New. * doc/invoke.texi (-flimit-function-alignment): Document. * emit-rtl.h (struct rtl_data): Add max_insn_address field. * final.c (shorten_branches): Set it. * varasm.c (assemble_start_function): Limit alignment to it if requested. gcc/testsuite/ * gcc.target/i386/align-limit.c: New test. Index: gcc/common.opt === --- gcc/common.opt (revision 240861) +++ gcc/common.opt (working copy) @@ -906,6 +906,9 @@ Align the start of functions. falign-functions= Common RejectNegative Joined UInteger Var(align_functions) +flimit-function-alignment +Common Report Var(flag_limit_function_alignment) Optimization Init(0) + falign-jumps Common Report Var(align_jumps,0) Optimization UInteger Align labels which are only reached by jumping. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 240861) +++ gcc/doc/invoke.texi (working copy) @@ -368,7 +368,7 @@ Objective-C and Objective-C++ Dialects}. -fno-ira-share-spill-slots @gol -fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute @gol -fivopts -fkeep-inline-functions -fkeep-static-functions @gol --fkeep-static-consts -flive-range-shrinkage @gol +-fkeep-static-consts -flimit-function-alignment -flive-range-shrinkage @gol -floop-block -floop-interchange -floop-strip-mine @gol -floop-unroll-and-jam -floop-nest-optimize @gol -floop-parallelize-all -flra-remat -flto -flto-compression-level @gol @@ -8262,6 +8262,11 @@ If @var{n} is not specified or is zero, Enabled at levels @option{-O2}, @option{-O3}. +@item -flimit-function-alignment +When function alignment is requested by @option{-falign-functions}, use +a size estimate to prevent functions from being over-aligned. This +limits the alignment to be no larger than the function itself. + @item -falign-labels @itemx -falign-labels=@var{n} @opindex falign-labels Index: gcc/emit-rtl.h === --- gcc/emit-rtl.h (revision 240861) +++ gcc/emit-rtl.h (working copy) @@ -284,6 +284,9 @@ struct GTY(()) rtl_data { to eliminable regs (like the frame pointer) are set if an asm sets them. */ HARD_REG_SET asm_clobbers; + + /* The highest address seen during shorten_branches. */ + unsigned HOST_WIDE_INT max_insn_address; }; #define return_label (crtl->x_return_label) Index: gcc/final.c === --- gcc/final.c (revision 240861) +++ gcc/final.c (working copy) @@ -1462,7 +1462,7 @@ shorten_branches (rtx_insn *first) if (!increasing) break; } - + crtl->max_insn_address = insn_current_address; free (varying_length); } Index: gcc/varasm.c === --- gcc/varasm.c (revision 240861) +++ gcc/varasm.c (working copy) @@ -1791,9 +1791,19 @@ assemble_start_function (tree decl, cons && align_functions_log > align && optimize_function_for_speed_p (cfun)) { + int align_log = align_functions_log; + int max_skip = align_functions - 1; + if (flag_limit_function_alignment && crtl->max_insn_address > 0) + { + int size_log = ceil_log2 (crtl->max_insn_address); + if (size_log < align_log) + align_log = size_log; + int sz = 1 << size_log; + if (sz < max_skip) + max_skip = sz - 1; + } #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN - ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, - align_functions_log, align_functions - 1); + ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip); #else ASM_OUTPUT_ALIGN (asm_out_file, align_functions_log); #endif Index: gcc/testsuite/gcc.target/i386/align-limit.c === --- gcc/testsuite/gcc.target/i386/align-limit.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/align-limit.c (working copy) @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -falign-functions=64 -flimit-function-alignment" } */ +/* { dg-final { scan-assembler ".p2align 1,,1" } } */ +/* { dg-final { scan-assembler-not ".p2align 6,,63" } } */ + +void +test_func (void) +{ +}