[PATCH] Account for prologue spills in reg_pressure scheduling
Hi, This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers. At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers. The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue. However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself. In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers. The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes. The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default). A "thanks" goes to Richard good discussion of the problem and suggestions on the approach to fix it. The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu. OK to apply? -- Maxim Kuvyrkov www.linaro.org 0001-sched_class_reg_num.ChangeLog Description: Binary data 0001-sched_class_reg_num.patch Description: Binary data
Re: ping x 7: [PATCH] [libgomp] make it possible to use OMP on both sides of a fork
Hi Jakub, Thanks for your feedback! See below. On Thu, Oct 16, 2014 at 4:52 PM, Jakub Jelinek wrote: > On Mon, Oct 13, 2014 at 10:16:19PM +0100, Nathaniel Smith wrote: >> Got total silence the last 4 times I posted this, and users have been >> bugging me about it offline, so trying again. >> >> This patch fixes a showstopper problem preventing the transparent use >> of OpenMP in scientific libraries, esp. with Python. Specifically, it >> is currently not possible to use GNU OpenMP -- even in a limited, >> temporary manner -- in any program that uses (or might use) fork() for >> parallelism, even if the fork() and the use of OpenMP occur at totally >> different times. This limitation is unique to GNU OpenMP -- every >> competing OpenMP implementation already contains something like this >> patch. While technically not fully POSIX-compliant (because POSIX >> gives much much weaker guarantees around fork() than any real Unix), >> the approach used in this patch (a) performs only POSIX-compliant >> operations when the host program is itself fully POSIX-compliant, and >> (b) actually works perfectly reliably in practice on all commonly used >> platforms I'm aware of. > > 1) gomp_we_are_forked in your patch will attempt to free the pool >of the thread that encounters it, which is racy; consider a program >after fork calling pthread_create several times, each thread >thusly created then ~ at the same time doing #pragma omp parallel >and the initial thread too. You really should clean up the pool >data structure only in the initial thread and nowhere else; >for native TLS (non-emulated, IE model) the best would be to have a flag >in the gomp_thread_pool structure, >struct gomp_thread *thr = gomp_thread (); >if (thr && thr->thread_pool) > thr->thread_pool->after_fork = true; >should in that case be safe in the atfork child handler. >For !HAVE_TLS or emulated TLS not sure if it is completely safe, >it would call pthread_getspecific. Perhaps just don't register >atfork handler on those targets at all? Good point. The updated patch below takes a slightly different approach. I moved we_are_forked to the per-thread struct, and then I moved the setting of it into the *parent* process's fork handlers -- the before-fork handler toggles it to true, then the child spawns off and inherits this setting, and then the parent after-fork handler toggles it back again. (Since it's per-thread, there's no race condition here.) This lets us remove the child after-fork handler entirely, and -- since the parent handlers aren't subject to any restrictions on what they can call -- it works on all platforms regardless of the TLS implementation. > 2) can you explain why are you removing the cleanups from >gomp_free_pool_helper ? They aren't removed, but rather moved from the helper function (which runs in the helper threads) into gomp_free_thread_pool (which runs in the main thread) -- which makes it easier to run the appropriate cleanups even in the case where the helper threads aren't running. (But see below -- we might prefer to drop this part of the patch entirely.) > 3) you can call pthread_atfork many times (once for each pthread >that creates a thread pool), that is undesirable, you want to do that >only if the initial thread creates thread pool Good point. I've moved the pthread_atfork call to initialize_team, which is an __attribute__((constructor)). I am a little uncertain whether this is the best approach, though, because of the comment in team_destructor about wanting to correctly handle dlopen/dlclose. One of pthread_atfork's many (many) limitations is that there's no way to unregister handlers, so if dlopen/dlclose is important (is it?) then we can't call pthread_atfork from initialize_team. If this is a problem, then we could delay the pthread_atfork until e.g. the first thread pool is spawned -- would this be preferred? > 4) the testcase is clearly not portable enough, should be probably limited >to *-*-linux* only, fork etc. will likely not work on many targets. I think it should work on pretty much any target that has fork(); we definitely care about having this functionality on e.g. OS X. I've added some genericish target specifications. > In any case, even with the patch, are you aware that you'll leak megabytes > of thread stacks etc.? Well, err, I wasn't, no :-). Thanks for pointing that out. To me this does clinch the argument that a better approach would be the one I suggested in https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00979.html i.e., of tracking whether any threadprivate variables were present, and if not then simply shutting down the thread pools before forking. But this would be a much more invasive change to gomp (I wouldn't know where to start). In the mean time, the current patch is still worthwhile. The cost is not that bad: I wouldn't think of it as "leaking" so much as "overhead of supporting OMP->fork->OMP".
Re: [GOOGLE] Increase max-early-inliner-iterations to 2 for profile-gen and use
On Sat, Oct 18, 2014 at 4:19 PM, Xinliang David Li wrote: > On Sat, Oct 18, 2014 at 3:27 PM, Jan Hubicka wrote: >>> The difference in instrumentation runtime is huge -- as topn profiler >>> is pretty expensive to run. >>> >>> With FDO, it is probably better to make early inlining more aggressive >>> in order to get more context sensitive profiling. >> >> I agree with that, I just would like to understand where increasing the >> iterations >> helps and if we can handle it without iterating (because Richi originally >> requested to >> drop the iteration for correcness issues) >> Do you have some examples? > > We can do FDO experiment by shutting down einline. (Note that > increasing iteration to 2 did not actually improve performance with > our benchmarks). Early inlining itself has large performance impact for FDO (the runtime of the profile-use build). With it disabled, the FDO performance drops by >2% on average. The degradation is seen across all benchmarks except for one. David > > David > >> Honza >>> >>> David >>> >>> On Sat, Oct 18, 2014 at 10:05 AM, Jan Hubicka wrote: >>> >> Increasing the number of early inliner iterations from 1 to 2 enables >>> >> more >>> >> indirect calls to be promoted/inlined before instrumentation. This in >>> >> turn >>> >> reduces the instrumentation overhead, particularly for more expensive >>> >> indirect >>> >> call topn profiling. >>> > >>> > How much difference you get here? One posibility would be also to run >>> > specialized >>> > ipa-cp before profile instrumentation. >>> > >>> > Honza >>> >> >>> >> Passes internal testing and regression tests. Ok for google/4_9? >>> >> >>> >> 2014-10-18 Teresa Johnson >>> >> >>> >> Google ref b/17934523 >>> >> * opts.c (finish_options): Increase max-early-inliner-iterations >>> >> to 2 >>> >> for profile-gen and profile-use builds. >>> >> >>> >> Index: opts.c >>> >> === >>> >> --- opts.c (revision 216286) >>> >> +++ opts.c (working copy) >>> >> @@ -870,6 +869,14 @@ finish_options (struct gcc_options *opts, struct g >>> >> opts->x_param_values, opts_set->x_param_values); >>> >> } >>> >> >>> >> + if (opts->x_profile_arc_flag >>> >> + || opts->x_flag_branch_probabilities) >>> >> +{ >>> >> + maybe_set_param_value >>> >> + (PARAM_EARLY_INLINER_MAX_ITERATIONS, 2, >>> >> +opts->x_param_values, opts_set->x_param_values); >>> >> +} >>> >> + >>> >>if (!(opts->x_flag_auto_profile >>> >> || (opts->x_profile_arc_flag || >>> >> opts->x_flag_branch_probabilities))) >>> >> { >>> >> >>> >> >>> >> -- >>> >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Disable -fdevirtualize by default
No, Google branch is not yet synced to 4.9 head. David On Sun, Oct 19, 2014 at 1:11 PM, Jan Hubicka wrote: >> It was in upstream 4.9 branch, but got reverted in r215061 to fix >> PR/61214 and PR/62224. > > Was your tests done with this change or without? > > Honza
Re: [fortran,patch] Handle infinities and NaNs in intrinsics code generation
> Looks good to me. Thanks for taking care of F2003's IEEE support. Committed as rev. 216443, thanks for the review. > PS: You might want to browse through the current (F2008 + corrigenda > + first F2015 additions) draft at > http://j3-fortran.org/doc/year/14/14-007r2.pdf > > See especially the list at the beginning under the item > "Changes to the intrinsic modules IEEE_ARITHMETIC, IEEE_EXCEPTIONS, and > IEEE_FEATURES for conformance with ISO/IEC/IEEE 60559:2011: [...]" > and then later in that file. Thanks for the link. I’d rather wait until later in the process, and let the existing F2003 / F2008 parts mature & be tested for now. FX
Re: [PATCH] Fix for PR63583
> Hello. > > I added missing gimple_asm_string comparison for a function with an asm > statement. > Bootstrap and regression tests still running, ready for trunk after it > finishes? OK. (I remember pointing this out at review :)) Honza > > Thank you, > Martin > gcc/ChangeLog: > > 2014-10-19 Martin Liska > > * ipa-icf-gimple.c (func_checker::compare_gimple_asm): > Gimple tempate string is compared. > > gcc/testsuite/ChangeLog: > > 2014-10-19 Martin Liska > > * gcc.dg/ipa/pr63595.c: New test. > diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c > index 792a3e4..1369b74 100644 > --- a/gcc/ipa-icf-gimple.c > +++ b/gcc/ipa-icf-gimple.c > @@ -863,6 +863,9 @@ func_checker::compare_gimple_asm (gimple g1, gimple g2) >if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2)) > return false; > > + if (strcmp (gimple_asm_string (g1), gimple_asm_string (g2)) != 0) > +return return_false_with_msg ("ASM strings are different"); > + >for (unsigned i = 0; i < gimple_asm_ninputs (g1); i++) > { >tree input1 = gimple_asm_input_op (g1, i); > diff --git a/gcc/testsuite/gcc.dg/ipa/pr63595.c > b/gcc/testsuite/gcc.dg/ipa/pr63595.c > new file mode 100644 > index 000..9c9f3bf > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/ipa/pr63595.c > @@ -0,0 +1,26 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-ipa-icf-details" } */ > + > +static int f(int t) __attribute__((noinline)); > + > +static int g(int t) __attribute__((noinline)); > +static int g(int t) > +{ > +asm("addl %0, 1": "+r"(t)); > + return t; > +} > +static int f(int t) > +{ > +asm("addq %0, -1": "+r"(t)); > + return t; > +} > + > + > +int h(int t) > +{ > +return f(t) + g(t); > +} > + > +/* { dg-final { scan-ipa-dump "ASM strings are different" "icf" } } */ > +/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf" } } */ > +/* { dg-final { cleanup-ipa-dump "icf" } } */
[PATCH] Fix for PR63583
Hello. I added missing gimple_asm_string comparison for a function with an asm statement. Bootstrap and regression tests still running, ready for trunk after it finishes? Thank you, Martin gcc/ChangeLog: 2014-10-19 Martin Liska * ipa-icf-gimple.c (func_checker::compare_gimple_asm): Gimple tempate string is compared. gcc/testsuite/ChangeLog: 2014-10-19 Martin Liska * gcc.dg/ipa/pr63595.c: New test. diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c index 792a3e4..1369b74 100644 --- a/gcc/ipa-icf-gimple.c +++ b/gcc/ipa-icf-gimple.c @@ -863,6 +863,9 @@ func_checker::compare_gimple_asm (gimple g1, gimple g2) if (gimple_asm_nclobbers (g1) != gimple_asm_nclobbers (g2)) return false; + if (strcmp (gimple_asm_string (g1), gimple_asm_string (g2)) != 0) +return return_false_with_msg ("ASM strings are different"); + for (unsigned i = 0; i < gimple_asm_ninputs (g1); i++) { tree input1 = gimple_asm_input_op (g1, i); diff --git a/gcc/testsuite/gcc.dg/ipa/pr63595.c b/gcc/testsuite/gcc.dg/ipa/pr63595.c new file mode 100644 index 000..9c9f3bf --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr63595.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-ipa-icf-details" } */ + +static int f(int t) __attribute__((noinline)); + +static int g(int t) __attribute__((noinline)); +static int g(int t) +{ +asm("addl %0, 1": "+r"(t)); + return t; +} +static int f(int t) +{ +asm("addq %0, -1": "+r"(t)); + return t; +} + + +int h(int t) +{ +return f(t) + g(t); +} + +/* { dg-final { scan-ipa-dump "ASM strings are different" "icf" } } */ +/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf" } } */ +/* { dg-final { cleanup-ipa-dump "icf" } } */
Re: [AArch64] Add --enable-fix-cortex-a53-835769 configure-time option
On Friday 2014-10-10 11:53, Kyrill Tkachov wrote: This adds a new configure-time option --enable-fix-cortex-a53-835769 that will enable the Cortex-A53 erratum fix by default so you don't have to specify -mfix-cortex-a53-835769 every time. Documentation in install.texi is added. Thank you. Can you please also update gcc-5/changes.html on the web side of things? Gerald
Re: [wwwdocs] Add recent C++ changes to gcc-5/changes.html
On Friday 2014-10-17 13:34, Jonathan Wakely wrote: Index: htdocs/gcc-5/changes.html === @@ -128,12 +164,13 @@ Class std::experimental::any; Function template std::experimental::apply; + Variable templates for type traits; The trailing semi-colon feels a bit odd, doesn't it? (Also, when using a semi-colon, wouldn't the next word -- Function and Variable here -- be lower-case?) Gerald
Re: [GOOGLE] Disable -fdevirtualize by default
> It was in upstream 4.9 branch, but got reverted in r215061 to fix > PR/61214 and PR/62224. Was your tests done with this change or without? Honza
[PATCH, committed] Set SECTION_EXCLUDE flag for LTO sections.
Hello, This patch sets SECTION_EXCLUDE flag for LTO sections in lhd_begin_section and fixes the corresponding gcc_GAS_CHECK_FEATURE. Approved here: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01535.html Bootstrapped and regtested on i686-linux and x86_64-linux, committed to trunk. -- Ilya 2014-10-19 Ilya Verbin gcc/ * configure: Regenerate. * configure.ac: Move the test for section attribute specifier "e" in GAS out to all i[34567]86-*-* | x86_64-*-* targets and add --fatal-warnings. * langhooks.c (lhd_begin_section): Set SECTION_EXCLUDE flag. * varasm.c (default_elf_asm_named_section): Guard SECTION_EXCLUDE with ifdef HAVE_GAS_SECTION_EXCLUDE. --- diff --git a/gcc/configure b/gcc/configure index bd1215d..16f128f 100755 --- a/gcc/configure +++ b/gcc/configure @@ -24676,9 +24676,12 @@ $as_echo "$as_me: WARNING: LTO for $target requires binutils >= 2.20.1, but vers ;; esac fi - # Test if the assembler supports the section flag 'e' for specifying - # an excluded section. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .section with e" >&5 + ;; +esac + +# Test if the assembler supports the section flag 'e' for specifying +# an excluded section. +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .section with e" >&5 $as_echo_n "checking assembler for .section with e... " >&6; } if test "${gcc_cv_as_section_has_e+set}" = set; then : $as_echo_n "(cached) " >&6 @@ -24691,7 +24694,7 @@ fi elif test x$gcc_cv_as != x; then $as_echo '.section foo1,"e" .byte 0,0,0,0' > conftest.s -if { ac_try='$gcc_cv_as $gcc_cv_as_flags -o conftest.o conftest.s >&5' +if { ac_try='$gcc_cv_as $gcc_cv_as_flags --fatal-warnings -o conftest.o conftest.s >&5' { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5 (eval $ac_try) 2>&5 ac_status=$? @@ -24714,8 +24717,6 @@ cat >>confdefs.h <<_ACEOF #define HAVE_GAS_SECTION_EXCLUDE `if test $gcc_cv_as_section_has_e = yes; then echo 1; else echo 0; fi` _ACEOF - ;; -esac { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for filds and fists mnemonics" >&5 $as_echo_n "checking assembler for filds and fists mnemonics... " >&6; } diff --git a/gcc/configure.ac b/gcc/configure.ac index 8f7c814..35ce9ee 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3799,18 +3799,19 @@ foo:nop ;; esac fi - # Test if the assembler supports the section flag 'e' for specifying - # an excluded section. - gcc_GAS_CHECK_FEATURE([.section with e], gcc_cv_as_section_has_e, - [2,22,51],, -[.section foo1,"e" -.byte 0,0,0,0]) - AC_DEFINE_UNQUOTED(HAVE_GAS_SECTION_EXCLUDE, - [`if test $gcc_cv_as_section_has_e = yes; then echo 1; else echo 0; fi`], - [Define if your assembler supports specifying the section flag e.]) ;; esac +# Test if the assembler supports the section flag 'e' for specifying +# an excluded section. +gcc_GAS_CHECK_FEATURE([.section with e], gcc_cv_as_section_has_e, + [2,22,51], [--fatal-warnings], +[.section foo1,"e" +.byte 0,0,0,0]) +AC_DEFINE_UNQUOTED(HAVE_GAS_SECTION_EXCLUDE, + [`if test $gcc_cv_as_section_has_e = yes; then echo 1; else echo 0; fi`], + [Define if your assembler supports specifying the section flag e.]) + gcc_GAS_CHECK_FEATURE([filds and fists mnemonics], gcc_cv_as_ix86_filds,,, [filds mem; fists mem],, diff --git a/gcc/langhooks.c b/gcc/langhooks.c index 7d4c294..4bdeaa0 100644 --- a/gcc/langhooks.c +++ b/gcc/langhooks.c @@ -660,7 +660,7 @@ lhd_begin_section (const char *name) saved_section = text_section; /* Create a new section and switch to it. */ - section = get_section (name, SECTION_DEBUG, NULL); + section = get_section (name, SECTION_DEBUG | SECTION_EXCLUDE, NULL); switch_to_section (section); } diff --git a/gcc/varasm.c b/gcc/varasm.c index abb743b..4ae9d58 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -6141,8 +6141,10 @@ default_elf_asm_named_section (const char *name, unsigned int flags, if (!(flags & SECTION_DEBUG)) *f++ = 'a'; +#if defined (HAVE_GAS_SECTION_EXCLUDE) && HAVE_GAS_SECTION_EXCLUDE == 1 if (flags & SECTION_EXCLUDE) *f++ = 'e'; +#endif if (flags & SECTION_WRITE) *f++ = 'w'; if (flags & SECTION_CODE) -- 1.7.1
[PATCH, rtl-optimization]: Remove const_alias_set
Hello! The fix that fixed scheduler issues with AND addresses (the fix prevented early exit for MEM_READONLY_P addresses when AND alignment addresses were involved) caused some fall-out for libgo testsuite. These tests triggered an assert in mems_in_disjoint_alias_sets_p, which checks for zero alias set when flag_strict_aliasing is false. We have had some off-list discussion with Ian Lance Taylor about this issue. The problem was, that Go dynamically switches off flag_strict_aliasing after compilation started and when "unsafe" package is imported (similar to when__attribute__ ((optimize ("-fno-strict-aliasing"))) is used in c). To mitigate this issue, the Go frontend called varasm_init_once again to recalculated (= cleared) const_alias_set in this case. As observed in [1], the fix for canon_true_depence [2] that introduced quick exit for a MEM_READONLY_P operands made const_alias_set redundant, it is no longer user for anything. The patch that fixed scheduling of AND operands removed early MEM_READONLY_P exit for memory operands with AND realignment, so operands could reach more complex code later in the function that was able to determine dependence of memory operands. This code includes the call to mems_in_disjoint_alias_sets_p, and the assert triggered again for some MEM_READONLY_P operands that have had non-zero alias set, set from the value, cached in const_alias_set from before flag_strict_aliasing flag was cleared. The proposed solution is to remove const_alias_set altogether. The MEM_READONLY_P successfully supersedes const_alias_set functionality, and this is also confirmed by the removal of the second varasm_init_once call in the Go frontend. In an off-list discussion, Ian agrees that attached patch should also fix the problem. 2014-10-19 Uros Bizjak * varasm.c (const_alias_set): Remove. (init_varasm_once): Remove initialization of const_alias_set. (build_constant_desc): Do not set alias set to const_alias_set. The patch was tested on alpha-linux-gnu [3], alphaev68-linux-gnu and x86_64-linux-gnu {,-m32} for all default languages plus Go and obj-c++. The patch fixes all mentioned libgo failures on alpha. OK for mainline? [1] https://gcc.gnu.org/ml/gcc-patches/2013-07/msg01033.html [2] https://gcc.gnu.org/ml/gcc-patches/2010-07/msg01758.html [3] https://gcc.gnu.org/ml/gcc-testresults/2014-10/msg02041.html Uros. Index: varasm.c === --- varasm.c(revision 216362) +++ varasm.c(working copy) @@ -98,11 +98,6 @@ tree last_assemble_variable_decl; bool first_function_block_is_cold; -/* We give all constants their own alias set. Perhaps redundant with - MEM_READONLY_P, but pre-dates it. */ - -static alias_set_type const_alias_set; - /* Whether we saw any functions with no_split_stack. */ static bool saw_no_split_stack; @@ -3231,7 +3226,6 @@ build_constant_desc (tree exp) rtl = gen_const_mem (TYPE_MODE (TREE_TYPE (exp)), symbol); set_mem_attributes (rtl, exp, 1); set_mem_alias_set (rtl, 0); - set_mem_alias_set (rtl, const_alias_set); /* We cannot share RTX'es in pool entries. Mark this piece of RTL as required for unsharing. */ @@ -5928,7 +5922,6 @@ init_varasm_once (void) object_block_htab = hash_table::create_ggc (31); const_desc_htab = hash_table::create_ggc (1009); - const_alias_set = new_alias_set (); shared_constant_pool = create_constant_pool (); #ifdef TEXT_SECTION_ASM_OP
Re: [PATCH][3/n] Merge from match-and-simplify, first patterns and questions
Hello, looking though the patterns on the branch (not specifically the ones attached here), I am surprised to see so few calls to has_single_use. In RTL-land, we don't even valueize if there are several uses, so the question doesn't occur. In generic, we assume everything is single use (CSE could later disagree, but that's the user's fault for writing his code that way). In tree-ssa-forwprop.c, helpers like get_prop_source_stmt do test for single use. Since has_single_use is a bit painful to use in .pd files (separate test for generic and constants), it might deserve another helper function, or a special syntax. -- Marc Glisse
Re: Is there an error in GCC Internals Manual, chapter 16.4?
Panu-Kristian Poiksalo writes: > Chapter 16.4, "RTL Template" in GCC Internals Manual, found at > https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#RTL-Template > has this to say about match_scratch: > > " > (match_scratch:m n constraint) > This expression is also a placeholder for operand number n and > indicates that operand must be a scratch or reg expression. > > When matching patterns, this is equivalent to > > (match_operand:m n "scratch_operand" pred) > " > I'm wondering what is "pred" in this context? I think it is supposed to be "constraint". This error is already present in the first revision of the file. I have installed this patch in trunk as obvious. Thanks for the report. Andreas. * doc/md.texi (RTL Template) [match_scratch]: Correct equivalent match_operand expression. Index: doc/md.texi === --- doc/md.texi (revision 216440) +++ doc/md.texi (working copy) @@ -297,7 +297,7 @@ When matching patterns, this is equivalent to @smallexample -(match_operand:@var{m} @var{n} "scratch_operand" @var{pred}) +(match_operand:@var{m} @var{n} "scratch_operand" @var{constraint}) @end smallexample but, when generating RTL, it produces a (@code{scratch}:@var{m}) -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: [GOOGLE] Disable -fdevirtualize by default
It was in upstream 4.9 branch, but got reverted in r215061 to fix PR/61214 and PR/62224. David On Sun, Oct 19, 2014 at 1:21 AM, Jan Hubicka wrote: >> Hi Honza, >> >> As David says, we will do some more experiments with the change you >> suggest and speculative devirtualization, but we needed to make this >> change in part to get an internal release out. One of the issues was a >> recent change to cp/decl2.c to make virtual function decls needed >> under flag_devirtualize. > > I think that one is not in current 4.9 branch (only in mainline). > > Honza >> >> Thanks! >> Teresa >> >> On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka wrote: >> >> >> >> The virtual functions will be emitted in some modules, right? If they >> >> are hot, they will be included with the auxiliary modules. Note that >> >> LIPO indirect call profile will point to the comdat copy that is >> >> picked by the linker in the instrumentation build, so it guarantees to >> >> exist. >> > >> > If you have COMDAT virtual inline function that is used by module FOO and >> > LIPO's indirect inlining will work out that it goes to comdat copy of >> > module BAR >> > (that won the merging). It will make C++ parser to also parse BAR to get >> > the function, but callgarph builder will ignore it, too. >> > (this is new logic in analyze_function that with -fno-devirtualize will >> > just >> > prevent all virtual functions not directly reachable to appear in symbol >> > table) >> > >> > I am surprised you hit the size limits with 4.9 only - for quite some time >> > we keep all virtual functions in callgarph until inlining. In fact 4.9 is >> > first >> > that works harder to drop them early (because I hit the problem with LTO >> > where they artifically bloat the size of LTO object files) >> > >> > Honza >> >> >> >> David >> >> >> >> >> >> > >> >> > Honza >> >> >> >> >> >> David >> >> >> >> >> >> >> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka wrote: >> >> >> >> Disabling devirtualization reduces code size, both for >> >> >> >> instrumentation (because >> >> >> >> many more virtual functions are kept longer and therefore >> >> >> >> instrumented) and for >> >> >> >> normal optimization. >> >> >> > >> >> >> > OK, with profile instrumentation (that you seem to try to minimize) >> >> >> > i can see >> >> >> > how you get noticeably more counters because virtual functions are >> >> >> > kept longer. >> >> >> > (note that 4.9 is a lot more agressive on removing unreacable >> >> >> > virtual functions >> >> >> > than earlier compilers). >> >> >> > >> >> >> > Instead of disabling -fdevirtualize completely (that will get you >> >> >> > more indirect >> >> >> > calls and thus more topn profiling) you may consider just hacking >> >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets >> >> >> > as >> >> >> > reachable. (see the conditional on before_inlining_p). >> >> >> > >> >> >> > Of course this will get you less devirtualization (but with LTO the >> >> >> > difference >> >> >> > should not be big - perhaps I could make switch for that for >> >> >> > mainline) and less >> >> >> > accurate profiles when you get speculative devirtualization via topn. >> >> >> > >> >> >> > I would be very interested to see how much difference this makes. >> >> >> > >> >> >> > Honza >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Disable -fdevirtualize by default
> > In opts.c I have: > > case OPT_fprofile_use: > > if (!opts_set->x_flag_branch_probabilities) > > ... > > /* Indirect call profiling should do all useful transformations > > speculative devirtualization does. */ > > if (!opts_set->x_flag_devirtualize_speculatively > > && opts->x_flag_value_profile_transformations) > > opts->x_flag_devirtualize_speculatively = false; > > > > so perhaps this hunk is somehow skipped with LIPO? > > The 1.68% size reduction I measured is with plain O2 compilation, not LIPO. OK, in that case I may look into trimming down the speculation with non-LTO compilation. The code makes assumptions about completeness of type hiearchy that without LTO are quite far reached. I tested it on firefox and there it seemed to do mostly good job, so I decided to try to enable it for non-LTO, too. It would be nice to understand what really happens in your case and if there are lot of wrong speculations or if the speculated calls simply happens to not be on hot paths in your benchmark. > > > > > Speculative devirtualization is somehwat less useful (may have more falce > > positives) without LTO depending on how your headers are constructed. > > It would be interesting to see if it does a lot of mistakes on your > > codebase. > > (this can be easily done by forcing it to run with profile feedback, too and > > it will tell you when its speculation differs from speculation already > > there). > >> > >> By the way, you mentioned 'hacking the > >> ipa.c:walk_polymorphic_call_targets to not make the possible targets > >> as > >> reachable' -- is that something worth doing in trunk? With that, we > >> can probably just turn off speculative devirtualization. > > > > Well, the check is there to enable inlining. Disabling it for > > -fprofile-generate will result in lost profile samples for virtual > > functions. > > Disabling it by default will prevent inlining of devirtualized calls making > > devirtualization not really useful. > > Perhaps with LIPO situation is bit different because you bring in the other > > module just to inline the call as you describe. > > What I meant is whether it is suitable for plain build ? I have not > looked at the details. Yes, dropping the reachability walk should just work, only result in less inlining. Honza > > > > > One thing I can imagine doing is to make inliner consider the reachable > > (in post-inlining sense, that is after removing extern inlines and virtual > > functions) calls with priority and account only those to unit growth model. > > This would make it more consistent over -fdevirtualize and more realistic > > about resulting code size. > > > > I sort of considered this option but did not have any good data suggesting > > I should implement it. > > > > In general it would be nice to understand this problem. Also I plan to do > > some retunning for 5.0 so it would be nice to know if you have other issues > > with 4.9? (I did not closely followed Google branch changes, so if you can > > point out those that are relevant for IPA tuning, I would be very interested > > to see what problems you hit). > > Ok I will collect a list of inlining related changes and let you know latter. > > thanks, > > David > > > > > Honza > >> > >> David > >> > >> > >> > >> > > >> > Honza > >> >> > >> >> David > >> >> > >> >> > >> >> > > >> >> > Honza > >> >> >> > >> >> >> David > >> >> >> > >> >> >> > >> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka wrote: > >> >> >> >> Disabling devirtualization reduces code size, both for > >> >> >> >> instrumentation (because > >> >> >> >> many more virtual functions are kept longer and therefore > >> >> >> >> instrumented) and for > >> >> >> >> normal optimization. > >> >> >> > > >> >> >> > OK, with profile instrumentation (that you seem to try to > >> >> >> > minimize) i can see > >> >> >> > how you get noticeably more counters because virtual functions are > >> >> >> > kept longer. > >> >> >> > (note that 4.9 is a lot more agressive on removing unreacable > >> >> >> > virtual functions > >> >> >> > than earlier compilers). > >> >> >> > > >> >> >> > Instead of disabling -fdevirtualize completely (that will get you > >> >> >> > more indirect > >> >> >> > calls and thus more topn profiling) you may consider just hacking > >> >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible > >> >> >> > targets as > >> >> >> > reachable. (see the conditional on before_inlining_p). > >> >> >> > > >> >> >> > Of course this will get you less devirtualization (but with LTO > >> >> >> > the difference > >> >> >> > should not be big - perhaps I could make switch for that for > >> >> >> > mainline) and less > >> >> >> > accurate profiles when you get speculative devirtualization via > >> >> >> > topn. > >> >> >> > > >> >> >> > I would be very interested to see how much difference this makes. > >> >> >> > > >> >> >> > Honza
Re: [C PATCH] Another initialization fix (PR c/63567)
On Sun, 19 Oct 2014, Marek Polacek wrote: > It turned out that there is another spot where we need to allow > initializing objects with static storage duration with compound > literals even in C99 -- when the compound literal is inside the > initializer. Fixed in the same way as previously. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > 2014-10-18 Marek Polacek > > PR c/63567 > * c-typeck.c (output_init_element): Allow initializing objects with > static storage duration with compound literals even in C99 and add > pedwarn for it. > > * gcc.dg/pr63567-3.c: New test. > * gcc.dg/pr63567-4.c: New test. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Fix PR preprocessor/42014
On Sat, 18 Oct 2014, Krzesimir Nowak wrote: > + pp_verbatim (context->printer, > +"%s from %r%s:%d%R", prefix, "locus", > + diagnostic_report_from (context, map, "In file included"); We don't want to split up diagnostic text like that, because for translation it may be necessary to translate the whole "In file included from" text together rather than expecting two fragments to go together in the same way they do in English. Now, right now this message isn't marked for translation anyway (an independent bug there's no need for you to fix), but still as a design principle things should be structured to avoid splitting up English fragments like that. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH doc] Explain options precedence and difference between -pedantic-errors and -Werror=pedantic
On Sat, 18 Oct 2014, Manuel López-Ibáñez wrote: > What about this version? > > Give an error whenever the @dfn{base standard} (see @option{-Wpedantic}) > requires a diagnostic, in cases where there is undefined behavior at > compile-time Only in *some* such cases of compile-time undefined behavior. -- Joseph S. Myers jos...@codesourcery.com
Re: [GOOGLE] Disable -fdevirtualize by default
On Sun, Oct 19, 2014 at 1:19 AM, Jan Hubicka wrote: >> > >> > I am surprised you hit the size limits with 4.9 only - for quite some time >> > we keep all virtual functions in callgarph until inlining. In fact 4.9 is >> > first >> > that works harder to drop them early (because I hit the problem with LTO >> > where they artifically bloat the size of LTO object files) >> >> We can dig it more to later understand why only 4.9 hits the problem. > > This would be very interesting, because 4.9 ought to be better here > (removing more virtuals early) than previous compilers. > There was number of changes in 4.9 that may affect this - some fixes at > C++ side giving middle end more inline candidates and also this change > https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html > So perhaps most of your virtual functions are !COMDAT&&!EXTERNAL, but that > does not seem to make much sense to me either :( >> >> My size results with -fno-devirtualize-speculatively is out. It >> shrinks size by 1.68% -- slightly more than -fdevirtualize can do in >> O2 compile. > > Hmm, this is interesting, too. 1.68% is definitly a lot more than I would > expect or have I seen on other testcases. You can take a look at summaries in > -fdump-ipa-devirt pass. > > In opts.c I have: > case OPT_fprofile_use: > if (!opts_set->x_flag_branch_probabilities) > ... > /* Indirect call profiling should do all useful transformations > speculative devirtualization does. */ > if (!opts_set->x_flag_devirtualize_speculatively > && opts->x_flag_value_profile_transformations) > opts->x_flag_devirtualize_speculatively = false; > > so perhaps this hunk is somehow skipped with LIPO? The 1.68% size reduction I measured is with plain O2 compilation, not LIPO. > > Speculative devirtualization is somehwat less useful (may have more falce > positives) without LTO depending on how your headers are constructed. > It would be interesting to see if it does a lot of mistakes on your codebase. > (this can be easily done by forcing it to run with profile feedback, too and > it will tell you when its speculation differs from speculation already there). >> >> By the way, you mentioned 'hacking the >> ipa.c:walk_polymorphic_call_targets to not make the possible targets >> as >> reachable' -- is that something worth doing in trunk? With that, we >> can probably just turn off speculative devirtualization. > > Well, the check is there to enable inlining. Disabling it for > -fprofile-generate will result in lost profile samples for virtual functions. > Disabling it by default will prevent inlining of devirtualized calls making > devirtualization not really useful. > Perhaps with LIPO situation is bit different because you bring in the other > module just to inline the call as you describe. What I meant is whether it is suitable for plain build ? I have not looked at the details. > > One thing I can imagine doing is to make inliner consider the reachable > (in post-inlining sense, that is after removing extern inlines and virtual > functions) calls with priority and account only those to unit growth model. > This would make it more consistent over -fdevirtualize and more realistic > about resulting code size. > > I sort of considered this option but did not have any good data suggesting > I should implement it. > > In general it would be nice to understand this problem. Also I plan to do > some retunning for 5.0 so it would be nice to know if you have other issues > with 4.9? (I did not closely followed Google branch changes, so if you can > point out those that are relevant for IPA tuning, I would be very interested > to see what problems you hit). Ok I will collect a list of inlining related changes and let you know latter. thanks, David > > Honza >> >> David >> >> >> >> > >> > Honza >> >> >> >> David >> >> >> >> >> >> > >> >> > Honza >> >> >> >> >> >> David >> >> >> >> >> >> >> >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka wrote: >> >> >> >> Disabling devirtualization reduces code size, both for >> >> >> >> instrumentation (because >> >> >> >> many more virtual functions are kept longer and therefore >> >> >> >> instrumented) and for >> >> >> >> normal optimization. >> >> >> > >> >> >> > OK, with profile instrumentation (that you seem to try to minimize) >> >> >> > i can see >> >> >> > how you get noticeably more counters because virtual functions are >> >> >> > kept longer. >> >> >> > (note that 4.9 is a lot more agressive on removing unreacable >> >> >> > virtual functions >> >> >> > than earlier compilers). >> >> >> > >> >> >> > Instead of disabling -fdevirtualize completely (that will get you >> >> >> > more indirect >> >> >> > calls and thus more topn profiling) you may consider just hacking >> >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets >> >> >> > as >> >> >> > reachable. (see the conditional on before_inlining_p). >> >> >> > >> >> >> > Of course this will g
[C PATCH] Another initialization fix (PR c/63567)
It turned out that there is another spot where we need to allow initializing objects with static storage duration with compound literals even in C99 -- when the compound literal is inside the initializer. Fixed in the same way as previously. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2014-10-18 Marek Polacek PR c/63567 * c-typeck.c (output_init_element): Allow initializing objects with static storage duration with compound literals even in C99 and add pedwarn for it. * gcc.dg/pr63567-3.c: New test. * gcc.dg/pr63567-4.c: New test. diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c index 0dd3366..ee874da 100644 --- gcc/c/c-typeck.c +++ gcc/c/c-typeck.c @@ -8251,11 +8251,14 @@ output_init_element (location_t loc, tree value, tree origtype, value = array_to_pointer_conversion (input_location, value); if (TREE_CODE (value) == COMPOUND_LITERAL_EXPR - && require_constant_value && !flag_isoc99 && pending) + && require_constant_value && pending) { /* As an extension, allow initializing objects with static storage duration with compound literals (which are then treated just as the brace enclosed list they contain). */ + if (flag_isoc99) + pedwarn_init (loc, OPT_Wpedantic, "initializer element is not " + "constant"); tree decl = COMPOUND_LITERAL_EXPR_DECL (value); value = DECL_INITIAL (decl); } diff --git gcc/testsuite/gcc.dg/pr63567-3.c gcc/testsuite/gcc.dg/pr63567-3.c index e69de29..d626406 100644 --- gcc/testsuite/gcc.dg/pr63567-3.c +++ gcc/testsuite/gcc.dg/pr63567-3.c @@ -0,0 +1,7 @@ +/* PR c/63567 */ +/* { dg-do compile } */ +/* { dg-options "" } */ + +struct T { int i; }; +struct S { struct T t; }; +struct S s = { .t = { (int) { 1 } } }; diff --git gcc/testsuite/gcc.dg/pr63567-4.c gcc/testsuite/gcc.dg/pr63567-4.c index e69de29..0ca6c45 100644 --- gcc/testsuite/gcc.dg/pr63567-4.c +++ gcc/testsuite/gcc.dg/pr63567-4.c @@ -0,0 +1,7 @@ +/* PR c/63567 */ +/* { dg-do compile } */ +/* { dg-options "-Wpedantic" } */ + +struct T { int i; }; +struct S { struct T t; }; +struct S s = { .t = { (int) { 1 } } }; /* { dg-warning "initializer element is not constant" } */ Marek
Re: [committed] Remove hppa -mjump-in-delay option
On 19-Oct-14, at 10:13 AM, Segher Boessenkool wrote: You can set it to "Ignored" in pa.opt now? And remove the mask? Yes, will update. Dave -- John David Anglin dave.ang...@bell.net
Re: [committed] Remove hppa -mjump-in-delay option
On Sat, Oct 18, 2014 at 12:04:36PM -0400, John David Anglin wrote: > The attached change removes support for the hppa -mjump-in-delay > option. You can set it to "Ignored" in pa.opt now? And remove the mask? Segher
Re: -fuse-caller-save - Collect register usage information
On 17-10-14 21:24, Eric Botcazou wrote: Let's look at the effect of the option (after the recent fix for PR61605) on gcc.target/i386/fuse-calller-save.c: ... foo: .LFB1: .cfi_startproc - pushq %rbx - .cfi_def_cfa_offset 16 - .cfi_offset 3, -16 - movl%edi, %ebx callbar - addl%ebx, %eax - popq%rbx - .cfi_def_cfa_offset 8 + addl%edi, %eax ret .cfi_endproc .LFE1: ... So, the effect is: instead of using a callee-save register, we use a caller-save register to store a value that's live over a call, without needing to add a caller-save, as would be normally the case. If I see an option -foptimize-caller-saves, I'd expect the effect to be that without, there are some caller-saves and with, there are less. This is not the case in the diff above. To me it is, "movl %edi, %ebx"/"addl %ebx, %eax" is a caller-save/restore. I agree that it can look like that. But the insn 'movl %edi, %ebx' is generated by assign_parm_setup_reg at expand. AFAIU, the purpose is to decouple the value of the argument and its uses from the register it's passed in. The definition of -fcaller-saves below explains why insn 'movl %edi, %ebx' is not a caller-save: because it's not generated before a call, but rather at the start of a function. This seems to be confirmed by the fact that the insn 'movl %edi, %ebx' is still generated with -fno-caller-saves. I'm starting to lean towards -foptimize-call-clobbers or similar. Yes, that's also a good name and was my initial preference. But you pointed out the existing -fcaller-saves: `-fcaller-saves' Enable allocation of values to registers that are clobbered by function calls, by emitting extra instructions to save and restore the registers around such calls. Such allocation is done only when it seems to result in better code. so -foptimize-caller-saves can be understood as optimizing out the "extra instructions to save and restore the registers around such calls" and, thus, as having a direct relationship with -fcaller-saves. Agree. But, given the preference of a number of others for fipa-ra, could you live with that? Thanks, - Tom
Re: [Patch] Fix PR61889 for the w64-mingw32 case
> Honza, not sure if this patch is idea, but this will unblock mingw > build problems. Can this one get in? Hmm, the patch is somewhat ugly and I do not know why MingW32 defines mkdir macro and how. If Kai Tietz or other MingW32 maintainer is OK about it, the patch is OK. Honza > > thanks, > > David > > On Wed, Sep 24, 2014 at 8:22 AM, Rainer Emrich > wrote: > > The following patch fixes PR61889 for x86_64-w64-mingw32. Details can be > > found > > on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61889 > > > > The patch was bootstrapped on x86_64-w64-mingw32. > > > > If patch the patch is ok, Kai would you apply, please? > > > > Rainer > > > > 2014-09-24 Rainer Emrich > > > > PR gcov-profile/61889 > > * gcc/gcov-tool.c: Remove wrong #if !defined(_WIN32) > > * libgcc/libgcov-driver-system.c: undefine clashing macro for mkdir > > > > > > Index: gcc/gcov-tool.c > > === > > --- gcc/gcov-tool.c (Revision 215554) > > +++ gcc/gcov-tool.c (Arbeitskopie) > > @@ -89,11 +89,7 @@ gcov_output_files (const char *out, stru > >/* Try to make directory if it doesn't already exist. */ > >if (access (out, F_OK) == -1) > > { > > -#if !defined(_WIN32) > >if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1 && errno != > > EEXIST) > > -#else > > - if (mkdir (out) == -1 && errno != EEXIST) > > -#endif > > fatal_error ("Cannot make directory %s", out); > > } else > >unlink_profile_dir (out); > > Index: libgcc/libgcov-driver-system.c > > === > > --- libgcc/libgcov-driver-system.c (Revision 215554) > > +++ libgcc/libgcov-driver-system.c (Arbeitskopie) > > @@ -66,6 +66,9 @@ create_file_directory (char *filename) > > #ifdef TARGET_POSIX_IO > > && mkdir (filename, 0755) == -1 > > #else > > +#ifdef mkdir > > +#undef mkdir > > +#endif > > && mkdir (filename) == -1 > > #endif > > /* The directory might have been made by another process. */
Re: [PATCH] AutoFDO patch for trunk
> >> +/* Member functions for string_table. */ > >> + > >> +string_table * > >> +string_table::create () > > > > Why this is not a constructor? > > We use static initializer because it's not suggested to put too much > logic in constructor. Why not? :) > >> +} > > > > The two hunks probably can be unified. > > Why get_index_by_decl does not already use dwarf name and why do you need > > to consider both? > > get_index_by_decl is actually a wrapper of get_index. It tolerates > error when assembler name is not emitted in the debug binary (mostly > for functions that are fully inlined). In this case, we will first try > to find the index of the assembler name, if not found, then bfd name. > If still not found, then we try to find name from its abstract > location. Yep, I am just somewhat concerned that you need to try different variations of the name. I would expect the assembler name (minus the random seed mangling) to be sufficient. > > > > >> + if (DECL_ABSTRACT_ORIGIN (decl)) > >> +return get_function_instance_by_decl (lineno, DECL_ABSTRACT_ORIGIN > >> (decl)); > > > > What really happens for ipa-cp function clones and for split functions? > > Their name suffix will be stripped before matching names. I see and profile merged, that seems resonable. > > I am not sure it is a win to have the things represented as VPT histograms > > when you shoot > > for quite special purpose problem. But lets deal with these incrementally, > > too. > > (in general I do not see why you try to share so much with the gcov based > > VTP - it seems > > easier to do the speculation by hand) > > Basically there are 2 problem: > > * before annotation. I agree this is a special purpose problem > * after annotation. This should be the same problem as VPT. Yep, I think adding VPT histograms to get devirtualization happen inter-procedurally with LTO is a good idea. You will need (incrementally I guess) to translate your indexes into the profile-id used or you will need to initialize profile-ids accordingly, so the IPA pass is able to identify the target cross-module. > >> +{ > >> + if (gcov_open (auto_profile_file, 1) == 0) > >> +error ("Cannot open profile file %s.", auto_profile_file); > >> + > >> + if (gcov_read_unsigned () != GCOV_DATA_MAGIC) > >> +error ("AutoFDO profile magic number does not mathch."); > >> + > >> + /* Skip the version number. */ > >> + gcov_read_unsigned (); > >> + > >> + /* Skip the empty integer. */ > >> + gcov_read_unsigned (); > > > > Perhaps we can just check the values? > > What do you mean? Check the value against 0? Yes, if you have version in there it seems to make sense to check it ;) I guess it is there so you can add extra stuff into the file format that will make it incompatible with current one, so probably we should reject all versions greater than 0. (especially because the tool is off-tree) > > > > Does this have chance to work inter-module? > > Yes, it works fine with LIPO. Important (for me) is to make it work with LTO, because LIPO is apparently not landing to 5.0. > > Yes, this could be done. > > After a second thought, I think we actually need to expose einliner > interface to autofdo (instead of doing vpt in early inliner). This is > because we need to mark icall as "promoted", so that later vpt passes > will not try to promote it again. Currently in AutoFDO, we use a > self-contained way (use a "set" to mark all promoted stmts). If we do > vpt in einline, then we will need some flag attached to gimple to > indicate whether it's already promoted. If we manage to remove the iteration loop that repeativly applies the inline plan, you only need way to mark edge as promoted. I guess you can just test the speculative flag on the edge to skip ones you dealt with earlier. Lets do that incrementally though. > + > +@item -fauto-profile > +@itemx -fauto-profile=@var{path} > +@opindex fauto-profile > +Enable sampling based feedback directed optimizations, and optimizations > +generally profitable only with profile feedback available. > + > +The following options are enabled: @code{-fbranch-probabilities}, > @code{-fvpt}, > +@code{-funroll-loops}, @code{-fpeel-loops}, @code{-ftracer}, > @code{-ftree-vectorize}, > +@code{ftree-loop-distribute-patterns} I wonder about loop peeling - we do not enale it by default because we do not believe our static branch predictor can well identify loops that have low expected trip count (except ones that can be fully inlined). I am not sure auto-FDO is much better here - how reliable the info about number of iteratoins is? But this is again something that can be handled separately. Documentation seems out of date though - it seems to enable also -finline, -fipa-cp and cloning, predictive commoning, unswitching, gcse-after-reload, You do not want to enable reorder-functions because autoFDO does not include the time profiler. Also I think speculative devirtualization should not be disabled. > + > +I
Re: [PATCH] PR36312
On 18 October 2014 14:43, Anthony Brandon wrote: > Never mind about functions.texi. I figured out how to do it. > Here is the new diff and changelog. > > libiberty/ChangeLog: > > 2014-10-18 Anthony Brandon > > * filename_cmp.c (filename_eq): No change. Unfortunately mklog is not 100% perfect (actually, it is 'diff -p' which is far from perfect). You need to revise that the ChageLog makes sense (or make mklog smarter). Thus, drop (filename_eq)... Also, the changelogs should say PR driver/36312 (https://gcc.gnu.org/codingconventions.html#ChangeLogs). I think you need to explain the difference between using fatal_error() and fatal_error(UNKNOWN_LOCATION). Yes, I should know because I wrote this part of the patch. But to be honest, I don't remember why I did this change. +This function first normalizes the file names so that different file names +pointing to the same underlying file are treated as being identical. I would suggest: "This function compares the canonical versions of the filenames as returned by @code{lrealpath()}, so that ..." I cannot approve the patch, but it looks fine to me. If you don't get a reply in a few days, you should ping the relevant maintainers: https://gcc.gnu.org/wiki/Community#ping Great first contribution! What are your plans next? Cheers, Manuel.
Re: [GOOGLE] Disable -fdevirtualize by default
> Hi Honza, > > As David says, we will do some more experiments with the change you > suggest and speculative devirtualization, but we needed to make this > change in part to get an internal release out. One of the issues was a > recent change to cp/decl2.c to make virtual function decls needed > under flag_devirtualize. I think that one is not in current 4.9 branch (only in mainline). Honza > > Thanks! > Teresa > > On Sat, Oct 18, 2014 at 4:22 PM, Jan Hubicka wrote: > >> > >> The virtual functions will be emitted in some modules, right? If they > >> are hot, they will be included with the auxiliary modules. Note that > >> LIPO indirect call profile will point to the comdat copy that is > >> picked by the linker in the instrumentation build, so it guarantees to > >> exist. > > > > If you have COMDAT virtual inline function that is used by module FOO and > > LIPO's indirect inlining will work out that it goes to comdat copy of > > module BAR > > (that won the merging). It will make C++ parser to also parse BAR to get > > the function, but callgarph builder will ignore it, too. > > (this is new logic in analyze_function that with -fno-devirtualize will just > > prevent all virtual functions not directly reachable to appear in symbol > > table) > > > > I am surprised you hit the size limits with 4.9 only - for quite some time > > we keep all virtual functions in callgarph until inlining. In fact 4.9 is > > first > > that works harder to drop them early (because I hit the problem with LTO > > where they artifically bloat the size of LTO object files) > > > > Honza > >> > >> David > >> > >> > >> > > >> > Honza > >> >> > >> >> David > >> >> > >> >> > >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka wrote: > >> >> >> Disabling devirtualization reduces code size, both for > >> >> >> instrumentation (because > >> >> >> many more virtual functions are kept longer and therefore > >> >> >> instrumented) and for > >> >> >> normal optimization. > >> >> > > >> >> > OK, with profile instrumentation (that you seem to try to minimize) i > >> >> > can see > >> >> > how you get noticeably more counters because virtual functions are > >> >> > kept longer. > >> >> > (note that 4.9 is a lot more agressive on removing unreacable virtual > >> >> > functions > >> >> > than earlier compilers). > >> >> > > >> >> > Instead of disabling -fdevirtualize completely (that will get you > >> >> > more indirect > >> >> > calls and thus more topn profiling) you may consider just hacking > >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets > >> >> > as > >> >> > reachable. (see the conditional on before_inlining_p). > >> >> > > >> >> > Of course this will get you less devirtualization (but with LTO the > >> >> > difference > >> >> > should not be big - perhaps I could make switch for that for > >> >> > mainline) and less > >> >> > accurate profiles when you get speculative devirtualization via topn. > >> >> > > >> >> > I would be very interested to see how much difference this makes. > >> >> > > >> >> > Honza > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [GOOGLE] Disable -fdevirtualize by default
> > > > I am surprised you hit the size limits with 4.9 only - for quite some time > > we keep all virtual functions in callgarph until inlining. In fact 4.9 is > > first > > that works harder to drop them early (because I hit the problem with LTO > > where they artifically bloat the size of LTO object files) > > We can dig it more to later understand why only 4.9 hits the problem. This would be very interesting, because 4.9 ought to be better here (removing more virtuals early) than previous compilers. There was number of changes in 4.9 that may affect this - some fixes at C++ side giving middle end more inline candidates and also this change https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00834.html So perhaps most of your virtual functions are !COMDAT&&!EXTERNAL, but that does not seem to make much sense to me either :( > > My size results with -fno-devirtualize-speculatively is out. It > shrinks size by 1.68% -- slightly more than -fdevirtualize can do in > O2 compile. Hmm, this is interesting, too. 1.68% is definitly a lot more than I would expect or have I seen on other testcases. You can take a look at summaries in -fdump-ipa-devirt pass. In opts.c I have: case OPT_fprofile_use: if (!opts_set->x_flag_branch_probabilities) ... /* Indirect call profiling should do all useful transformations speculative devirtualization does. */ if (!opts_set->x_flag_devirtualize_speculatively && opts->x_flag_value_profile_transformations) opts->x_flag_devirtualize_speculatively = false; so perhaps this hunk is somehow skipped with LIPO? Speculative devirtualization is somehwat less useful (may have more falce positives) without LTO depending on how your headers are constructed. It would be interesting to see if it does a lot of mistakes on your codebase. (this can be easily done by forcing it to run with profile feedback, too and it will tell you when its speculation differs from speculation already there). > > By the way, you mentioned 'hacking the > ipa.c:walk_polymorphic_call_targets to not make the possible targets > as > reachable' -- is that something worth doing in trunk? With that, we > can probably just turn off speculative devirtualization. Well, the check is there to enable inlining. Disabling it for -fprofile-generate will result in lost profile samples for virtual functions. Disabling it by default will prevent inlining of devirtualized calls making devirtualization not really useful. Perhaps with LIPO situation is bit different because you bring in the other module just to inline the call as you describe. One thing I can imagine doing is to make inliner consider the reachable (in post-inlining sense, that is after removing extern inlines and virtual functions) calls with priority and account only those to unit growth model. This would make it more consistent over -fdevirtualize and more realistic about resulting code size. I sort of considered this option but did not have any good data suggesting I should implement it. In general it would be nice to understand this problem. Also I plan to do some retunning for 5.0 so it would be nice to know if you have other issues with 4.9? (I did not closely followed Google branch changes, so if you can point out those that are relevant for IPA tuning, I would be very interested to see what problems you hit). Honza > > David > > > > > > > Honza > >> > >> David > >> > >> > >> > > >> > Honza > >> >> > >> >> David > >> >> > >> >> > >> >> On Sat, Oct 18, 2014 at 10:10 AM, Jan Hubicka wrote: > >> >> >> Disabling devirtualization reduces code size, both for > >> >> >> instrumentation (because > >> >> >> many more virtual functions are kept longer and therefore > >> >> >> instrumented) and for > >> >> >> normal optimization. > >> >> > > >> >> > OK, with profile instrumentation (that you seem to try to minimize) i > >> >> > can see > >> >> > how you get noticeably more counters because virtual functions are > >> >> > kept longer. > >> >> > (note that 4.9 is a lot more agressive on removing unreacable virtual > >> >> > functions > >> >> > than earlier compilers). > >> >> > > >> >> > Instead of disabling -fdevirtualize completely (that will get you > >> >> > more indirect > >> >> > calls and thus more topn profiling) you may consider just hacking > >> >> > ipa.c:walk_polymorphic_call_targets to not make the possible targets > >> >> > as > >> >> > reachable. (see the conditional on before_inlining_p). > >> >> > > >> >> > Of course this will get you less devirtualization (but with LTO the > >> >> > difference > >> >> > should not be big - perhaps I could make switch for that for > >> >> > mainline) and less > >> >> > accurate profiles when you get speculative devirtualization via topn. > >> >> > > >> >> > I would be very interested to see how much difference this makes. > >> >> > > >> >> > Honza
Re: [PATCH 5/5] New tests introduction
Martin Liška writes: > diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c > b/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c > new file mode 100644 > index 000..7358e43 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-ipa-icf" } */ > + > +#include > + > +__attribute__ ((noinline)) > +void foo() > +{ > + float x = 1.2345f; > + __m128 v =_mm_load1_ps(&x); > +} > + > +__attribute__ ((noinline)) > +void bar() > +{ > + float x = 1.2345f; > + __m128 v =_mm_load1_ps(&x); > +} > + > +int main() > +{ > + return 2; > +} > + > +/* { dg-final { scan-ipa-dump "Semantic equality hit:bar->foo" "icf" } } */ > +/* { dg-final { scan-ipa-dump "Equal symbols: 1" "icf" } } */ > +/* { dg-final { cleanup-ipa-dump "icf" } } */ FAIL: gcc.dg/ipa/ipa-icf-21.c (test for excess errors) Excess errors: /usr/local/gcc/gcc-20141019/gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c:4:23: fatal e\ rror: xmmintrin.h: No such file or directory compilation terminated. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: IPA ICF fallout: fix for two ipa-icf-*.C tests
Martin Liška writes: > diff --git a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C > b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C > index 9d17889..9434289 100644 > --- a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C > +++ b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C > @@ -44,5 +44,5 @@ int main() > } > > /* { dg-final { scan-ipa-dump "Varpool alias has been created" "icf" } } */ > -/* { dg-final { scan-ipa-dump "Equal symbols: 2" "icf" } } */ > +/* { dg-final { scan-ipa-dump "Equal symbols: 6" "icf" } } */ > /* { dg-final { cleanup-ipa-dump "icf" } } */ On ia64: FAIL: g++.dg/ipa/ipa-icf-4.C -std=gnu++98 scan-ipa-dump icf "Varpool alias has been created" FAIL: g++.dg/ipa/ipa-icf-4.C -std=gnu++98 scan-ipa-dump icf "Equal symbols: 6" $ grep -e Equal -e Varpool ipa-icf-4.C.045i.icf Equal symbols: 5 Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."