Re: [RFC] Tweak gcc.c-torture/execute/pr39228.c
Hello!

> It looks like alpha has a similar issue:
>
> https://gcc.gnu.org/ml/gcc-testresults/2014-08/msg02660.html
>
> alpha and sh redefine dg-options to -mieee in the test case instead of the default dg-options -w and get the above warning. The patch below tweaks the test to fix it. Perhaps the first two lines are enough to avoid the error, but avoiding the root cause of the warnings would be better. Tested on i686-linux and sh4-linux.
>
> --- ORIG/trunk/gcc/testsuite/gcc.c-torture/execute/pr39228.c 2014-08-26 09:26:20.0 +0900
> +++ trunk/gcc/testsuite/gcc.c-torture/execute/pr39228.c 2014-09-03 07:42:30.085524983 +0900
> @@ -1,23 +1,23 @@
> -/* { dg-options -mieee { target sh*-*-* alpha*-*-* } } */
> +/* { dg-options -w -mieee { target sh*-*-* alpha*-*-* } } */
>  /* { dg-skip-if No Inf/NaN support { spu-*-* } * } */

Please use the /* { dg-add-options ieee } */ directive here. There is another possible candidate in pr44683.c.

Uros.
Re: [RFC] Tweak gcc.c-torture/execute/pr39228.c
Uros Bizjak ubiz...@gmail.com wrote:
>> -/* { dg-options -mieee { target sh*-*-* alpha*-*-* } } */
>> +/* { dg-options -w -mieee { target sh*-*-* alpha*-*-* } } */
>>  /* { dg-skip-if No Inf/NaN support { spu-*-* } * } */
>
> Please use /* { dg-add-options ieee } */ directive here. There is another one possible in pr44683.c.

Thanks for your suggestion. Here is a take 3 patch. I'll take a look at pr44683.c.

Regards,
    kaz
--
    * gcc.c-torture/execute/pr39228.c: Use dg-add-options instead of dg-options. Add inline keyword to test functions.

--- ORIG/trunk/gcc/testsuite/gcc.c-torture/execute/pr39228.c 2014-08-26 09:26:20.0 +0900
+++ trunk/gcc/testsuite/gcc.c-torture/execute/pr39228.c 2014-09-03 15:23:15.219917354 +0900
@@ -1,23 +1,23 @@
-/* { dg-options -mieee { target sh*-*-* alpha*-*-* } } */
+/* { dg-add-options ieee } */
 /* { dg-skip-if No Inf/NaN support { spu-*-* } * } */

 extern void abort (void);

-static int __attribute__((always_inline)) testf (float b)
+static inline int __attribute__((always_inline)) testf (float b)
 {
   float c = 1.01f * b;
   return __builtin_isinff (c);
 }

-static int __attribute__((always_inline)) test (double b)
+static inline int __attribute__((always_inline)) test (double b)
 {
   double c = 1.01 * b;
   return __builtin_isinf (c);
 }

-static int __attribute__((always_inline)) testl (long double b)
+static inline int __attribute__((always_inline)) testl (long double b)
 {
   long double c = 1.01L * b;
Re: [Patch, Fortran] PRs 61881/61888 - Fix issues with SIZEOF, CLASS(*) and assumed-rank
Thomas Schwinge wrote:
> On Sat, 26 Jul 2014 01:47:02 +0200, Tobias Burnus bur...@net-b.de wrote:
>> [...]
>> 2014-07-26 Tobias Burnus bur...@net-b.de
>>
>>     * gfortran.dg/sizeof_4.f90: New.
>> [...]
>
> I noticed that the sizeof_4.f90 test case has not been checked in, probably just forgot to svn add the file?

I have now committed it as Rev. 214843. It was indeed lying around in my commit tree, lacking the svn add.

Tobias
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On 2 September 2014 22:40:50 CEST, Richard Henderson r...@redhat.com wrote:
> On 06/20/2014 05:17 PM, Sriraman Tallam wrote:
>> Index: config/i386/i386.c
>> ===
>> --- config/i386/i386.c (revision 211826)
>> +++ config/i386/i386.c (working copy)
>> @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp)
>>          return true;
>>        }
>>      else if (!SYMBOL_REF_FAR_ADDR_P (op0)
>> -             && SYMBOL_REF_LOCAL_P (op0)
>> +             && (SYMBOL_REF_LOCAL_P (op0)
>> +                 || (TARGET_64BIT && ix86_copyrelocs && flag_pie
>> +                     && !SYMBOL_REF_FUNCTION_P (op0)))
>>               && ix86_cmodel != CM_LARGE_PIC)
>>        return true;
>>      break;
>
> This is the wrong place to patch.
>
> You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified TARGET_BINDS_LOCAL_P.
>
> Note in particular that I believe that you are doing the wrong thing with weak and COMMON symbols, in that you probably ought not force a copy reloc there.
>
> Note the complexity of default_binds_local_p_1, and the fact that all you really want to modify is
>
>   /* If PIC, then assume that any global name can be overridden by
>      symbols resolved from other modules. */
>   else if (shlib)
>     local_p = false;
>
> near the bottom of that function.

Reminds me of PR32219
https://gcc.gnu.org/ml/gcc-patches/2010-03/msg00665.html
Admittedly that one is not imposed by PIE, but it still fails on current trunk.
Re: [PATCH][ARM][2/2] Vectorise lroundf, lfloorf, lceilf using the new ARMv8-A vcvt* instructions
Hi Kyrill,

I've noticed that the tests you added with this patch fail (scan-tree-dump-times) for the armeb-none-linux-gnueabihf target.

Not sure if you want to fix your patch or the tests?

Christophe.

On 2 September 2014 17:48, Ramana Radhakrishnan ramana.radhakrish...@arm.com wrote:
> On 02/09/14 16:34, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In continuation of patch [1/2]...
>> We can use the vector forms of the vcvt{a,p,m} instructions to vectorise the l{round, ceil, floor}f functions. Builtins are added and the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION implementation is updated to wire up the vectorised forms of these functions to the midend.
>>
>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>
>> Ok for trunk?
>
> Ok - thanks.
>
> Ramana
>
>> Thanks,
>> Kyrill
>>
>> 2014-09-02 Kyrylo Tkachov kyrylo.tkac...@arm.com
>>
>>     PR target/62275
>>     * config/arm/neon.md (neon_vcvtNEON_VCVT:nvrint_variantsu_optabVCVTF:mode v_cmp_result): New pattern.
>>     * config/arm/iterators.md (NEON_VCVT): New int iterator.
>>     * config/arm/arm_neon_builtins.def (vcvtav2sf, vcvtav4sf, vcvtauv2sf, vcvtauv4sf, vcvtpv2sf, vcvtpv4sf, vcvtpuv2sf, vcvtpuv4sf, vcvtmv2sf, vcvtmv4sf, vcvtmuv2sf, vcvtmuv4sf): New builtin definitions.
>>     * config/arm/arm.c (arm_builtin_vectorized_function): Handle BUILT_IN_LROUNDF, BUILT_IN_LFLOORF, BUILT_IN_LCEILF.
>>
>> 2014-09-02 Kyrylo Tkachov kyrylo.tkac...@arm.com
>>
>>     PR target/62275
>>     * gcc.target/arm/vect-lceilf_1.c: New test.
>>     * gcc.target/arm/vect-lfloorf_1.c: Likewise.
>>     * gcc.target/arm/vect-lroundf_1.c: Likewise.
Re: [PATCH][ARM/AArch64] Add scheduling info for ARMv8-A FPU new instructions in Cortex-A53
On 22 August 2014 11:36, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
> Hi all,
>
> The Cortex-A53 scheduler description is missing rules for insn types used by instructions such as vrint*, vmaxnm, vminnm causing them to be assigned to the nothing unit. This patch causes such instructions to be treated the same way as other simple FPU instructions.
>
> Bootstrapped and tested on aarch64-linux and tested on arm-none-eabi as well.
>
> Ok for trunk?

Looks good to me.

/Marcus
[PATCH, testsuite]: Compile gcc.dg/20111227-?.c for x86 targets only.
Hello!

These testcases were intended to be compiled on x86 targets only [1].

2014-09-03 Uros Bizjak ubiz...@gmail.com

    * gcc.dg/20111227-2.c: Compile only for x86 targets.
    * gcc.dg/20111227-3.c: Ditto.

Tested on x86_64-linux-gnu {-m32} and committed to mainline SVN.

[1] https://gcc.gnu.org/ml/gcc-patches/2012-10/msg02329.html

Uros.

Index: gcc.dg/20111227-2.c
===
--- gcc.dg/20111227-2.c (revision 214845)
+++ gcc.dg/20111227-2.c (working copy)
@@ -1,6 +1,8 @@
 /* Testcase derived from 20111227-1.c to ensure that REE is combining
    redundant zero extends with zero extend to wider mode. */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
 /* { dg-options -fdump-rtl-ree -O -free } */
+
 extern void abort (void);

 unsigned short s;
Index: gcc.dg/20111227-3.c
===
--- gcc.dg/20111227-3.c (revision 214845)
+++ gcc.dg/20111227-3.c (working copy)
@@ -1,5 +1,6 @@
 /* Testcase derived from 20111227-1.c to ensure that REE is combining
    redundant sign extends with sign extend to wider mode. */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
 /* { dg-options -fdump-rtl-ree -O -free } */

 extern void abort (void);
Re: fix gfcov regression
> I've committed the patch now.

It (r214840) breaks bootstrap on darwin:

...
/opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/ -B/opt/gcc/gcc4.10w/x86_64-apple-darwin13.3.0/bin/ -B/opt/gcc/gcc4.10w/x86_64-apple-darwin13.3.0/lib/ -isystem /opt/gcc/gcc4.10w/x86_64-apple-darwin13.3.0/include -isystem /opt/gcc/gcc4.10w/x86_64-apple-darwin13.3.0/sys-include -g -O2 -m32 -O2 -g -O2 -DIN_GCC -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -pipe -fno-common -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -pipe -fno-common -I. -I. -I../../.././gcc -I../../../../work/libgcc -I../../../../work/libgcc/. -I../../../../work/libgcc/../gcc -I../../../../work/libgcc/../include -DHAVE_CC_TLS -DUSE_EMUTLS -o _gcov_reset.o -MT _gcov_reset.o -MD -MP -MF _gcov_reset.dep -DL_gcov_reset -c ../../../../work/libgcc/libgcov-interface.c
../../../../work/libgcc/libgcov-interface.c:136:33: error: only weak aliases are supported in this configuration
 STRONG_ALIAS (__gcov_reset_int, __gcov_reset);
                                 ^
../../../../work/libgcc/libgcov-interface.c:49:25: note: in definition of macro 'STRONG_ALIAS'
   extern __typeof (src) dst __attribute__((alias (#src)))
                         ^
make[5]: *** [_gcov_reset.o] Error 1
...

TIA

Dominique
[PATCH] Enhance array types debug info. for Ada
Hi!

I'm currently working on improving the debug information output for GNAT (the Ada frontend in GCC), which currently uses non-standard DWARF to describe complex types. Lately, I focused on debug information for arrays and the attached inter-dependent patches are an attempt to do so:

  - they enhance the existing array_descr_info language hook;
  - they adjust the Fortran front-end accordingly (it's the only array_descr_info user currently);
  - they make the Ada front-end use this hook.

Here are more details about what motivated each patch:

1. This first patch enhances the array_descr_info language hook so that front-ends can pass more information about array types to the DWARF back-end:

     - Array ordering (column/row major), so that the information that Fortran arrays are column major ordered comes from the Fortran front-end and so that later GNAT can decide itself for each array type (they can be both column and row major ordered).

     - Bounds type: Ada arrays can be indexed by integers but also characters, enumerated types, etc. However it seems that the middle-end makes the assumption that every array index is sizetype, so this information is needed here for accurate debug info.

   It also makes the language hook generate GNAT descriptive type attributes for array types, just as the regular array types handling in dwarf2out.c does.

   Finally, it makes the DWARF back-end initialize the array_descr_info structure so that new fields can be added to it later without affecting existing front-ends that use this hook.

2. Currently, this language hook is enabled only when (dwarf_version >= 3 || !dwarf_strict). The hook generates information that is mostly valid for strict DWARFv2, though. The second patch enables this language hook every time and instead prevents the emission of some attributes when needed.

3. This one enables the array_descr_info hook in GNAT.

4. This one enhances debug helpers in dwarf2out.c to ease the investigation of bugs in location descriptions (DWARF expressions).

5. The array_descr_info hook has its own circuitry in dwarf2out.c to generate location descriptions: add_descr_info_field. It is a duplicate of loc_list_from_tree and less powerful, except that it handles self-referential attributes. This final patch is an attempt to merge these two circuitries so that this hook can generate more complex DWARF expressions. It also adjusts the Fortran front-end accordingly.

These patches were tested on x86_64-pc-linux-gnu. They trigger no regression in the GCC DejaGNU testsuite nor in the GDB one (they fix some failures however). Ok for trunk?

Thank you very much for reading until this point and thank you in advance for your review! ;-)

--
Pierre-Marie de Rodat

From 15f8a12782cfcb084205c751d8c37c7a360bea2f Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat dero...@adacore.com
Date: Wed, 3 Sep 2014 09:46:17 +0200
Subject: [PATCH 1/5] Complete information generated through the array descr language hook

gcc/
    * dwarf2out.h (enum array_descr_ordering): New.
    (array_descr_dimen): Add a bounds_type structure field.
    (struct array_descr_info): Add a field to hold index type information and another one to hold ordering information.
    * dwarf2out.c (init_array_descr_info): New.
    (gen_type_die_with_usage): Initialize the array_descr_info structure before calling the lang-hook.
    (gen_descr_array_type_die): Use gen_type_die if not processing the main type variant. Replace Fortran-specific code with generic one using this new field. Add a GNAT descriptive type, if any. Output type information for the array bound subrange, if any.

gcc/fortran/
    * trans-types.c (gfc_get_array_descr_info): Describe all Fortran arrays with column major ordering.
---
 gcc/dwarf2out.c | 63 ++-
 gcc/dwarf2out.h | 13 ++
 gcc/fortran/trans-types.c | 1 +
 3 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 21afc3f..835446e 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -17359,18 +17359,36 @@ static void
 gen_descr_array_type_die (tree type, struct array_descr_info *info,
                           dw_die_ref context_die)
 {
-  dw_die_ref scope_die = scope_die_for (type, context_die);
+  dw_die_ref scope_die;
   dw_die_ref array_die;
   int dim;

+  /* Instead of producing a dedicated DW_TAG_array_type DIE for this type, let
+     the circuitry wrap the main variant with DIEs for qualifiers (for
+     instance: DW_TAG_const_type, ...). */
+  if (type != TYPE_MAIN_VARIANT (type))
+    {
+      gen_type_die (TYPE_MAIN_VARIANT (type), context_die);
+      return;
+    }
+
+  scope_die = scope_die_for (type, context_die);
   array_die = new_die (DW_TAG_array_type, scope_die, type);
   add_name_attribute (array_die, type_tag (type));
   equate_type_number_to_die (type, array_die);

-  /* For Fortran multidimensional arrays use DW_ORD_col_major
[PATCH 1/2, PR 61654] Handle newly truly expanded artificial_thunks
Hi,

I did not think it was possible, but it can happen that when duplicate_thunk_for_node creates a duplicate of a thunk which previously expand_thunk left alone to be expanded into assembly by the back end, the newly created thunk does get expanded by expand_thunk. When this happens, we end up with an un-analyzed node which triggers an assert later on.

This patch deals with the situation by analyzing the newly expanded thunk. This revealed that DECL_ARGUMENTS were insufficiently copied for the new decl and it was sharing them with the old one. So this patch fixes this as well.

Bootstrapped and tested on x86_64-linux and i686-linux (where the bug triggered), OK for trunk and the 4.9 branch?

Thanks,

Martin

2014-09-01 Martin Jambor mjam...@suse.cz

    PR ipa/61654
    * cgraphclones.c (duplicate_thunk_for_node): Copy arguments of the new decl properly. Analyze the new thunk if it is expanded.

gcc/testsuite/
    * g++.dg/ipa/pr61654.C: New test.
---
 gcc/cgraphclones.c | 21
 gcc/testsuite/g++.dg/ipa/pr61654.C | 40 ++
 2 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr61654.C

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index eb04418..2a17de5 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -334,6 +334,22 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
                                                 node->clone.args_to_skip,
                                                 false);
     }
+
+  tree *link = &DECL_ARGUMENTS (new_decl);
+  int i = 0;
+  for (tree pd = DECL_ARGUMENTS (thunk->decl); pd; pd = DECL_CHAIN (pd), i++)
+    {
+      if (!node->clone.args_to_skip
+          || !bitmap_bit_p (node->clone.args_to_skip, i))
+        {
+          tree nd = copy_node (pd);
+          DECL_CONTEXT (nd) = new_decl;
+          *link = nd;
+          link = &DECL_CHAIN (nd);
+        }
+    }
+  *link = NULL_TREE;
+
   gcc_checking_assert (!DECL_STRUCT_FUNCTION (new_decl));
   gcc_checking_assert (!DECL_INITIAL (new_decl));
   gcc_checking_assert (!DECL_RESULT (new_decl));
@@ -357,6 +373,11 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
   symtab->call_edge_duplication_hooks (thunk->callees, e);
   if (!new_thunk->expand_thunk (false, false))
     new_thunk->analyzed = true;
+  else
+    {
+      new_thunk->thunk.thunk_p = false;
+      new_thunk->analyze ();
+    }
   symtab->call_cgraph_duplication_hooks (thunk, new_thunk);
   return new_thunk;
diff --git a/gcc/testsuite/g++.dg/ipa/pr61654.C b/gcc/testsuite/g++.dg/ipa/pr61654.C
new file mode 100644
index 000..d07e458
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr61654.C
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options -O3 } */
+
+/* The bug only presented itself on a 32 bit i386 but in theory it might also
+   pop up elsewhere and we do not want to put -m32 options to testcase
+   options. */
+
+struct A
+{
+  virtual int a (int, int = 0) = 0;
+  void b ();
+  void c ();
+  int d;
+};
+
+struct B : virtual A
+{
+  int a (int, int);
+  int e;
+};
+
+int f;
+
+void
+A::b ()
+{
+  a (0);
+}
+
+void
+A::c ()
+{
+  a (f);
+}
+
+int
+B::a (int, int)
+{
+  return e;
+}
--
1.8.4.5
[PATCH 2/2] Set analyzed flag of unexpanded thunks in expand_thunk
Hi,

this is a followup to my previous PR-fixing patch. In an ever-growing number of places we currently do

  if (!node->expand_thunk (false, whatever))
    node->analyzed = true;

and we always set the flag when expand_thunk returns false (it only can when the first parameter is false). So I thought it would be much nicer to set the analyzed flag in expand_thunk itself when it returns false, especially given that we probably want to set the flag in as few places as reasonably possible.

Bootstrapped and tested on x86_64-linux. OK for trunk?

Thanks,

Martin

2014-09-01 Martin Jambor mjam...@suse.cz

    * cgraphunit.c (expand_thunk): If not expanding, set analyzed flag.
    (analyze): Do not set analyzed flag if expand_thunk returns false.
    (create_wrapper): Likewise.
    * cgraphclones.c (duplicate_thunk_for_node): Likewise.
---
 gcc/cgraphclones.c | 4 +---
 gcc/cgraphunit.c | 10 +-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 2a17de5..224bb55 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -371,9 +371,7 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
                                   CGRAPH_FREQ_BASE);
   e->call_stmt_cannot_inline_p = true;
   symtab->call_edge_duplication_hooks (thunk->callees, e);
-  if (!new_thunk->expand_thunk (false, false))
-    new_thunk->analyzed = true;
-  else
+  if (new_thunk->expand_thunk (false, false))
     {
       new_thunk->thunk.thunk_p = false;
       new_thunk->analyze ();
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 3a99729..3e3b8d2 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -576,7 +576,6 @@ cgraph_node::analyze (void)
       if (!expand_thunk (false, false))
         {
           thunk.alias = NULL;
-          analyzed = true;
           return;
         }
       thunk.alias = NULL;
@@ -1451,7 +1450,10 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
       tree restype = TREE_TYPE (TREE_TYPE (thunk_fndecl));

       if (!output_asm_thunks)
-        return false;
+        {
+          analyzed = true;
+          return false;
+        }

       if (in_lto_p)
         get_body ();
@@ -2313,9 +2315,7 @@ cgraph_node::create_wrapper (cgraph_node *target)

     cgraph_edge *e = create_edge (target, NULL, 0, CGRAPH_FREQ_BASE);

-    if (!expand_thunk (false, true))
-      analyzed = true;
-
+    expand_thunk (false, true);
     e->call_stmt_cannot_inline_p = true;

     /* Inline summary set-up. */
--
1.8.4.5
[PATCH, PR 61986] Produce aggregate replacement nodes in ascending order of offsets
Hi,

intersecting known aggregate values coming along a given set of call graph edges requires that all lists are in ascending order of offsets, so that the intersection can be performed in a single sweep through each of them. However, aggregate replacement nodes are produced in exactly the opposite order. This makes us miss an item in the intersection and assert later.

The ordering is fixed by the following patch. Bootstrapped and tested on x86_64-linux. OK for the trunk and all problematic branches (4.9 for sure, I am not sure about 4.8 at this moment)?

Thanks,

Martin

2014-09-02 Martin Jambor mjam...@suse.cz

    PR ipa/61986
    * ipa-cp.c (find_aggregate_values_for_callers_subset): Chain created replacements in ascending order of offsets.
    (known_aggs_to_agg_replacement_list): Likewise.

testsuite/
    * gcc.dg/ipa/pr61986.c: New test.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 44d4c9a..58121d4 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3146,7 +3146,8 @@ find_aggregate_values_for_callers_subset (struct cgraph_node *node,
                                           vec<cgraph_edge *> callers)
 {
   struct ipa_node_params *dest_info = IPA_NODE_REF (node);
-  struct ipa_agg_replacement_value *res = NULL;
+  struct ipa_agg_replacement_value *res;
+  struct ipa_agg_replacement_value **tail = &res;
   struct cgraph_edge *cs;
   int i, j, count = ipa_get_param_count (dest_info);

@@ -3190,14 +3191,15 @@ find_aggregate_values_for_callers_subset (struct cgraph_node *node,
               v->offset = item->offset;
               v->value = item->value;
               v->by_ref = plats->aggs_by_ref;
-              v->next = res;
-              res = v;
+              *tail = v;
+              tail = &v->next;
             }

     next_param:
       if (inter.exists ())
         inter.release ();
     }
+  *tail = NULL;
   return res;
 }

@@ -3206,7 +3208,8 @@ find_aggregate_values_for_callers_subset (struct cgraph_node *node,
 static struct ipa_agg_replacement_value *
 known_aggs_to_agg_replacement_list (vec<ipa_agg_jump_function> known_aggs)
 {
-  struct ipa_agg_replacement_value *res = NULL;
+  struct ipa_agg_replacement_value *res;
+  struct ipa_agg_replacement_value **tail = &res;
   struct ipa_agg_jump_function *aggjf;
   struct ipa_agg_jf_item *item;
   int i, j;
@@ -3220,9 +3223,10 @@ known_aggs_to_agg_replacement_list (vec<ipa_agg_jump_function> known_aggs)
         v->offset = item->offset;
         v->value = item->value;
         v->by_ref = aggjf->by_ref;
-        v->next = res;
-        res = v;
+        *tail = v;
+        tail = &v->next;
       }
+  *tail = NULL;
   return res;
 }

diff --git a/gcc/testsuite/gcc.dg/ipa/pr61986.c b/gcc/testsuite/gcc.dg/ipa/pr61986.c
new file mode 100644
index 000..8d2f658
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr61986.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options -O3 } */
+
+int a, b, c;
+
+struct S
+{
+  int f0;
+  int f1;
+} d;
+
+static int fn2 (struct S);
+void fn3 (struct S);
+
+void
+fn1 (struct S p)
+{
+  struct S h = { 0, 0 };
+  fn3 (p);
+  fn2 (h);
+}
+
+int
+fn2 (struct S p)
+{
+  struct S j = { 0, 0 };
+  fn3 (p);
+  fn2 (j);
+  return 0;
+}
+
+void
+fn3 (struct S p)
+{
+  for (; b; a++)
+    c = p.f0;
+  fn1 (d);
+}
+
+void
+fn4 ()
+{
+  for (;;)
+    {
+      struct S f = { 0, 0 };
+      fn1 (f);
+    }
+}
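To illustrate the ordering issue described in this message, here is a minimal standalone sketch in C. It is not GCC code: struct repl_value and the helper names are hypothetical stand-ins for ipa_agg_replacement_value and the two functions touched by the patch. It only shows why prepending reverses an ascending input while appending through a tail pointer preserves it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ipa_agg_replacement_value; only the fields
   needed to show the ordering are kept.  */
struct repl_value
{
  long offset;
  struct repl_value *next;
};

/* Old scheme: push each new node on the front of the list.  An input in
   ascending order of offsets comes out in descending order.  */
struct repl_value *
build_by_prepending (const long *offsets, size_t n)
{
  struct repl_value *res = NULL;
  for (size_t i = 0; i < n; i++)
    {
      struct repl_value *v = malloc (sizeof *v);
      assert (v != NULL);
      v->offset = offsets[i];
      v->next = res;
      res = v;
    }
  return res;
}

/* Scheme used by the patch: keep a pointer to the link that must receive
   the next node, so the input order is preserved.  */
struct repl_value *
build_by_appending (const long *offsets, size_t n)
{
  struct repl_value *res;
  struct repl_value **tail = &res;
  for (size_t i = 0; i < n; i++)
    {
      struct repl_value *v = malloc (sizeof *v);
      assert (v != NULL);
      v->offset = offsets[i];
      *tail = v;
      tail = &v->next;
    }
  *tail = NULL;	/* Terminates the list, also when N is zero.  */
  return res;
}

/* Return nonzero if the offsets in LIST never decrease.  */
int
is_ascending (const struct repl_value *list)
{
  for (; list && list->next; list = list->next)
    if (list->offset > list->next->offset)
      return 0;
  return 1;
}

/* Release every node of LIST.  */
void
free_list (struct repl_value *list)
{
  while (list)
    {
      struct repl_value *next = list->next;
      free (list);
      list = next;
    }
}
```

Note that the final store through tail is what makes it safe to leave res uninitialized at its declaration, as the patch does.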
[PATCH, PR 62015] Clear aggregate values intersection when jump function flag require us to punt
Hi,

this PR revealed that the aggregate value intersection code in IPA-CP has one more problem in it: when jump function flags show that a PASS_THROUGH jump function cannot be used at all, the code must also clear the intersection when punting.

Fixed thusly. Bootstrapped and tested on x86_64-linux (so far only on trunk, testing on branches in progress). OK for trunk and all the problematic branches (IIRC both 4.9 and 4.8)?

Thanks,

Martin

2014-09-02 Martin Jambor mjam...@suse.cz

    PR ipa/62015
    * ipa-cp.c (intersect_aggregates_with_edge): Handle impermissible pass-through jump functions correctly.

testsuite/
    * g++.dg/ipa/pr62015.C: New test.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 44d4c9a..afbec25 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -3048,6 +3048,11 @@ intersect_aggregates_with_edge (struct cgraph_edge *cs, int index,
                 intersect_with_agg_replacements (cs->caller, src_idx,
                                                  &inter, 0);
             }
+          else
+            {
+              inter.release ();
+              return vNULL;
+            }
         }
       else
         {
@@ -3063,6 +3068,11 @@ intersect_aggregates_with_edge (struct cgraph_edge *cs, int index,
               else
                 intersect_with_plats (src_plats, &inter, 0);
             }
+          else
+            {
+              inter.release ();
+              return vNULL;
+            }
         }
     }
   else if (jfunc->type == IPA_JF_ANCESTOR
diff --git a/gcc/testsuite/g++.dg/ipa/pr62015.C b/gcc/testsuite/g++.dg/ipa/pr62015.C
new file mode 100644
index 000..950b46e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr62015.C
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-options -O3 -std=c++11 } */
+
+
+extern "C" int printf(const char *fmt, ...);
+extern "C" void abort(void);
+
+struct Side {
+    enum _Value { Left, Right, Invalid };
+
+    constexpr Side() : _value(Invalid) {}
+    constexpr Side(_Value value) : _value(value) {}
+    operator _Value() const { return (_Value)_value; }
+
+  private:
+    char _value;
+};
+
+struct A {
+    void init();
+    void adjust(Side side, bool final);
+    void move(Side side);
+};
+
+void A::init()
+{
+    adjust(Side::Invalid, false);
+}
+
+static void __attribute__((noinline))
+check (int v, int final)
+{
+    if (v != 0)
+      abort();
+}
+
+
+__attribute__((noinline))
+void A::adjust(Side side, bool final)
+{
+  check ((int)side, final);
+}
+
+void A::move(Side side)
+{
+    adjust(side, false);
+    adjust(side, true);
+}
+
+int main()
+{
+    A t;
+    t.move(Side::Left);
+    return 0;
+}
Re: [PATCH] Don't init ira_spilled_reg_stack_slots in ira if using lra.
ping!

On Wed, Aug 27, 2014 at 10:49 PM, Kito Cheng kito.ch...@gmail.com wrote:
> Hi all:
>
> This patch cleans up a useless initialization in IRA when LRA is in use.
>
> 2014-08-27 Kito Cheng k...@0xlab.org
>
>     * ira.c (ira): Don't initialize ira_spilled_reg_stack_slots and ira_spilled_reg_stack_slots_num if using lra.
>     (do_reload): Remove release ira_spilled_reg_stack_slots part.
>     * ira-color.c (ira_sort_regnos_for_alter_reg): Add assertion to make sure not using lra.
>     (ira_reuse_stack_slot): Likewise.
>     (ira_mark_new_stack_slot): Likewise.
Re: [PATCH][ARM][2/2] Vectorise lroundf, lfloorf, lceilf using the new ARMv8-A vcvt* instructions
On 03/09/14 08:42, Christophe Lyon wrote:
> Hi Kyrill,
>
> I've noticed that the tests you added with this patch fail (scan-tree-dump-times) for the armeb-none-linux-gnueabihf target.
>
> Not sure if you want to fix your patch or the tests?

Hi Christophe,

Ah, I reproduced it on armeb-none-eabi. The problem is that our NEON movmisalign pattern is disabled for big-endian, so the vectoriser refuses to load from the input pointer:

vect-lceilf_1.c:13:3: note: Setting misalignment to -1.
vect-lceilf_1.c:13:3: note: not vectorized: unsupported unaligned load.*_9
vect-lceilf_1.c:13:3: note: bad data alignment.

Seems like that's deliberate:

(define_expand "movmisalign<mode>"
  [(set (match_operand:VDQX 0 "neon_perm_struct_or_reg_operand")
        (unspec:VDQX [(match_operand:VDQX 1 "neon_perm_struct_or_reg_operand")]
                     UNSPEC_MISALIGNED_ACCESS))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access"

I can also see the following tests fail on big-endian:

FAIL: gcc.target/arm/vect-rounding-btruncf.c scan-tree-dump-times vect vectorized 1 loops 1
FAIL: gcc.target/arm/vect-rounding-ceilf.c scan-tree-dump-times vect vectorized 1 loops 1
FAIL: gcc.target/arm/vect-rounding-floorf.c scan-tree-dump-times vect vectorized 1 loops 1
FAIL: gcc.target/arm/vect-rounding-roundf.c scan-tree-dump-times vect vectorized 1 loops 1

presumably for the same reason.

I guess the way to fix this is to make the input and output arrays global variables and force them to align to 128 bits so we don't have to use misaligned accesses. I'll fix the tests up.

Thanks,
Kyrill

> Christophe.
>
> On 2 September 2014 17:48, Ramana Radhakrishnan ramana.radhakrish...@arm.com wrote:
>> On 02/09/14 16:34, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> In continuation of patch [1/2]...
>>> We can use the vector forms of the vcvt{a,p,m} instructions to vectorise the l{round, ceil, floor}f functions. Builtins are added and the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION implementation is updated to wire up the vectorised forms of these functions to the midend.
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>>
>>> Ok for trunk?
>>
>> Ok - thanks.
>>
>> Ramana
>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2014-09-02 Kyrylo Tkachov kyrylo.tkac...@arm.com
>>>
>>>     PR target/62275
>>>     * config/arm/neon.md (neon_vcvtNEON_VCVT:nvrint_variantsu_optabVCVTF:mode v_cmp_result): New pattern.
>>>     * config/arm/iterators.md (NEON_VCVT): New int iterator.
>>>     * config/arm/arm_neon_builtins.def (vcvtav2sf, vcvtav4sf, vcvtauv2sf, vcvtauv4sf, vcvtpv2sf, vcvtpv4sf, vcvtpuv2sf, vcvtpuv4sf, vcvtmv2sf, vcvtmv4sf, vcvtmuv2sf, vcvtmuv4sf): New builtin definitions.
>>>     * config/arm/arm.c (arm_builtin_vectorized_function): Handle BUILT_IN_LROUNDF, BUILT_IN_LFLOORF, BUILT_IN_LCEILF.
>>>
>>> 2014-09-02 Kyrylo Tkachov kyrylo.tkac...@arm.com
>>>
>>>     PR target/62275
>>>     * gcc.target/arm/vect-lceilf_1.c: New test.
>>>     * gcc.target/arm/vect-lfloorf_1.c: Likewise.
>>>     * gcc.target/arm/vect-lroundf_1.c: Likewise.
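The test fix proposed in the reply above (global input and output arrays forced to 128-bit alignment) might look roughly like the following sketch. This is not the committed testcase: the array names, the size N, the helper is_128bit_aligned and the loop body are all assumptions made for illustration. The real tests call __builtin_lroundf and friends; a plain truncating conversion is used here only to keep the sketch free of libm.

```c
#include <assert.h>
#include <stdint.h>

#define N 32

/* Global arrays with an explicit 16-byte (128-bit) alignment, so that a
   vectoriser does not need misaligned loads or stores to access them.  */
float input[N] __attribute__ ((aligned (16)));
int output[N] __attribute__ ((aligned (16)));

/* Return nonzero if P is aligned to 128 bits.  */
int
is_128bit_aligned (const void *p)
{
  return ((uintptr_t) p % 16) == 0;
}

/* Placeholder loop standing in for the lroundf/lceilf/lfloorf loops of
   the real tests.  */
void
convert_all (void)
{
  assert (is_128bit_aligned (input) && is_128bit_aligned (output));
  for (int i = 0; i < N; i++)
    output[i] = (int) input[i];
}
```

With pointer parameters the compiler cannot assume any alignment beyond that of the element type, which is presumably why the original tests needed the misaligned-access pattern in the first place.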
Re: [PATCH] Force rtl templates to be inlined
On Tue, Sep 2, 2014 at 6:52 PM, Andi Kleen a...@linux.intel.com wrote:
>> Or we simply should make -finline work at -O0 (I suppose it might already work?) and use it.
>
> Yes that's probably better. There are more hot inlines in the stage 1 profile (like wi::storage_ref or vec::length)
>
> I suspect with the ongoing C++'ification that will get worse.

Like with this (untested apart from on a small testcase).

Of course it usually won't help bootstrap because your host compiler doesn't support -O0 -finline yet.

Also it might not help that much because while it certainly removes call overhead it will still not optimize anything else. Also inline analysis correctly assumes that no stmts go away so you only have the call overhead as room to allow inlining (so with checking enabled the is_a/as_a stuff is probably too large - didn't check).

It may also negatively affect debugging (no var-tracking at -O0).

Anyway, removing !optimize checks in favor of flag_no_inline checks and initializing that properly is a cleanup as well.

Now to see how to best test this ...

Richard.

> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only

[Attachment: make-inlining-work-at-O0]
Re: [PATCH] Force rtl templates to be inlined
On Tue, Sep 2, 2014 at 6:52 PM, Andi Kleen a...@linux.intel.com wrote:
>> Or we simply should make -finline work at -O0 (I suppose it might already work?) and use it.
>
> Yes that's probably better. There are more hot inlines in the stage 1 profile (like wi::storage_ref or vec::length)
>
> I suspect with the ongoing C++'ification that will get worse.

Btw, it is for C++ that I considered that -Og might replace -O0, exactly because of all the abstraction penalty, which usually doesn't help debugging at -O0. Also the idea was that -Og might even compile faster than -O0, but that's far from true unfortunately ...

Richard.

> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
Re: RFA: Merge definitions of get_some_local_dynamic_name
On Tue, Sep 2, 2014 at 8:36 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Several targets define a function like i386's get_some_local_dynamic_name. The function looks through the current output function and returns the first (arbitrary) local-dynamic symbol that it finds. The result can be used in a call to __tls_get_addr, since all local-dynamic symbols have the same base. This patch replaces the various target functions with a single generic one. The only difference between the implementations was that s390 checked for constant pool references while the others didn't need to (because they don't allow TLS symbols to be forced into the pool). Checking for constant pool references is unnecessary but harmless for the other ports. Also, the walk is needed only once per TLS-referencing output function, so it's hardly critical in terms of compile time. All uses of this function are in final. In general it wouldn't be safe to call the function earlier than that, since the symbol reference could in principle be deleted by any rtl pass. I've therefore cached it in a variable local to final rather than in cfun (which is where the ports used to cache it). Also, i386 was robust against uses of % in inline asm. The patch makes sure the other ports are too. Using % in inline asm would often be a mistake, but it should at least trigger a proper error rather than an ICE. Tested on x86_64-linux-gnu. Also tested by building cross compilers before and after the change on: alpha-linux-gnu powerpc64-linux-gnu s390x-linux-gnu sparc64-linux-gnu OK to install? Ok. Thanks, Richard. Thanks, Richard gcc/ * output.h (get_some_local_dynamic_name): Declare. * final.c (some_local_dynamic_name): New variable. (get_some_local_dynamic_name): New function. (final_end_function): Clear some_local_dynamic_name. * config/alpha/alpha.c (machine_function): Remove some_ld_name. (get_some_local_dynamic_name, get_some_local_dynamic_name_1): Delete. 
(print_operand): Report an error if '%' is used inappropriately. * config/i386/i386.c (get_some_local_dynamic_name): Delete. (get_some_local_dynamic_name_1): Delete. * config/rs6000/rs6000.c (machine_function): Remove some_ld_name. (rs6000_get_some_local_dynamic_name): Delete. (rs6000_get_some_local_dynamic_name_1): Delete. (print_operand): Report an error if '%' is used inappropriately. * config/s390/s390.c (machine_function): Remove some_ld_name. (get_some_local_dynamic_name, get_some_local_dynamic_name_1): Delete. (print_operand): Assert that get_some_local_dynamic_name is nonnull. * config/sparc/sparc.c: Include rtl-iter.h. (machine_function): Remove some_ld_name. (sparc_print_operand): Report an error if '%' is used inappropriately. (get_some_local_dynamic_name, get_some_local_dynamic_name_1): Delete. Index: gcc/output.h === --- gcc/output.h2014-08-31 21:05:04.701330252 +0100 +++ gcc/output.h2014-09-02 19:02:59.820482510 +0100 @@ -52,6 +52,8 @@ extern int get_attr_min_length (rtx); any branches of variable length if possible. */ extern void shorten_branches (rtx_insn *); +const char *get_some_local_dynamic_name (); + /* Output assembler code for the start of a function, and initialize some of the variables in this file for the new function. The label for the function and associated Index: gcc/final.c === --- gcc/final.c 2014-08-31 21:05:04.701330252 +0100 +++ gcc/final.c 2014-09-02 19:17:08.573876805 +0100 @@ -1719,6 +1719,38 @@ reemit_insn_block_notes (void) reorder_blocks (); } +static const char *some_local_dynamic_name; + +/* Locate some local-dynamic symbol still in use by this function + so that we can print its name in local-dynamic base patterns. + Return null if there are no local-dynamic references. 
*/ + +const char * +get_some_local_dynamic_name () +{ + subrtx_iterator::array_type array; + rtx_insn *insn; + + if (some_local_dynamic_name) +return some_local_dynamic_name; + + for (insn = get_insns (); insn ; insn = NEXT_INSN (insn)) +if (NONDEBUG_INSN_P (insn)) + FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL) + { + const_rtx x = *iter; + if (GET_CODE (x) == SYMBOL_REF) + { + if (SYMBOL_REF_TLS_MODEL (x) == TLS_MODEL_LOCAL_DYNAMIC) + return some_local_dynamic_name = XSTR (x, 0); + if (CONSTANT_POOL_ADDRESS_P (x)) + iter.substitute (get_pool_constant (x)); + } + } + + return 0; +} + /* Output assembler code for the start of a function, and initialize some of the variables in this file for the new function.
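The caching scheme described in the message above (scan once, remember the first local-dynamic symbol, clear the cache when the function ends) can be sketched outside GCC roughly as follows. This is only an illustration under stated assumptions: `struct symbol`, `find_local_dynamic_name` and `end_function` are hypothetical stand-ins, not GCC interfaces, and a flat array replaces the walk over insn patterns.

```c
#include <stddef.h>

/* Hypothetical, simplified TLS model tag; not GCC's enum tls_model.  */
enum tls_model { TLS_NONE, TLS_LOCAL_DYNAMIC };

/* Hypothetical stand-in for a SYMBOL_REF.  */
struct symbol
{
  const char *name;
  enum tls_model model;
};

/* Cache local to this file, mirroring the idea of keeping the result in
   final.c rather than in per-function target data.  */
static const char *cached_name;

/* Return the name of some local-dynamic symbol among SYMS[0..N-1], or
   NULL if there is none.  Only the first successful scan does any work;
   later calls return the cached name.  */
const char *
find_local_dynamic_name (const struct symbol *syms, size_t n)
{
  if (cached_name)
    return cached_name;

  for (size_t i = 0; i < n; i++)
    if (syms[i].model == TLS_LOCAL_DYNAMIC)
      return cached_name = syms[i].name;

  return NULL;
}

/* Forget the cached name once the current function has been output.  */
void
end_function (void)
{
  cached_name = NULL;
}
```

The cache must be cleared per function because a name found in one function says nothing about whether the next function references any local-dynamic symbol.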
[patch] prevent tree sinking of trapping stmts
Hello,

For the testcase below, the tree-ssa-sink pass sinks the first a = b + c; assignment within the if branch. This is problematic when the + operation on floats could trap, as it gets moved out of the path that dominates the call in the else branch and a trap on the original + should prevent the call from taking place.

The attached patch is a proposal to address this by refusing to sink statements that could trap, except load/stores. Trapping loads or stores typically yield undefined behavior anyway, and not sinking a load or store as soon as it is potentially trapping pessimizes quite a bit for no valid reason.

Bootstrapped and regression tested on x86_64-linux-gnu. OK to commit ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier

2014-09-03  Olivier Hainque  hain...@adacore.com

        * tree-ssa-sink.c (statement_sink_location): Don't sink
        !load-or-store stmts that could trap.

testsuite/
        * gcc.dg/tree-ssa/ssa-sink-13.c: New test.

[see attached file: sinktrap.diff]

--

extern void bar ();

float foo (int call)
{
  float a, b, c;

  a = b + c;
  if (!call)
    ;
  else
    {
      bar ();
      b = c * 2;
      a = b + c;
    }
  return a;
}
Re: [PATCH][AArch64] Fix wrong .cfi_def_cfa_offset in epilogue
On 20 August 2014 09:43, Jiong Wang jiong.w...@arm.com wrote: gcc/ * config/aarch64/aarch64.c (aarch64_expand_epilogue): Remove redundant cfa offset update. OK /Marcus
[PATCH][match-and-simplify] Fix error in VCE pattern
The pattern (and the fold_unary code it was derived from) stripping inner conversions from VIEW_CONVERT_EXPRs is bogus in that it doesn't make sure that the sizes of the types match. Without the fix this triggers an IL verification failure for gfortran.fortran-torture/compile/forall-1.f90.

Committed to the branch.

Richard.

2014-09-03  Richard Biener  rguent...@suse.de

        * match-conversions.pd ((view_convert (convert @0)) -> (view_convert @0)):
        Restrict to conversions that do not change size.

Index: gcc/match-conversions.pd
===
--- gcc/match-conversions.pd    (revision 214864)
+++ gcc/match-conversions.pd    (working copy)
@@ -77,12 +77,13 @@
        TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0)))
   (convert @0)))
 
-/* Strip inner integral conversions that do not change the precision.  */
+/* Strip inner integral conversions that do not change precision or size.  */
 (simplify
   (view_convert (convert@0 @1))
   (if ((INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
        && (INTEGRAL_TYPE_P (TREE_TYPE (@1)) || POINTER_TYPE_P (TREE_TYPE (@1)))
-       && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))))
+       && (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+       && (TYPE_SIZE (TREE_TYPE (@0)) == TYPE_SIZE (TREE_TYPE (@1))))
   (view_convert @1)))
Re: [RFC] Tweak gcc.c-torture/execute/pr39228.c
On Wed, Sep 3, 2014 at 8:28 AM, Kaz Kojima kkoj...@rr.iij4u.or.jp wrote: Uros Bizjak ubiz...@gmail.com wrote: -/* { dg-options -mieee { target sh*-*-* alpha*-*-* } } */ +/* { dg-options -w -mieee { target sh*-*-* alpha*-*-* } } */ /* { dg-skip-if No Inf/NaN support { spu-*-* } * } */ Please use /* { dg-add-options ieee } */ directive here. There is another one possible in pr44683.c. Thanks for your suggestion. Here is a take 3 patch. I'll take a look at pr44683.c. Ok. Thanks, Richard. Regards, kaz -- * gcc.c-torture/execute/pr39228.c: Use dg-add-options instead of dg-options. Add inline keyword to test functions. --- ORIG/trunk/gcc/testsuite/gcc.c-torture/execute/pr39228.c2014-08-26 09:26:20.0 +0900 +++ trunk/gcc/testsuite/gcc.c-torture/execute/pr39228.c 2014-09-03 15:23:15.219917354 +0900 @@ -1,23 +1,23 @@ -/* { dg-options -mieee { target sh*-*-* alpha*-*-* } } */ +/* { dg-add-options ieee } */ /* { dg-skip-if No Inf/NaN support { spu-*-* } * } */ extern void abort (void); -static int __attribute__((always_inline)) testf (float b) +static inline int __attribute__((always_inline)) testf (float b) { float c = 1.01f * b; return __builtin_isinff (c); } -static int __attribute__((always_inline)) test (double b) +static inline int __attribute__((always_inline)) test (double b) { double c = 1.01 * b; return __builtin_isinf (c); } -static int __attribute__((always_inline)) testl (long double b) +static inline int __attribute__((always_inline)) testl (long double b) { long double c = 1.01L * b;
Re: [PATCH, PR 61986] Produce aggregate replacement nodes in ascending order of offsets
On Wed, Sep 3, 2014 at 10:46 AM, Martin Jambor mjam...@suse.cz wrote:
> Hi,
>
> intersecting known aggregate values coming along a given set of call
> graph edges requires that all lists are in ascending order of offsets
> in order to perform it in only one sweep through each of them.
> However, aggregate replacement nodes are produced in exactly the
> opposite order. This makes us miss an item in the intersection and
> assert later. The ordering is fixed by the following patch.
>
> Bootstrapped and tested on x86_64-linux. OK for the trunk and all
> problematic branches (4.9 for sure, I am not sure about 4.8 at this
> moment).

Ok.

Thanks,
Richard.

> Thanks,
>
> Martin
>
> 2014-09-02  Martin Jambor  mjam...@suse.cz
>
>         PR ipa/61986
>         * ipa-cp.c (find_aggregate_values_for_callers_subset): Chain
>         created replacements in ascending order of offsets.
>         (known_aggs_to_agg_replacement_list): Likewise.
>
> testsuite/
>         * gcc.dg/ipa/pr61986.c: New test.
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 44d4c9a..58121d4 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3146,7 +3146,8 @@ find_aggregate_values_for_callers_subset (struct cgraph_node *node,
>                                            vec<cgraph_edge *> callers)
>  {
>    struct ipa_node_params *dest_info = IPA_NODE_REF (node);
> -  struct ipa_agg_replacement_value *res = NULL;
> +  struct ipa_agg_replacement_value *res;
> +  struct ipa_agg_replacement_value **tail = &res;
>    struct cgraph_edge *cs;
>    int i, j, count = ipa_get_param_count (dest_info);
>
> @@ -3190,14 +3191,15 @@ find_aggregate_values_for_callers_subset (struct cgraph_node *node,
>            v->offset = item->offset;
>            v->value = item->value;
>            v->by_ref = plats->aggs_by_ref;
> -          v->next = res;
> -          res = v;
> +          *tail = v;
> +          tail = &v->next;
>          }
>
>      next_param:
>        if (inter.exists ())
>          inter.release ();
>      }
> +  *tail = NULL;
>    return res;
>  }
>
> @@ -3206,7 +3208,8 @@ find_aggregate_values_for_callers_subset (struct cgraph_node *node,
>  static struct ipa_agg_replacement_value *
>  known_aggs_to_agg_replacement_list (vec<ipa_agg_jump_function> known_aggs)
>  {
> -  struct ipa_agg_replacement_value *res = NULL;
> +  struct ipa_agg_replacement_value *res;
> +  struct ipa_agg_replacement_value **tail = &res;
>    struct ipa_agg_jump_function *aggjf;
>    struct ipa_agg_jf_item *item;
>    int i, j;
> @@ -3220,9 +3223,10 @@ known_aggs_to_agg_replacement_list (vec<ipa_agg_jump_function> known_aggs)
>          v->offset = item->offset;
>          v->value = item->value;
>          v->by_ref = aggjf->by_ref;
> -        v->next = res;
> -        res = v;
> +        *tail = v;
> +        tail = &v->next;
>        }
> +  *tail = NULL;
>    return res;
>  }
>
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr61986.c b/gcc/testsuite/gcc.dg/ipa/pr61986.c
> new file mode 100644
> index 000..8d2f658
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr61986.c
> @@ -0,0 +1,48 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a, b, c;
> +
> +struct S
> +{
> +  int f0;
> +  int f1;
> +} d;
> +
> +static int fn2 (struct S);
> +void fn3 (struct S);
> +
> +void
> +fn1 (struct S p)
> +{
> +  struct S h = { 0, 0 };
> +  fn3 (p);
> +  fn2 (h);
> +}
> +
> +int
> +fn2 (struct S p)
> +{
> +  struct S j = { 0, 0 };
> +  fn3 (p);
> +  fn2 (j);
> +  return 0;
> +}
> +
> +void
> +fn3 (struct S p)
> +{
> +  for (; b; a++)
> +    c = p.f0;
> +  fn1 (d);
> +}
> +
> +void
> +fn4 ()
> +{
> +  for (;;)
> +    {
> +      struct S f = { 0, 0 };
> +      fn1 (f);
> +    }
> +}
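The ordering fix in the patch above replaces head insertion (which reverses the input order) with appending through a tail pointer. A minimal sketch of that idiom in plain C, assuming a hypothetical `struct item` in place of `ipa_agg_replacement_value`; the helper names are illustrative and not part of GCC:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ipa_agg_replacement_value.  */
struct item
{
  long offset;
  struct item *next;
};

/* Build a list from OFFSETS[0..N-1], preserving the input order.
   TAIL always points at the link that should receive the next node, so
   each append is O(1) and no final reversal is needed.  */
struct item *
build_in_order (const long *offsets, size_t n)
{
  struct item *res = NULL;
  struct item **tail = &res;

  for (size_t i = 0; i < n; i++)
    {
      struct item *v = malloc (sizeof *v);
      if (!v)
        abort ();
      v->offset = offsets[i];
      *tail = v;
      tail = &v->next;
    }
  /* Terminate the list; this also covers the empty case.  */
  *tail = NULL;
  return res;
}

/* Return 1 if the offsets in LIST are strictly ascending, else 0.  */
int
is_ascending (const struct item *list)
{
  for (; list && list->next; list = list->next)
    if (list->offset >= list->next->offset)
      return 0;
  return 1;
}
```

If the producer visits items in ascending offset order, the resulting list is ascending as well, which is what a single-sweep intersection relies on.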
Re: [PATCH, PR 62015] Clear aggregate values intersection when jump function flag require us to punt
On Wed, Sep 3, 2014 at 10:47 AM, Martin Jambor mjam...@suse.cz wrote:
> Hi,
>
> this PR revealed that the aggregate value intersection code in IPA-CP
> has one more problem in it, namely when jump function flags show that a
> PASS_THROUGH jump function cannot be used at all, it must also clear
> the intersection when punting. Fixed thusly.
>
> Bootstrapped and tested on x86_64-linux (so far only on trunk, testing
> on branches in progress). OK for trunk and all the problematic
> branches (IIRC both 4.9 and 4.8)?

Ok.

Thanks,
Richard.

> Thanks,
>
> Martin
>
> 2014-09-02  Martin Jambor  mjam...@suse.cz
>
>         PR ipa/62015
>         * ipa-cp.c (intersect_aggregates_with_edge): Handle impermissible
>         pass-through jump functions correctly.
>
> testsuite/
>         * g++.dg/ipa/pr62015.C: New test.
>
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 44d4c9a..afbec25 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3048,6 +3048,11 @@ intersect_aggregates_with_edge (struct cgraph_edge *cs, int index,
>                    intersect_with_agg_replacements (cs->caller, src_idx,
>                                                     &inter, 0);
>                  }
> +              else
> +                {
> +                  inter.release ();
> +                  return vNULL;
> +                }
>              }
>            else
>              {
> @@ -3063,6 +3068,11 @@ intersect_aggregates_with_edge (struct cgraph_edge *cs, int index,
>                    else
>                      intersect_with_plats (src_plats, &inter, 0);
>                  }
> +              else
> +                {
> +                  inter.release ();
> +                  return vNULL;
> +                }
>              }
>          }
>        else if (jfunc->type == IPA_JF_ANCESTOR
> diff --git a/gcc/testsuite/g++.dg/ipa/pr62015.C b/gcc/testsuite/g++.dg/ipa/pr62015.C
> new file mode 100644
> index 000..950b46e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr62015.C
> @@ -0,0 +1,55 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -std=c++11" } */
> +
> +
> +extern "C" int printf(const char *fmt, ...);
> +extern "C" void abort(void);
> +
> +struct Side {
> +    enum _Value { Left, Right, Invalid };
> +
> +    constexpr Side() : _value(Invalid) {}
> +    constexpr Side(_Value value) : _value(value) {}
> +    operator _Value() const { return (_Value)_value; }
> +
> +  private:
> +    char _value;
> +};
> +
> +struct A {
> +    void init();
> +    void adjust(Side side, bool final);
> +    void move(Side side);
> +};
> +
> +void A::init()
> +{
> +    adjust(Side::Invalid, false);
> +}
> +
> +static void __attribute__((noinline))
> +check (int v, int final)
> +{
> +    if (v != 0)
> +      abort();
> +}
> +
> +
> +__attribute__((noinline))
> +void A::adjust(Side side, bool final)
> +{
> +  check ((int)side, final);
> +}
> +
> +void A::move(Side side)
> +{
> +    adjust(side, false);
> +    adjust(side, true);
> +}
> +
> +int main()
> +{
> +    A t;
> +    t.move(Side::Left);
> +    return 0;
> +}
Re: [patch] prevent tree sinking of trapping stmts
On Wed, Sep 3, 2014 at 12:28 PM, Olivier Hainque hain...@adacore.com wrote: Hello, For the testcase below, the tree-ssa-sink pass sinks the first a = b + c; assignment within the if branch. This is problematic when the + operation on floats could trap, as it gets moved out of the path that dominates the call in the else branch and a trap on the original + should prevent the call from taking place. The attached patch is a proposal to address this by refusing to sink statements that could trap, except load/stores. Trapping loads or stores typically yield undefined behavior anyway, and not sinking a load or store as soon as it is potentially trapping pessimizes quite a bit for no valid reason. I don't quite follow this reasoning. Why would a trapping FP operation not be undefined behavior without -fnon-call-exceptions? That is, don't you want to check stmt_could_throw_p? That is, what's the real testcase you are trying to fix? Richard. Bootstrapped and regression tested on x86_64-linux-gnu. OK to commit ? Thanks in advance for your feedback, With Kind Regards, Olivier 2014-09-03 Olivier Hainque hain...@adacore.com * tree-ssa-sink.c (statement_sink_location): Don't sink !load-or-store stmts that could trap. testsuite/ * gcc.dg/tree-ssa/ssa-sink-13.c: New test. [see attached file: sinktrap.diff] -- extern void bar (); float foo (int call) { float a, b, c; a = b + c; if (!call) ; else { bar (); b = c * 2; a = b + c; } return a; }
Re: [C++ Patch] PR 58102 aka DR 1405
Hi,

On 09/02/2014 05:45 PM, Jason Merrill wrote:
> On 09/02/2014 11:07 AM, Paolo Carlini wrote:
>> Anyway, what about the below? Certainly works for the tests which we
>> have got.
>
> Hmm. This is definitely an improvement, as it allows a subset of
>
>   a non-volatile glvalue of literal type that refers to a non-volatile
>   object whose lifetime began within the evaluation of e
>
> But it doesn't cover all of that, and in any case we shouldn't need to
> explicitly handle that just for types with mutable subobjects.
>
> I think perhaps it would be better to remove that hunk as in your
> initial patch and replace it with a check in constant_value_1 and an
> explanation in non_const_var_error.

In practice, I'm encountering a rather serious problem with moving away the check, I'm looking more into it, but maybe I can already explain it to you... The issue, AFAICS, boils down to the difference itself between cxx_eval_outermost_constant_expr and cxx_eval_constant_expression: changing constant_value_1 means that in principle all the calls of the latter (for VAR_DECLs) are impacted. Thus, for example, for the call at the beginning of cxx_eval_component_reference:

struct A
{
  int i;
  mutable int j;
};

constexpr A a = { 0, 1 };

constexpr int i = a.i;

how do we avoid emitting a wrong error for the a of a.i?

Paolo.
Re: [PATCH 1/4] aarch64: Improve epilogue unwind info
On 22 August 2014 23:05, Richard Henderson r...@redhat.com wrote: Delay cfi restore opcodes until the stack frame is deallocated. This reduces the number of cfi advance opcodes required. We perform a similar optimization in the x86_64 epilogue. * config/aarch64/aarch64.c (aarch64_popwb_single_reg): Remove. (aarch64_popwb_pair_reg): Remove. (aarch64_restore_callee_saves): Add CFI_OPS argument; fill it with the restore ops performed by the insns generated. (aarch64_expand_epilogue): Attach CFI_OPS to the stack deallocation insn. Perform the calls_eh_return addition later; do not attempt to preserve the CFA in that case. Don't use aarch64_set_frame_expr. OK, thank you. /Marcus
Re: [PATCH 2/4] aarch64: Tidy prologue unwind notes
On 22 August 2014 23:05, Richard Henderson r...@redhat.com wrote: We were marking more than necessary in aarch64_set_frame_expr. Fold the reduced function into aarch64_expand_prologue as necessary. * config/aarch64/aarch64.c (aarch64_set_frame_expr): Remove. (aarch64_expand_prologue): Use REG_CFA_ADJUST_CFA directly, or no special markup at all. Ok /Marcus
Re: [PATCH] Add -fno-instrument-function
On Tue, Sep 2, 2014 at 5:00 PM, Andi Kleen a...@linux.intel.com wrote:
>> Hmm, why not make -no-pg (does that exist?) and/or -mno-fentry
>
> I'm not sure.
>
>> do this? That is, I don't see the need for a new option.
>
> That would be really odd behavior. A yes/no option whose default
> is controlled by other object files' command line. And -pg would be for
> all files in LTO, and no-pg only for that file, so it would not be
> symmetric.
>
> I think an explicit different option has far cleaner semantics for now
> (at least until the LTO option mess can be properly cleaned up).

No, not a new fake option either but just initialize DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT properly when not doing -pg or -mfentry (that is, set it to 1).

Richard.

> -Andi
Re: [PATCH 3/4] aarch64: Tidy prologue local variables
On 22 August 2014 23:05, Richard Henderson r...@redhat.com wrote: Don't continually re-read data from cfun-machine. * config/aarch64/aarch64.c (aarch64_expand_prologue): Load cfun-machine-frame.hard_fp_offset into a local variable. OK /Marcus
Re: [PATCH 4/4] aarch64: Don't duplicate calls_alloca check
On 22 August 2014 23:05, Richard Henderson r...@redhat.com wrote: Generic code already handles calls_alloca for determining the need for a frame pointer. * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Don't check calls_alloca. Ok Thanks /Marcus
[PATCH] Fix for tree-ssa-pre
Hello,

I've encountered an issue in an ltrans unit for libxul.so (with LTO). The patch fixes an uninitialized value for a given argument; it was pre-approved by Richard.

Thanks,
Martin

gcc/ChangeLog:

2014-09-03  Martin Liska  mli...@suse.cz

        * tree-ssa-sccvn.c (vn_reference_lookup_call): Set *vnresult to
        NULL by default so that it is never left uninitialized.

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 1bcbde3..44656ea 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -2146,6 +2146,9 @@ void
 vn_reference_lookup_call (gimple call, vn_reference_t *vnresult,
                           vn_reference_t vr)
 {
+  if (vnresult)
+    *vnresult = NULL;
+
   tree vuse = gimple_vuse (call);
 
   vr->vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
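The fix above follows a common pattern: a function with an optional out-parameter gives it a defined value on entry, so callers never read garbage on paths that do not find a result. A small self-contained sketch of the same pattern, using a hypothetical `lookup` over an array rather than the value-numbering tables:

```c
#include <stddef.h>

/* Hypothetical table entry, used only for this illustration.  */
struct entry
{
  int key;
  int value;
};

/* Look up KEY in TABLE[0..N-1].  Return 1 and store the match in *RESULT
   on success; return 0 otherwise.  RESULT may be NULL.  *RESULT is set to
   NULL up front so it is well defined on every path.  */
int
lookup (const struct entry *table, size_t n, int key,
        const struct entry **result)
{
  if (result)
    *result = NULL;

  for (size_t i = 0; i < n; i++)
    if (table[i].key == key)
      {
        if (result)
          *result = &table[i];
        return 1;
      }

  return 0;
}
```

A caller can then test the out-parameter directly after a failed lookup, even if it previously held a stale pointer.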
Re: [C PATCH] Backport a fix for PR62294 to 4.9
On Tue, Sep 02, 2014 at 05:01:10PM +, Joseph S. Myers wrote: On Tue, 2 Sep 2014, Marek Polacek wrote: PR62294 reports that 4.9 does not emit an incompatible pointer type warning in certain scenario. I unknowingly broke this in r207335, and then fixed it in r210980, which is a follow-up to the former. But 4.9 doesn't have the latter. This patch is basically a backport of r210980, only without the traditional conversion stuff. Bootstrapped/regtested on x86_64-linux, ok for 4.9? OK with a testcase specifically for the regression case added on trunk and 4.9 if there isn't one already. Sure, I managed to create one. I'll add the testcase onto trunk in a separate patch. Bootstrapped/regtested on x86_64-linux, applying to 4.9. 2014-09-03 Marek Polacek pola...@redhat.com PR c/62294 * c-typeck.c (convert_arguments): Get location of a parameter. Change error and warning calls to error_at and warning_at. Pass location of a parameter to it. (convert_for_assignment): Add parameter to WARN_FOR_ASSIGNMENT and WARN_FOR_QUALIFIERS. Pass expr_loc to those. * gcc.dg/pr56724-1.c: New test. * gcc.dg/pr56724-2.c: New test. * gcc.dg/pr62294.c: New test. * gcc.dg/pr62294.h: New file. diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c index 5838d6a..d096ad4 100644 --- gcc/c/c-typeck.c +++ gcc/c/c-typeck.c @@ -3071,6 +3071,12 @@ convert_arguments (location_t loc, veclocation_t arg_loc, tree typelist, bool excess_precision = false; bool npc; tree parmval; + /* Some __atomic_* builtins have additional hidden argument at +position 0. */ + location_t ploc + = !arg_loc.is_empty () values-length () == arg_loc.length () + ? 
expansion_point_location_if_in_system_header (arg_loc[parmnum]) + : input_location; if (type == void_type_node) { @@ -3113,7 +3119,8 @@ convert_arguments (location_t loc, veclocation_t arg_loc, tree typelist, if (type == error_mark_node || !COMPLETE_TYPE_P (type)) { - error (type of formal parameter %d is incomplete, parmnum + 1); + error_at (ploc, type of formal parameter %d is incomplete, + parmnum + 1); parmval = val; } else @@ -3128,34 +3135,34 @@ convert_arguments (location_t loc, veclocation_t arg_loc, tree typelist, if (INTEGRAL_TYPE_P (type) TREE_CODE (valtype) == REAL_TYPE) - warning (0, passing argument %d of %qE as integer -rather than floating due to prototype, -argnum, rname); + warning_at (ploc, 0, passing argument %d of %qE as + integer rather than floating due to + prototype, argnum, rname); if (INTEGRAL_TYPE_P (type) TREE_CODE (valtype) == COMPLEX_TYPE) - warning (0, passing argument %d of %qE as integer -rather than complex due to prototype, -argnum, rname); + warning_at (ploc, 0, passing argument %d of %qE as + integer rather than complex due to + prototype, argnum, rname); else if (TREE_CODE (type) == COMPLEX_TYPE TREE_CODE (valtype) == REAL_TYPE) - warning (0, passing argument %d of %qE as complex -rather than floating due to prototype, -argnum, rname); + warning_at (ploc, 0, passing argument %d of %qE as + complex rather than floating due to + prototype, argnum, rname); else if (TREE_CODE (type) == REAL_TYPE INTEGRAL_TYPE_P (valtype)) - warning (0, passing argument %d of %qE as floating -rather than integer due to prototype, -argnum, rname); + warning_at (ploc, 0, passing argument %d of %qE as + floating rather than integer due to + prototype, argnum, rname); else if (TREE_CODE (type) == COMPLEX_TYPE INTEGRAL_TYPE_P (valtype)) - warning (0, passing argument %d of %qE as complex -rather than integer due to prototype, -argnum, rname); + warning_at (ploc, 0, passing argument %d of %qE as + complex rather than integer due to + prototype, argnum, 
rname); else if (TREE_CODE (type) == REAL_TYPE
Re: [patch] prevent tree sinking of trapping stmts
Hi Richard, (Thanks for your feedback) On Sep 3, 2014, at 12:52 , Richard Biener richard.guent...@gmail.com wrote: I don't quite follow this reasoning. Why would a trapping FP operation not be undefined behavior without -fnon-call-exceptions? That is, don't you want to check stmt_could_throw_p? That is, what's the real testcase you are trying to fix? The original testcase is an Ada ACATS test. It reduces to a miscompilation by gcc-4.7 of this excerpt: with Report; procedure P is X : Float; function Ident (X : Float) return Float; pragma Import (Ada, Ident); procedure Latch (X : Float); pragma Import (Ada, Latch); begin X := Ident(Float'Base'Last) + Float'Base'Last; Report.Failed(NO EXCEPTION RAISED BY LARGE '+'); Latch (X); exception when others = NULL; end; The observable problem is that the + operation gets moved past the call to Report.Failed: from .095t.sink Sinking x_19 = D.1629_17 + 3.4028234663852885981170418348451692544e+38; from bb 15 to bb 18 ... bb 15: report_E$5_20 = report_E; if (report_E$5_20 == 0) goto bb 16; else goto bb 17; bb 16: .gnat_rcheck_PE_Access_Before_Elaboration (p.adb, 15); bb 17: D.1633.P_ARRAY = NO EXCEPTION RAISED BY LARGE \'+\'; D.1633.P_BOUNDS = *.LC0; report.failed (D.1633); bb 18: x_19 = D.1629_17 + 3.4028234663852885981170418348451692544e+38; p.latch (x_19); This was for a port configured to use our front-end sjlj exception scheme, using builtin_setjmp and builtin_longjmp to manage exception regions and propagations. This operates without -fexceptions at all, so stmt_could_throw_p is false in this case unfortunately. Olivier
[PATCH][RFC] Restrict, take 42
Ok, so with recent activity in that mgrid bug (PR55334) I tried to remember what solution we thought of after determining that ADD_RESTRICT is a no-go. The following very prototypish patch implements the idea of computing known non-dependences and maintaining them over the compilation (mainly inlining / cloning for PR55334). So the patch piggy-backs on PTA (bad - PTA doesn't compute must-aliases, so it will work only for a very limited set of testcases now - but at least it might work for non-register restrict stuff). The representation of non-dependences is the most interesting bit of course. We partition memory references into different cliques in which all references are independent. And we split that clique again into sets of references based on the same pointer. That allows us to disambiguate equal clique but distinct pointer memory references. For restrict you'd put all(!) memory references in the BLOCK where the restrict pointer is live into the same clique and assign a distinct ptr based on which restrict pointer this is based on. You make all references not based on any restrict pointer have ptr == 0 (not implemented in the prototype patch - they don't get a clique assigned). The patch simplifies this by taking function scope as the only BLOCK to consider, thus inside a function the clique will be unique (before inlining). I can see issues arising with assigning numbers in the frontend based on real BLOCKs and { int * restrict q; { int * restrict p; *p = *q; } { int * restrict r; *r = *q; } that is, non-dependences based on pointers coming from different nested scopes. The FE in this case would need to duplicate the ptr value for 'q' to not make *p and *r falsely a non-dependence. But I think we're not planning to assign this in the C frontend family (maybe the Fortran FE though). To preserve correctness cliques from inlined functions need to be remapped to an unused clique number so struct function gets a max_clique number. 
(remapping not implemented in the prototype) Any correctness / missed-optimization holes to punch? That is, given a set of non-dependent reference pairs, can you assign that clique, ptr values in a correct and optimal way? (it somehow assumes transitivity, if the set is *a, *b; *b, *c; then *a, *c are also non-dependent) It all depends on a conservative but agressive must-be-based-on analysis of course. You'd meet that with the conservative may-be-based-on analysis from PTA and compute the proper non-dependences from that. Comments welcome. Patch tested on static inline void foo (int * __restrict__ p, int * __restrict__ q) { int i; for (i = 0; i 1024; ++i) p[i] = q[i]; } void bar (int *p, int *q) { foo (p, q); } where it still vectorizes the loop in bar without alias check. Richard. Index: trunk/gcc/function.h === *** trunk.orig/gcc/function.h 2014-08-29 14:18:27.221364973 +0200 --- trunk/gcc/function.h2014-09-03 13:15:11.433883630 +0200 *** struct GTY(()) function { *** 592,597 --- 592,600 a string describing the reason for failure. */ const char * GTY((skip)) cannot_be_copied_reason; + /* Last assigned restrict UID. */ + unsigned short last_restrict_uid; + /* Collected bit flags. */ /* Number of units of general registers that need saving in stdarg Index: trunk/gcc/tree-inline.c === *** trunk.orig/gcc/tree-inline.c2014-08-26 13:40:18.811368135 +0200 --- trunk/gcc/tree-inline.c 2014-09-03 14:15:14.792635543 +0200 *** remap_gimple_op_r (tree *tp, int *walk_s *** 929,934 --- 929,937 TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old); TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old); TREE_NO_WARNING (*tp) = TREE_NO_WARNING (old); + /* ??? Properly remap the clique to a new one. */ + if (old-base.u.version != 0) + (*tp)-base.u.version = old-base.u.version; /* We cannot propagate the TREE_THIS_NOTRAP flag if we have remapped a parameter as the property might be valid only for the parameter itself. 
*/ *** copy_tree_body_r (tree *tp, int *walk_su *** 1180,1185 --- 1183,1191 TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old); TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old); TREE_NO_WARNING (*tp) = TREE_NO_WARNING (old); + /* ??? Properly remap the clique to a new one. */ + if (old-base.u.version != 0) + (*tp)-base.u.version = old-base.u.version; /* We cannot propagate the TREE_THIS_NOTRAP flag if we have remapped a parameter as the property might be valid only for the parameter itself. */ Index: trunk/gcc/tree-pretty-print.c
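One reading of the clique/pointer representation described in the message above is: two references are known not to depend on each other when they carry the same clique and different pointer ids. The sketch below encodes just that test. `struct mem_ref_info` and `refs_known_independent` are hypothetical names, and treating clique 0 as "no information" is an assumption based on the prototype leaving references without a clique unassigned.

```c
#include <stdbool.h>

/* Hypothetical annotation on a memory reference.  CLIQUE == 0 is assumed
   to mean that no dependence information is available; BASE == 0 stands
   for a reference not based on any restrict pointer.  */
struct mem_ref_info
{
  unsigned short clique;
  unsigned short base;
};

/* Return true if A and B are known to be independent: they belong to the
   same (assigned) clique but are based on different pointers.  A false
   result means "unknown", not "dependent".  */
bool
refs_known_independent (struct mem_ref_info a, struct mem_ref_info b)
{
  return a.clique != 0 && a.clique == b.clique && a.base != b.base;
}
```

Under this reading, remapping the cliques of an inlined function to fresh numbers keeps references from different functions from being compared against each other by accident.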
Re: [Patch AArch64] Fix for PR62040
On 20 August 2014 20:51, Carrot Wei car...@google.com wrote: Good suggestion. Add the testcase. thanks Guozhi Wei 2014-08-20 Guozhi Wei car...@google.com PR target/62040 * gcc.target/aarch64/pr62040.c: New test. Index: pr62040.c === --- pr62040.c (revision 0) +++ pr62040.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options -g -Os } */ + +#include arm_neon.h + +extern bar(int32x4_t); + +void foo() { + int32x4x4_t rows; + uint64x2x2_t row01; + + row01.val[0] = vreinterpretq_u64_s32(rows.val[0]); + row01.val[1] = vreinterpretq_u64_s32(rows.val[1]); + uint64x1_t row3l = vget_low_u64(row01.val[0]); + row01.val[0] = vcombine_u64(vget_low_u64(row01.val[1]), row3l); + int32x4_t xxx = vreinterpretq_s32_u64(row01.val[0]); + int32x4_t out = vtrn1q_s32 (xxx, xxx); + bar(out); +} GNU coding style please. /Marcus
Re: [patch] prevent tree sinking of trapping stmts
On Wed, Sep 3, 2014 at 2:49 PM, Olivier Hainque hain...@adacore.com wrote: Hi Richard, (Thanks for your feedback) On Sep 3, 2014, at 12:52 , Richard Biener richard.guent...@gmail.com wrote: I don't quite follow this reasoning. Why would a trapping FP operation not be undefined behavior without -fnon-call-exceptions? That is, don't you want to check stmt_could_throw_p? That is, what's the real testcase you are trying to fix? The original testcase is an Ada ACATS test. It reduces to a miscompilation by gcc-4.7 of this excerpt: with Report; procedure P is X : Float; function Ident (X : Float) return Float; pragma Import (Ada, Ident); procedure Latch (X : Float); pragma Import (Ada, Latch); begin X := Ident(Float'Base'Last) + Float'Base'Last; Report.Failed(NO EXCEPTION RAISED BY LARGE '+'); Latch (X); exception when others = NULL; end; The observable problem is that the + operation gets moved past the call to Report.Failed: from .095t.sink Sinking x_19 = D.1629_17 + 3.4028234663852885981170418348451692544e+38; from bb 15 to bb 18 ... bb 15: report_E$5_20 = report_E; if (report_E$5_20 == 0) goto bb 16; else goto bb 17; bb 16: .gnat_rcheck_PE_Access_Before_Elaboration (p.adb, 15); bb 17: D.1633.P_ARRAY = NO EXCEPTION RAISED BY LARGE \'+\'; D.1633.P_BOUNDS = *.LC0; report.failed (D.1633); bb 18: x_19 = D.1629_17 + 3.4028234663852885981170418348451692544e+38; p.latch (x_19); This was for a port configured to use our front-end sjlj exception scheme, using builtin_setjmp and builtin_longjmp to manage exception regions and propagations. This operates without -fexceptions at all, so stmt_could_throw_p is false in this case unfortunately. Well, but that's a bug in the Ada frontend if it does exceptions behind GCCs back. ISTR that the middle-end also supports a SJLJ EH scheme? Richard. Olivier
Re: [Patch AArch64] Fix for PR62040
On 20 August 2014 00:43, Carrot Wei car...@google.com wrote: Hi Current AArch64 backend can generate rtl expressions like (vec_duplicate:DI (const_int 0 [0])), which causes ICE in simplify_const_unary_operation because vec_duplicate should generate vector mode only. As suggested by Andrew in the bug entry, I split the original insn patterns to avoid scalar mode vec_duplicate expression. Passed regression tests on qemu without failure. OK for trunk and 4.9 branch? thanks Guozhi Wei 2014-08-19 Guozhi Wei car...@google.com PR target/62040 * config/aarch64/iterators.md (VQ_NO2E, VQ_2E): New iterators. * config/aarch64/aarch64-simd.md (move_lo_quad_internal_mode): Split it into two patterns. (move_lo_quad_internal_be_mode): Likewise. OK once the test case is approved. Thanks /Marcus
[committed] Add testcase from PR62294
This adds a testcase for PR62294 that I just fixed on the 4.9 branch.

Tested on x86_64-linux, applying to trunk.

2014-09-03  Marek Polacek  pola...@redhat.com

        PR c/62294
        * gcc.dg/pr62294.c: New test.
        * gcc.dg/pr62294.h: New file.

diff --git gcc/testsuite/gcc.dg/pr62294.c gcc/testsuite/gcc.dg/pr62294.c
index e69de29..c6ec5a7 100644
--- gcc/testsuite/gcc.dg/pr62294.c
+++ gcc/testsuite/gcc.dg/pr62294.c
@@ -0,0 +1,10 @@
+/* PR c/62294 */
+/* { dg-do compile } */
+
+#include "pr62294.h"
+
+void
+fn (int *u)
+{
+  foo (u); /* { dg-error "passing argument 1 of .bar. from incompatible pointer type" } */
+}
diff --git gcc/testsuite/gcc.dg/pr62294.h gcc/testsuite/gcc.dg/pr62294.h
index e69de29..9be45ad 100644
--- gcc/testsuite/gcc.dg/pr62294.h
+++ gcc/testsuite/gcc.dg/pr62294.h
@@ -0,0 +1,3 @@
+#pragma GCC system_header
+#define foo bar
+extern void foo (float *);

        Marek
Re: fix gfcov regression
On 09/03/14 04:06, Dominique Dhumieres wrote:
>> I've committed the patch now.
>
> It (r214840) breaks bootstrap on darwin:

does this fix it?

nathan

2014-09-03  Nathan Sidwell  nat...@acm.org

        * libgcov-interface.c (STRONG_ALIAS): Rename to ...
        (ALIAS): ... here. Make weak. Adjust uses.

Index: libgcc/libgcov-interface.c
===
--- libgcc/libgcov-interface.c  (revision 214840)
+++ libgcc/libgcov-interface.c  (working copy)
@@ -42,11 +42,11 @@ void __gcov_dump (void) {}
 
 #else
 
-
 /* Some functions we want to bind in this dynamic object, but have an
-   overridable global alias.  */
-#define STRONG_ALIAS(src,dst) \
-  extern __typeof (src) dst __attribute__((alias (#src)))
+   overridable global alias.  Weak aliases are supported in more
+   places than non-weak, and is adequate for our needs.  */
+#define ALIAS(src,dst) \
+  extern __typeof (src) dst __attribute__((weak, alias (#src)))
 
 extern __gthread_mutex_t __gcov_flush_mx ATTRIBUTE_HIDDEN;
 extern __gthread_mutex_t __gcov_flush_mx ATTRIBUTE_HIDDEN;
@@ -133,7 +133,7 @@ __gcov_reset_int (void)
   __gcov_root.dumped = 0;
 }
 
-STRONG_ALIAS (__gcov_reset_int, __gcov_reset);
+ALIAS (__gcov_reset_int, __gcov_reset);
 
 #endif /* L_gcov_reset */
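For readers unfamiliar with the macro in the patch above, here is a small sketch of the same weak-alias idiom outside libgcov. It assumes GCC or a compatible compiler on a target that supports the `alias` attribute (typically ELF); the `demo_*` names are made up for the example.

```c
/* State touched by the internal function below.  */
static int counter = 5;

/* Declare DST as a weak alias of SRC, so code in this object can call SRC
   directly while other objects may still override DST.  */
#define ALIAS(src, dst) \
  extern __typeof (src) dst __attribute__ ((weak, alias (#src)))

/* Internal name, always bound within this object.  */
void
demo_reset_int (void)
{
  counter = 0;
}

/* Public, overridable name that resolves to demo_reset_int by default.  */
ALIAS (demo_reset_int, demo_reset);

int
demo_get (void)
{
  return counter;
}

void
demo_set (int v)
{
  counter = v;
}
```

Calling `demo_reset ()` runs `demo_reset_int` unless a strong definition of `demo_reset` is linked in from elsewhere. On targets without alias support this does not compile, which is why the example is restricted to the stated assumption.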
Re: [PATCH PR62151]Fix uninitialized register issue caused by distribute_notes in combine pass
On Wed, Sep 03, 2014 at 10:34:39AM +0800, Bin.Cheng wrote: Now I guess this check could be relaxed if somewhere else in combine we'd recognize the substitution into a clobber and simply omit it in that case. Yeah. In the testcase, combine tries combining 76,77 (77 is that clobbering insn) and refuses it; then it tries 32,76,77 and refuses it; and then it tries 32,76,77,43 and allows it (it doesn't do this check at all, 77 is not i3, combine omits the clobber completely). Which is inconsistent. I guess it makes sense because this way it doesn't introduce any invalid instructions. But yes, how combine handles the clobber in this way may help combine the three instructions? Combine could just throw away all clobbers on i3 and add them back if wanted by the combined insn. I think it does things the way it does so that it creates slightly less garbage RTL and/or it is a little less work. But it does disallow this case; easily fixed, have patch, will post in a minute. That will hide your problem, but not really fix it I think. Maybe we need to disallow all combos that set the same register twice in four-insn combinations. Or at least when that register does not die. Why is this combination tried anyway? r84 (as set in 77, used in 43) does not die, so this is just doing a very expensive very limited form of un-cse? Segher
Re: [patch] prevent tree sinking of trapping stmts
On Sep 3, 2014, at 15:05 , Richard Biener richard.guent...@gmail.com wrote: Well, but that's a bug in the Ada frontend if it does exceptions behind GCCs back. I agree that this is a problem and that could_throw_p is a better predicate. I wasn't convinced by my own answer but hadn't really pinpointed why before hitting send. ISTR that the middle-end also supports a SJLJ EH scheme? Yes, it does. There's a long history behind the front-end sjlj scheme and it has remained in place for a number of reasons (compatibility with other libgcc's when mixing with c++ on some platforms, certification concerns, ...) Needs extra thought. Thanks for the exchange, Cheers, Olivier
Re: [patch] prevent tree sinking of trapping stmts
On Wed, Sep 3, 2014 at 3:25 PM, Olivier Hainque hain...@adacore.com wrote:

On Sep 3, 2014, at 15:05 , Richard Biener richard.guent...@gmail.com wrote:

Well, but that's a bug in the Ada frontend if it does exceptions behind GCC's back.

I agree that this is a problem and that could_throw_p is a better predicate. I wasn't convinced by my own answer but hadn't really pinpointed why before hitting send.

ISTR that the middle-end also supports a SJLJ EH scheme?

Yes, it does. There's a long history behind the front-end sjlj scheme and it has remained in place for a number of reasons (compatibility with other libgcc's when mixing with c++ on some platforms, certification concerns, ...) Needs extra thought.

Thanks for the exchange,

Eventually the FE can still simply set flag_exceptions / flag_non_call_exceptions? It will still not be correct for all cases but at least external throw predicates would work ...

Richard.

Cheers, Olivier
[PATCH] combine: Allow substituting the target reg of a clobber
This came up when investigating PR62151. In that PR combine messes up a four-insn combination. It should really have done the combination of the first three insns in that. The second of those instructions sets a register; the third clobbers the same. Substituting the source of the set into the clobber usually results in invalid RTL so that the combination will not be valid; also, it confuses the undobuf machinery. can_combine_p rejects the combination for this reason.

But we can simply make subst not substitute into clobbers of registers. Any unnecessary clobbers will be removed later anyway.

With this patch, the three-insn combination in PR62151 is successful. It likely does not really solve the problem there, though; it just hides it.

Bootstrapped and regression checked on powerpc64-linux, options -m64,-m32,-m32/-mpowerpc64. Is this okay for mainline?

Segher

2014-09-03  Segher Boessenkool  seg...@kernel.crashing.org

    PR rtl-optimization/62151
    * combine.c (can_combine_p): Allow the destination register of INSN
    to be clobbered in I3.
    (subst): Do not substitute into clobbers of registers.

---
 gcc/combine.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 60524b5..6a5dfbb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1950,11 +1950,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED,
       for (i = XVECLEN (PATTERN (i3), 0) - 1; i >= 0; i--)
         if (GET_CODE (XVECEXP (PATTERN (i3), 0, i)) == CLOBBER)
           {
-            /* Don't substitute for a register intended as a clobberable
-               operand.  */
             rtx reg = XEXP (XVECEXP (PATTERN (i3), 0, i), 0);
-            if (rtx_equal_p (reg, dest))
-              return 0;
 
             /* If the clobber represents an earlyclobber operand, we must not
                substitute an expression containing the clobbered register.
@@ -4964,6 +4960,11 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
    || (REG_P (X) && REG_P (Y) \
        && REGNO (X) == REGNO (Y) && GET_MODE (X) == GET_MODE (Y)))
 
+  /* Do not substitute into clobbers of regs -- this will never result in
+     valid RTL.  */
+  if (GET_CODE (x) == CLOBBER && REG_P (XEXP (x, 0)))
+    return x;
+
   if (! in_dest && COMBINE_RTX_EQUAL_P (x, from))
     {
       n_occurrences++;
-- 
1.8.1.4
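The early return added to subst can be illustrated on a toy expression tree. This is only a sketch of the idea: toy_rtx, toy_code and toy_subst are hypothetical and far simpler than GCC's rtx and subst (no modes, no undo buffer, no occurrence counting).

```c
#include <stddef.h>

enum toy_code { TOY_REG, TOY_PLUS, TOY_CLOBBER };

struct toy_rtx
{
  enum toy_code code;
  int regno;               /* meaningful for TOY_REG only */
  struct toy_rtx *op[2];   /* operands; unused slots are NULL */
};

/* Replace each use of register FROM inside X with TO and return the
   resulting node.  Operands are rewritten in place.  A clobber of a
   register is returned untouched, which mirrors the check the patch adds
   at the top of subst.  */
static struct toy_rtx *
toy_subst (struct toy_rtx *x, int from, struct toy_rtx *to)
{
  if (x == NULL)
    return NULL;

  /* Do not substitute into clobbers of registers.  */
  if (x->code == TOY_CLOBBER
      && x->op[0] != NULL
      && x->op[0]->code == TOY_REG)
    return x;

  if (x->code == TOY_REG)
    return x->regno == from ? to : x;

  for (int i = 0; i < 2; i++)
    x->op[i] = toy_subst (x->op[i], from, to);
  return x;
}
```

In this model, substituting into (clobber (reg 5)) leaves it as it is, while (plus (reg 5) (reg 7)) has its first operand replaced.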
Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
On Tue, 2 Sep 2014, Adhemerval Zanella wrote:

Ping.

On 19-08-2014 13:54, Adhemerval Zanella wrote:

Ping.

On 06-08-2014 17:21, Adhemerval Zanella wrote:

On 01-08-2014 12:31, Joseph S. Myers wrote:

On Thu, 31 Jul 2014, David Edelsohn wrote:

Thanks for implementing the FENV support. The patch generally looks good to me. My one concern is a detail in the implementation of update. I do not have enough experience with GENERIC to verify the details and it seems like it is missing building an outer COMPOUND_EXPR containing update_mffs and the CALL_EXPR for update mtfsf.

I suppose what's actually odd there is that you have

+  tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs);

so you build a MODIFY_EXPR in void_type_node but then convert it with a VIEW_CONVERT_EXPR. If you'd built the MODIFY_EXPR in double_type_node then the VIEW_CONVERT_EXPR would be meaningful (the value of an assignment a = b being the new value of a), but reinterpreting a void value doesn't make sense. Or you could probably just use call_mffs directly in the VIEW_CONVERT_EXPR without explicitly creating the old_fenv variable.

Thanks for the review, Joseph. I have changed it to avoid the void reinterpretation and use call_mffs directly. I have also removed the mask generation in 'clear' from your previous message; it is now reusing the mask used in feholdexcept. The testcase patch is the same as before.

Checked on both linux-powerpc64/powerpc64le and no regressions found.

--

2014-08-06  Adhemerval Zanella  azane...@linux.vnet.ibm.com

gcc:
    * config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New
    function.

gcc/testsuite:
    * gcc.dg/atomic/c11-atomic-exec-5.c
    (test_main_long_double_add_overflow): Define and run only for
    LDBL_MANT_DIG != 106.
    (test_main_complex_long_double_add_overflow): Likewise.
    (test_main_long_double_sub_overflow): Likewise.
    (test_main_complex_long_double_sub_overflow): Likewise.
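The point about VIEW_CONVERT_EXPR needing an operand that has a value can be shown with a rough C-level analogue. This is a sketch only: view_as_u64 is a hypothetical helper, it assumes double is a 64-bit IEEE 754 type, and it does not build GENERIC trees.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (double) == sizeof (uint64_t),
                "sketch assumes a 64-bit double");

/* Reinterpret the bits of VALUE as a 64-bit unsigned integer, roughly what
   a VIEW_CONVERT_EXPR to uint64_type_node does with a double operand.
   The operand has to be a value: a void expression has no bits to
   reinterpret, which is the problem noted in the review.  */
static uint64_t
view_as_u64 (double value)
{
  uint64_t bits;
  memcpy (&bits, &value, sizeof bits);
  return bits;
}
```

Passing the result of a double-returning call straight to such a helper corresponds to using call_mffs directly as the operand, as the revised patch does.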
FWIW I pushed it through regression testing across my usual set of powerpc-linux-gnu multilibs with the results (for c11-atomic-exec-5.c) as follows:

-mcpu=603e                                            PASS
-mcpu=603e -msoft-float                               UNSUPPORTED
-mcpu=8540 -mfloat-gprs=single -mspe=yes -mabi=spe    UNSUPPORTED
-mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe    UNSUPPORTED
-mcpu=7400 -maltivec -mabi=altivec                    PASS
-mcpu=e6500 -maltivec -mabi=altivec                   PASS
-mcpu=e5500 -m64                                      PASS
-mcpu=e6500 -m64 -maltivec -mabi=altivec              PASS

(floating-point environment is of course unsupported for soft-float targets and for the SPE FPU another change is required to implement floating-point environment handling to complement one proposed here). No regressions otherwise.

While at it, may I propose another change on top of this? I've noticed the test case is rather slow, it certainly takes much more time than the average one, I've seen elapsed times of well over a minute on reasonably fast hardware and occasionally a timeout midway through even though the test case was otherwise progressing just fine. I think lock contention or unrelated system activity such as hardware interrupts (think a busy network!) may contribute to it for systems using LL/SC loops for atomicity.

So I think the default timeout that's used for really quick tests should be extended a bit. I propose a factor of 2, just not to make it too excessive, at least for the beginning (maybe it'll have to be higher eventually).

OK?

2014-09-03  Maciej W. Rozycki  ma...@codesourcery.com

    gcc/testsuite/
    * gcc.dg/atomic/c11-atomic-exec-5.c (dg-timeout-factor): New
    setting.

  Maciej

gcc-test-c11-atomic-exec-5-timeout-factor.diff
Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
===
--- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c 2014-09-02 17:34:06.718927043 +0100
+++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c 2014-09-03 14:51:12.958927233 +0100
@@ -9,6 +9,7 @@
 /* { dg-additional-options "-D_XOPEN_SOURCE=600" { target *-*-solaris2.1[0-9]* } } */
 /* { dg-require-effective-target fenv_exceptions } */
 /* { dg-require-effective-target pthread } */
+/* { dg-timeout-factor 2 } */
 
 #include <fenv.h>
 #include <float.h>
Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
Hello! While at it, may I propose another change on top of this? I've noticed the test case is rather slow, it certainly takes much more time than the average one, I've seen elapsed times of well over a minute on reasonably fast hardware and occasionally a timeout midway through even though the test case was otherwise progressing just fine. I think lock contention or unrelated system activity such as hardware interrupts (think a busy network!) may contribute to it for systems using LL/SC loops for atomicity. So I think the default timeout that's used for really quick tests should be extended a bit. I propose a factor of 2, just not to make it too excessive, at least for the beginning (maybe it'll have to be higher eventually). Or you can just lower the iteration count as I have to do for alpha. Uros.
[PATCH] Speedup -Og
I seem to have lost this patch in my dev tree for some reason. After introducing --param max-combine-insns it was the idea to restrict combine to two-insn combines for -Og as combine is a major RTL compile-time hog.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-09-03  Richard Biener  rguent...@suse.de

    * opts.c (default_options_optimization): Adjust
    max-combine-insns to 2 for -Og.

Index: gcc/opts.c
===
*** gcc/opts.c (revision 214810)
--- gcc/opts.c (working copy)
*************** default_options_optimization (struct gcc
*** 636,641 ****
--- 637,648 ----
       default_param_value (PARAM_MIN_CROSSJUMP_INSNS),
       opts->x_param_values, opts_set->x_param_values);
  
+   /* Restrict the amount of work combine does at -Og while retaining
+      most of its useful transforms.  */
+   if (opts->x_optimize_debug)
+     maybe_set_param_value (PARAM_MAX_COMBINE_INSNS, 2,
+                            opts->x_param_values, opts_set->x_param_values);
+ 
    /* Allow default optimizations to be specified on a per-machine basis.  */
    maybe_default_options (opts, opts_set,
                           targetm_common.option_optimization_table,
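How the new default interacts with an explicit --param can be modelled in a few lines. The sketch below assumes that maybe_set_param_value only applies its value when the user has not set the parameter, which is how its name and its use of opts_set read; toy_opts, toy_maybe_set_param and toy_default_options are hypothetical stand-ins, not the opts.c code.

```c
#include <stdbool.h>

enum toy_param { TOY_MAX_COMBINE_INSNS, TOY_NUM_PARAMS };

struct toy_opts
{
  bool optimize_debug;              /* true when -Og was given */
  int param_values[TOY_NUM_PARAMS];
  bool param_set[TOY_NUM_PARAMS];   /* true when set via --param */
};

/* Apply VALUE as a default for P unless the user set P explicitly.  */
static void
toy_maybe_set_param (struct toy_opts *opts, enum toy_param p, int value)
{
  if (!opts->param_set[p])
    opts->param_values[p] = value;
}

/* Model of the hunk above: at -Og, limit combine to two-insn
   combinations by default.  */
static void
toy_default_options (struct toy_opts *opts)
{
  if (opts->optimize_debug)
    toy_maybe_set_param (opts, TOY_MAX_COMBINE_INSNS, 2);
}
```

Under that assumption, -Og alone lowers the limit to 2, while -Og combined with an explicit --param max-combine-insns=N keeps the user's N.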
Re: [PATCH][AArch64] Fix wrong .cfi_def_cfa_offset in epilogue
On 03/09/14 11:33, Marcus Shawcroft wrote: On 20 August 2014 09:43, Jiong Wang jiong.w...@arm.com wrote: gcc/ * config/aarch64/aarch64.c (aarch64_expand_epilogue): Remove redundant cfa offset update. OK /Marcus thanks for review. this fix is included in Richard H's patch at https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02234.html I'd drop my patch. -- Jiong
Re: [PATCH, testsuite]: Compile gcc.dg/20111227-?.c for x86 targets only.
On Sep 3, 2014, at 1:03 AM, Uros Bizjak ubiz...@gmail.com wrote:

These testcases were intended to be compiled on x86 targets only [1].

Not a big deal, but would a git mv bla gcc.target/i386 be more appropriate?

2014-09-03  Uros Bizjak  ubiz...@gmail.com

    * gcc.dg/20111227-2.c: Compile only for x86 targets.
    * gcc.dg/20111227-3.c: Ditto.

+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
Re: [PATCH] Don't init ira_spilled_reg_stack_slots in ira if using lra.
On 2014-09-03 5:26 AM, Kito Cheng wrote: ping! The patch saves some compilation time for LRA based targets. It is ok to commit. Thanks, Kito. On Wed, Aug 27, 2014 at 10:49 PM, Kito Cheng kito.ch...@gmail.com wrote: Hi all: This patch is clean up useless initialize for IRA with LRA. 2014-08-27 Kito Cheng k...@0xlab.org * ira.c (ira): Don't initialize ira_spilled_reg_stack_slots and ira_spilled_reg_stack_slots_num if using lra. (do_reload): Remove release ira_spilled_reg_stack_slots part. * ira-color.c (ira_sort_regnos_for_alter_reg): Add assertion to make sure not using lra. (ira_reuse_stack_slot): Likewise. (ira_mark_new_stack_slot): Likewise.
Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
On Wed, 3 Sep 2014, Maciej W. Rozycki wrote: (floating-point environment is of course unsupported for soft-float targets and for the SPE FPU another change is required to implement floating-point environment handling to complement one proposed here). Support for SPE will depend on the C library just as soft-float support will, because of the need to have trapping on exceptions other than inexact enabled in the processor at all times with the kernel then using the prctl settings to determine whether that trap is for emulation or to produce SIGFPE. (The relevant support is in glibc 2.19 for soft-float and e500, in the form of __atomic_feholdexcept, __atomic_feclearexcept and __atomic_feupdateenv functions. I intend to implement the GCC side - conditional on being configured for glibc 2.19 or later on the target, as specified with --with-glibc-version or detected by configure's examination of target headers - once the hard-float support is in GCC. I believe the support in question will be identical for soft-float and e500, since it will be calling libc functions instead of manipulating processor state.) -- Joseph S. Myers jos...@codesourcery.com
Re: [FORTRAN PATCH] Two -Wlogical-not-parentheses fixes (PR fortran/62270)
On Tue, Sep 02, 2014 at 09:29:51PM +0200, Thomas Koenig wrote: Am 02.09.2014 17:32, schrieb Tobias Burnus: Marek Polacek wrote: This patch fixes the last two spots where -Wlogical-not-parentheses warns. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62270#c3 if you want more info about the changes. Bootstrapped/regtested on x86_64-linux, ok for trunk? Looks good to me. Thanks for the patch! As this commit fixes obvious errors for not-so-obvious cases, what about a backport? Yep, I'll apply the same patch to 4.9. And for 4.8 only the interface.c part. Marek
Re: [patch] prevent tree sinking of trapping stmts
On Sep 3, 2014, at 15:27 , Richard Biener richard.guent...@gmail.com wrote:

Eventually the FE can still simply set flag_exceptions / flag_non_call_exceptions? It will still not be correct for all cases but at least external throw predicates would work ...

Yes, these would work. I have always been worried about the potential unwanted effects and refrained from getting there so far. Now, it could well be that there are more or worse unwanted effects with not setting them today. The compiler has evolved quite a lot since the last time I looked into these. I'll re-examine ...

Thanks again for your constructive feedback,

Olivier
Re: [PATCH, testsuite]: Compile gcc.dg/20111227-?.c for x86 targets only.
On Wed, Sep 3, 2014 at 4:53 PM, Mike Stump mikest...@comcast.net wrote:

On Sep 3, 2014, at 1:03 AM, Uros Bizjak ubiz...@gmail.com wrote:

These testcases were intended to be compiled on x86 targets only [1].

Not a big deal, but would a git mv bla gcc.target/i386 be more appropriate?

I have considered this option, but the functionality we are testing here is not specific to x86. It just happens that the test depends on certain insn patterns, and these are implemented only in x86 for now. It is pointless to compile the test for other targets; OTOH, there is no reason that some less known or future target won't be able to pass this testcase.

Uros.
Re: fix gfcov regression
does this fix it? The answer after a quick update is yes, further testing scheduled for tonight. Thanks, Dominique
Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.
On 01-09-14 18:41, Ulrich Weigand wrote:

Tom de Vries wrote:

    * ira-costs.c (ira_tune_allocno_costs): Use
    ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS to adjust costs.

In debugging PR 53864 on s390x-linux, I ran into a weird change in behavior that occurs when the following part of this patch was checked in:

-         if (ira_hard_reg_set_intersection_p (regno, mode, call_used_reg_set)
-             || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))
-           cost += (ALLOCNO_CALL_FREQ (a)
-                    * (ira_memory_move_cost[mode][rclass][0]
-                       + ira_memory_move_cost[mode][rclass][1]));
+         crossed_calls_clobber_regs
+           = &(ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
+         if (ira_hard_reg_set_intersection_p (regno, mode,
+                                              *crossed_calls_clobber_regs))
+           {
+             if (ira_hard_reg_set_intersection_p (regno, mode,
+                                                  call_used_reg_set)
+                 || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))
+               cost += (ALLOCNO_CALL_FREQ (a)
+                        * (ira_memory_move_cost[mode][rclass][0]
+                           + ira_memory_move_cost[mode][rclass][1]));
 #ifdef IRA_HARD_REGNO_ADD_COST_MULTIPLIER
-         cost += ((ira_memory_move_cost[mode][rclass][0]
-                   + ira_memory_move_cost[mode][rclass][1])
-                  * ALLOCNO_FREQ (a)
-                  * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
+             cost += ((ira_memory_move_cost[mode][rclass][0]
+                       + ira_memory_move_cost[mode][rclass][1])
+                      * ALLOCNO_FREQ (a)
+                      * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
 #endif
+           }

Before that patch, this code would penalize all call-clobbered registers (if the allocno is used across a call), and it would penalize *all* registers in a target-dependent way if IRA_HARD_REGNO_ADD_COST_MULTIPLIER is defined; the latter is completely independent of the presence of any calls.

However, after that patch, the IRA_HARD_REGNO_ADD_COST_MULTIPLIER penalty is only applied for registers clobbered by calls in this function. This seems a completely unrelated change, and looks just wrong to me ...

Was this done intentionally or is this just an oversight?

Ulrich,

thanks for noticing this. I agree, this looks wrong, and is probably an oversight.

[ It seems that s390 is the only target defining IRA_HARD_REGNO_ADD_COST_MULTIPLIER, so this problem didn't show up on any other target. ]

I think the attached patch fixes it. I've built the patch and run the fuse-caller-save tests, and I'm currently bootstrapping and reg-testing it on x86_64.

Can you check whether this patch fixes the issue for s390?

Thanks,
- Tom

Bye, Ulrich

diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 774a958..57239f5 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2217,21 +2217,19 @@ ira_tune_allocno_costs (void)
              crossed_calls_clobber_regs
                = &(ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS (a));
              if (ira_hard_reg_set_intersection_p (regno, mode,
-                                                  *crossed_calls_clobber_regs))
-               {
-                 if (ira_hard_reg_set_intersection_p (regno, mode,
+                                                  *crossed_calls_clobber_regs)
+                 && (ira_hard_reg_set_intersection_p (regno, mode,
                                                       call_used_reg_set)
-                     || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))
-                   cost += (ALLOCNO_CALL_FREQ (a)
-                            * (ira_memory_move_cost[mode][rclass][0]
-                               + ira_memory_move_cost[mode][rclass][1]));
+                     || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode)))
+               cost += (ALLOCNO_CALL_FREQ (a)
+                        * (ira_memory_move_cost[mode][rclass][0]
+                           + ira_memory_move_cost[mode][rclass][1]));
 #ifdef IRA_HARD_REGNO_ADD_COST_MULTIPLIER
-                 cost += ((ira_memory_move_cost[mode][rclass][0]
-                           + ira_memory_move_cost[mode][rclass][1])
-                          * ALLOCNO_FREQ (a)
-                          * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
+             cost += ((ira_memory_move_cost[mode][rclass][0]
+                       + ira_memory_move_cost[mode][rclass][1])
+                      * ALLOCNO_FREQ (a)
+                      * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
 #endif
-               }
              if (INT_MAX - cost < reg_costs[j])
                reg_costs[j] = INT_MAX;
              else
-- 
1.9.1
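The control flow the fix aims for can be written out as a small model. It is a sketch under simplifying assumptions: toy_reg_info and toy_tune_cost are hypothetical, the three register-set tests are reduced to booleans, and mem_move_cost stands for the sum of the two ira_memory_move_cost entries.

```c
#include <stdbool.h>

struct toy_reg_info
{
  bool crossed_call_clobbered;  /* in ALLOCNO_CROSSED_CALLS_CLOBBERED_REGS */
  bool call_used;               /* in call_used_reg_set */
  bool call_part_clobbered;     /* HARD_REGNO_CALL_PART_CLOBBERED holds */
};

/* Extra cost for one hard register after the fix.  The call penalty is
   conditional on the register being clobbered by a crossed call; the
   target multiplier penalty applies to every register again.  */
static int
toy_tune_cost (const struct toy_reg_info *r, int call_freq, int freq,
               int mem_move_cost, int multiplier)
{
  int cost = 0;

  if (r->crossed_call_clobbered
      && (r->call_used || r->call_part_clobbered))
    cost += call_freq * mem_move_cost;

  /* Corresponds to the IRA_HARD_REGNO_ADD_COST_MULTIPLIER block, which is
     no longer nested inside the call-clobber test.  */
  cost += mem_move_cost * freq * multiplier / 2;

  return cost;
}
```

A register that no crossed call clobbers still receives the multiplier penalty in this model, which is the behavior Ulrich describes for the code before the regression.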
Re: [PATCH 4/4] aarch64: Don't duplicate calls_alloca check
On 09/03/2014 04:06 AM, Marcus Shawcroft wrote: On 22 August 2014 23:05, Richard Henderson r...@redhat.com wrote: Generic code already handles calls_alloca for determining the need for a frame pointer. * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Don't check calls_alloca. Ok Thanks /Marcus Thanks. I fixed up the conflicts with the rtx_insn patch set and committed all four patches squashed together. r~
Re: RFA: Document first operand to RTX_AUTOINC
On 09/02/14 12:00, Richard Sandiford wrote: As Jeff suggested here: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00390.html this patch documents that the first operand to an RTX_AUTOINC is the automodified register. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * doc/rtl.texi (RTX_AUTOINC): Document that the first operand is the automodified register. OK. jeff
Re: [debug-early] reuse variable DIEs and fix their context
[Jason, Richard]: Is it useful for my patches to contain ChangeLog entries? I find them mildly annoying for something that will inevitably be rewritten multiple times, but if it aids in reviewing my WIP, I am more than happy to continue including them. On 08/28/14 11:01, Jason Merrill wrote: On 08/28/2014 01:34 PM, Aldy Hernandez wrote: I wonder if instead of early dumping of all the DECLs, we could only dump the toplevel scoped DECLs, and let inheritance set the proper contexts. Yes, I think this makes a lot more sense; do it at a well-defined point in compilation rather than as part of free_lang_data. Great. It turned out, this was a cleaner approach as well. The problem being that to calculate `ext_block' above, we need intimate knowledge of scopes and such, only available in the FE. Is there a generic way of determining if a DECL is in global scope? Why not do it in the FE, i.e. *_write_global_declarations? This is what I've done in this patch. I'm no longer generating dwarf early from free_lang_data, instead I'm using the global_decl debug hook and adding an EARLY argument. Then I call it twice, once after the FE is done, and once after the full compilation has finished (cgraph has been generated, etc). The goal is to have the first pass generate the DIEs and the 2nd pass fill in location information and such. Generating the globals first solves the context issue. The recursive nature of generating DIEs gets everything right. For that matter, with the attached patch, I actually get *LESS* guality failures than before. Unexpected, but I'm not going to complain ;-). I have added a few (temporary) checks to make sure we're not regenerating DIEs when we already have one (at least for DECLs). These should go away after this work is incorporated into mainline. FYI, I am only handling C for now while we iron out the general idea. How does this look? 
Aldy

diff --git a/gcc/ChangeLog.debug-early b/gcc/ChangeLog.debug-early
index df571a4..980b655 100644
--- a/gcc/ChangeLog.debug-early
+++ b/gcc/ChangeLog.debug-early
@@ -1,3 +1,31 @@
+2014-09-03  Aldy Hernandez  al...@redhat.com
+
+    * c/c-decl.c (write_global_declarations_1): Call global_decl()
+    with early=true.
+    (write_global_declarations_2): Call global_decl() with
+    early=false.
+    * dbxout.c (dbxout_global_decl): New argument.
+    * debug.c (do_nothing_debug_hooks): Use debug_nothing_tree_bool
+    for global_decl hook.
+    (debug_nothing_tree_bool): New.
+    (struct gcc_debug_hooks): New argument to global_decl.
+    * dwarf2out.c (output_die): Add misc debugging information.
+    (gen_variable_die): Do not reparent children.
+    (dwarf2out_global_decl): Add new documentation.  Add EARLY
+    argument.
+    (dwarf2out_decl): Make sure we don't generate new DIEs if we
+    already have a DIE.
+    * cp/name-lookup.c (do_namespace_alias): New argument to
+    global_decl debug hook.
+    * fortran/trans-decl.c (gfc_emit_parameter_debug_info): Same.
+    * godump.c (go_global_decl): Same.
+    * lto/lto-lang.c (lto_write_globals): Same.
+    * sdbout.c (sdbout_global_decl): Same.
+    * toplev.c (emit_debug_global_declarations): Same.
+    * vmsdbgout.c (vmsdbgout_global_decl): Same.
+    * tree.c (free_lang_data_in_decl): Do not call
+    dwarf2out_early_decl from here.
+
 2014-08-26  Aldy Hernandez  al...@redhat.com
 
     * dwarf2out.c (struct die_struct): Add dumped_early field.
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index b4995a6..1e09404 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -10308,7 +10308,10 @@ c_write_global_declarations_1 (tree globals)
   while (reconsider);
 
   for (decl = globals; decl; decl = DECL_CHAIN (decl))
-    check_global_declaration_1 (decl);
+    {
+      check_global_declaration_1 (decl);
+      debug_hooks->global_decl (decl, /*early=*/true);
+    }
 }
 
 /* A subroutine of c_write_global_declarations Emit debug information for each
@@ -10320,7 +10323,7 @@ c_write_global_declarations_2 (tree globals)
   tree decl;
 
   for (decl = globals; decl ; decl = DECL_CHAIN (decl))
-    debug_hooks->global_decl (decl);
+    debug_hooks->global_decl (decl, /*early=*/false);
 }
 
 /* Callback to collect a source_ref from a DECL.  */
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index ebcbb5c..45b3b99 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -3859,7 +3859,7 @@ do_namespace_alias (tree alias, tree name_space)
 
   /* Emit debug info for namespace alias.  */
   if (!building_stmt_list_p ())
-    (*debug_hooks->global_decl) (alias);
+    (*debug_hooks->global_decl) (alias, /*early=*/false);
 }
 
 /* Like pushdecl, only it places X in the current namespace,
diff --git a/gcc/dbxout.c b/gcc/dbxout.c
index d856bdd..208cec9 100644
--- a/gcc/dbxout.c
+++ b/gcc/dbxout.c
@@ -325,7 +325,7 @@ static int dbxout_symbol_location (tree, tree, const char *, rtx);
 static void
Re: [PATCH libstdc++ v5] - Add xmethods for std::vector and std::unique_ptr
Ping. I am not sure if the OK to ping weekly applies to GCC patches as well. I apologize if it has to be longer. On Wed, Aug 27, 2014 at 3:58 PM, Jonathan Wakely jwakely@gmail.com wrote: On 27 August 2014 23:38, Siva Chandra wrote: You are probably already doing it, but just in case: are you using GDB 7.8 (or later, like ToT) ? You most likely are as otherwise the tests added by this patch will not be exercised. Yes, I'm testing with both 7.8 (where it should work) and an older version (where it should fail gracefully).
Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.
Tom de Vries wrote:

thanks for noticing this. I agree, this looks wrong, and is probably an oversight. [ It seems that s390 is the only target defining IRA_HARD_REGNO_ADD_COST_MULTIPLIER, so this problem didn't show up on any other target. ] I think the attached patch fixes it. I've built the patch and run the fuse-caller-save tests, and I'm currently bootstrapping and reg-testing it on x86_64.

Thanks!

Can you check whether this patch fixes the issue for s390?

Yes, this (which is equivalent to a patch I had been using) does fix the s390 issue again.

Just for my curiosity, why is the second condition (after &&) needed in this clause in the first place?

             if (ira_hard_reg_set_intersection_p (regno, mode,
+                                                 *crossed_calls_clobber_regs)
+                && (ira_hard_reg_set_intersection_p (regno, mode,
                                                      call_used_reg_set)
-                    || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))

If a register is in crossed_calls_clobber_regs, can it ever *not* be a call-clobbered register?

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
Re: [PATCH 1/2, PR 61654] Handle newly truly expanded artificial_thunks
On 09/03/14 02:45, Martin Jambor wrote: Hi, I did not think it was possible, but it can happen that when duplicate_thunk_for_node creates a duplicate of a thunk which previously expand_thunk left alone to be expanded into assembly by the back end, the newly created thunk does get expanded by expand_thunk. When this happens, we end up with an un-analyzed node which triggers an assert later on. This patch deals with the situation by analyzing the newly expanded thunk. This revealed that DECL_ARGUMENTS were insufficiently copied for the new decl and it was sharing them with the old one. So this patch fixes this as well. Bootstrapped and tested on x86_64-linux and i686-linux (where the bug triggered), OK for trunk and the 4.9 branch? Thanks, Martin 2014-09-01 Martin Jambor mjam...@suse.cz PR ipa/61654 * cgraphclones.c (duplicate_thunk_for_node): Copy arguments of the new decl properly. Analyze the new thunk if it is expanded. gcc/testsuite/ * g++.dg/ipa/pr61654.C: New test. OK. Jeff
Re: [PATCH 2/2] Set analyzed flag of unexpanded thunks in expand_thunk
On 09/03/14 02:45, Martin Jambor wrote:

Hi,

this is a followup to my previous PR-fixing patch. At ever more places we currently do

  if (!node->expand_thunk (false, whatever))
    node->analyzed = true;

and we always set the flag when expand_thunk returns with false (it only can when the first parameter is false). So I thought it would be much nicer to set the analyzed flag in expand_thunk itself when it returns false, especially given that we probably want to set the flag at as few places as reasonably possible.

Bootstrapped and tested on x86_64-linux. OK for trunk?

Thanks,

Martin

2014-09-01  Martin Jambor  mjam...@suse.cz

    * cgraphunit.c (expand_thunk): If not expanding, set analyzed flag.
    (analyze): Do not set analyze flag if expand_thunk returns false;.
    (create_wrapper): Likewise.
    * cgraphclones.c (duplicate_thunk_for_node): Likewise.

OK.

Jeff
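The refactoring described above, moving the flag update from every caller into the callee, looks like this in a reduced model. The names are hypothetical: toy_node stands in for cgraph_node, toy_expand_thunk for expand_thunk, and the can_expand_now field replaces whatever condition the real function uses to decide that a thunk is left to the back end.

```c
#include <stdbool.h>

struct toy_node
{
  bool can_expand_now;   /* hypothetical stand-in for the real decision */
  bool expanded;
  bool analyzed;
};

/* Expand NODE if possible, or unconditionally when FORCE is true.
   Returning false is only possible when FORCE is false; in that case the
   node is marked analyzed here, so callers no longer have to do it.  */
static bool
toy_expand_thunk (struct toy_node *node, bool force)
{
  if (!force && !node->can_expand_now)
    {
      node->analyzed = true;
      return false;
    }
  node->expanded = true;
  return true;
}
```

A caller that previously tested the return value only to set the flag can then call toy_expand_thunk (node, false) and ignore the result.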
Re: [4.9] PR 62146
On 09/02/14 12:52, Easwaran Raman wrote: It turns out that the REG_EQUAL note is removed on a hoisted instruction (relevant code is in dead_or_predicable in ifcvt.c) if the source of the move instruction is not a function invariant. In this case, the source is a function invariant (constant) and so that doesn't kick in. I don't understand why this exemption for function invariant is there and the original thread in https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain either. Should I just remove the REG_EQUAL notes of all hoisted instructions or are there cases where it is safe to leave the note? I suspect it was left in in an attempt to keep as many REG_EQUAL notes as possible as the notes can help later optimizations. But even those equivalences are not necessarily safe. I'm pretty sure the right thing to do here is just drop the note regardless of whether or not its an invariant. jeff
Re: [Patch, Fortran] Component declarations overwrite types of Cray Pointee variables
Fritz Reese wrote: The typespecs for Cray pointees are overwritten by the typespecs of components with the same name which are declared later. Here is a proposed patch from 4.8.3 (test case comments/ChangeLog descriptions are updated from the submission on bugzilla). The test case demonstrates the problem. OK. Thanks for the patch. I have committed it as Rev. 214891. FYI, I am currently working with my employer so any future changes I have can comply with GNU's legal requirements. That would be useful. In terms of complexity and number of line changes of the actual code, I think it still counts as trivial such that I have committed the patch without waiting for the copyright assignment. Also my mail client replaces tabs with spaces so I'm sorry for any whitespace issues. An alternative would be to attach the patch – possibly such that the mail client uses text/* as MIME code (e.g. using .txt as suffix, though several programs also handle other extensions). That way it shows up as text in most email programs and in the mail archive as text. 2014-09-02 Fritz Reese reese-fr...@zai.com PR fortran/62174 * gcc/testsuite/gfortran.dg/cray_pointers_11.f90: New. Two nits: I'd prefer if you placed both changelogs before the actual patch - not between different parts of the patch. And entries in ChangeLog files are relative to the file, i.e. in this case gfortran.dg/... instead of gcc/testsuite/gfortran.dg/ Tobias
Re: RFA: Merge definitions of get_some_local_dynamic_name
On 09/02/14 12:36, Richard Sandiford wrote: Several targets define a function like i386's get_some_local_dynamic_name. The function looks through the current output function and returns the first (arbitrary) local-dynamic symbol that it finds. The result can be used in a call to __tls_get_addr, since all local-dynamic symbols have the same base. This patch replaces the various target functions with a single generic one. The only difference between the implementations was that s390 checked for constant pool references while the others didn't need to (because they don't allow TLS symbols to be forced into the pool). Checking for constant pool references is unnecessary but harmless for the other ports. Also, the walk is needed only once per TLS-referencing output function, so it's hardly critical in terms of compile time. All uses of this function are in final. In general it wouldn't be safe to call the function earlier than that, since the symbol reference could in principle be deleted by any rtl pass. I've therefore cached it in a variable local to final rather than in cfun (which is where the ports used to cache it). Also, i386 was robust against uses of % in inline asm. The patch makes sure the other ports are too. Using % in inline asm would often be a mistake, but it should at least trigger a proper error rather than an ICE. Tested on x86_64-linux-gnu. Also tested by building cross compilers before and after the change on: alpha-linux-gnu powerpc64-linux-gnu s390x-linux-gnu sparc64-linux-gnu OK to install? Thanks, Richard gcc/ * output.h (get_some_local_dynamic_name): Declare. * final.c (some_local_dynamic_name): New variable. (get_some_local_dynamic_name): New function. (final_end_function): Clear some_local_dynamic_name. * config/alpha/alpha.c (machine_function): Remove some_ld_name. (get_some_local_dynamic_name, get_some_local_dynamic_name_1): Delete. (print_operand): Report an error if '%' is used inappropriately. 
* config/i386/i386.c (get_some_local_dynamic_name): Delete. (get_some_local_dynamic_name_1): Delete. * config/rs6000/rs6000.c (machine_function): Remove some_ld_name. (rs6000_get_some_local_dynamic_name): Delete. (rs6000_get_some_local_dynamic_name_1): Delete. (print_operand): Report an error if '%' is used inappropriately. * config/s390/s390.c (machine_function): Remove some_ld_name. (get_some_local_dynamic_name, get_some_local_dynamic_name_1): Delete. (print_operand): Assert that get_some_local_dynamic_name is nonnull. * config/sparc/sparc.c: Include rtl-iter.h. (machine_function): Remove some_ld_name. (sparc_print_operand): Report an error if '%' is used inappropriately. (get_some_local_dynamic_name, get_some_local_dynamic_name_1): Delete. OK. Jeff
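The caching scheme Richard describes (find the first local-dynamic symbol once per output function, keep it in a variable local to final, and clear it when the function ends) could be sketched roughly as follows. This is only a toy model: `toy_symbol`, `toy_get_some_local_dynamic_name` and `toy_end_function` are hypothetical names, and the array scan stands in for the real walk over the function's insns.

```c
#include <stddef.h>

/* Hypothetical stand-in for a symbol reference met while outputting
   the current function; not a GCC data structure.  */
struct toy_symbol
{
  const char *name;
  int is_local_dynamic_tls;
};

/* Cached answer for the function currently being output.  It lives in
   a file-local variable rather than in per-function state.  */
static const char *cached_local_dynamic_name;

/* Return the first local-dynamic TLS symbol among SYMS, walking the
   array only if no answer has been cached yet.  */
const char *
toy_get_some_local_dynamic_name (const struct toy_symbol *syms, size_t n)
{
  if (cached_local_dynamic_name)
    return cached_local_dynamic_name;

  for (size_t i = 0; i < n; i++)
    if (syms[i].is_local_dynamic_tls)
      {
        cached_local_dynamic_name = syms[i].name;
        return cached_local_dynamic_name;
      }

  return NULL;
}

/* Forget the cached name once the current function has been output.  */
void
toy_end_function (void)
{
  cached_local_dynamic_name = NULL;
}
```

Clearing the cache at the end of each function is what keeps a stale name from leaking into the next function, which is the role the ChangeLog assigns to final_end_function.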
Re: [PATCH] Enable -Wlogical-not-parentheses by -Wall
On 09/02/14 09:53, Marek Polacek wrote: Now that PR61271 and PR62270 have been fixed, we can enable -Wlogical-not-parentheses by -Wall. I think this warning proved useful. Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk? 2014-08-26 Marek Polacek pola...@redhat.com * doc/invoke.texi: Document that -Wlogical-not-parentheses is enabled by -Wall. c-family/ * c.opt (Wlogical-not-parentheses): Enable by -Wall. OK. I'm sure this is going to trip in someone's code and they'll complain. But I've always stated that warnings, particularly from -Wall, are not consistent from release to release. Thus someone using -Wall has to be prepared to fix their code at each release because we can and will add things to -Wall over time. jeff
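For readers unfamiliar with the warning, the pattern it targets is a comparison whose left operand is negated, such as `!a == b`, which parses as `(!a) == b`. The sketch below spells out the two possible readings as separate functions (the function names are made up for illustration) so the difference can be checked; it deliberately avoids the unparenthesized form so that it compiles cleanly under -Wall.

```c
/* What "!a == b" actually means: negate A first, then compare the
   resulting 0 or 1 with B.  */
int
negate_then_compare (int a, int b)
{
  return (!a) == b;
}

/* What the author usually meant: compare first, then negate.  */
int
compare_then_negate (int a, int b)
{
  return !(a == b);
}
```

With a = 2 and b = 1 the first function returns 0 and the second returns 1, which is the kind of silent disagreement the warning is meant to flag.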
Re: [PING][PATCH] Fix environment variables restoring in GCC testsuite.
On 09/01/14 03:09, Maxim Ostapenko wrote:

Subject: [PATCH] Fix environment variables restoring in GCC testsuite.
Date: Fri, 22 Aug 2014 14:39:16 +0400
From: Maxim Ostapenko m.ostape...@partner.samsung.com
To: GCC Patches gcc-patches@gcc.gnu.org
CC: Yury Gribov y.gri...@samsung.com, Slava Garbuzov v.garbu...@samsung.com

Hi,

When testing, I noticed that Asan-bootstrapped GCC should be executed with ASAN_OPTIONS=detect_leaks=0 because of memory leaks in GCC reported by Leak Sanitizer. When I ran the Asan tests on Asan-bootstrapped GCC, some of them failed with memory leaks in GCC, even with Lsan disabled. This is caused by slightly wrong logic in the saving/restoring of env variables in gcc-dg.exp (some tests override ASAN_OPTIONS and this env variable isn't restored correctly). This tiny patch seems to fix the issue.

Tested on x86_64-pc-linux-gnu. Ok to commit?

-Maxim

env_restore.diff

gcc/testsuite/ChangeLog:

2014-09-01 Max Ostapenko m.ostape...@partner.samsung.com

* lib/gcc-dg.exp: Change pattern.

OK.

Jeff
Re: [RFC] Tweak gcc.c-torture/execute/pr39228.c
On 09/02/14 23:26, Kaz Kojima wrote: Oleg Endo oleg.e...@t-online.de wrote: -mieee should be the default on sh* and thus can be removed from the dg-options line, or is it not? If -mieee is still needed (for alpha) maybe it's better to use dg-additional-options instead? Sure. The attached is a revised one. Regards, kaz -- * gcc.c-torture/execute/pr39228.c: Use dg-additional-options instead of dg-options and remove sh*-*-* from its target list. Add inline keyword to test functions. Wouldn't we be better off moving this into execute/ieee? jeff
Re: [PATCH, Pointer Bounds Checker 23/x] Function split
On 08/18/14 09:55, Ilya Enkovich wrote:

On 04 Jun 01:15, Jeff Law wrote:

On 06/03/14 01:10, Ilya Enkovich wrote:

Hi,

This patch does not allow splitting in case bounds are returned until returned bounds are supported. It also propagates instrumentation marks for generated call and function.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-06-03 Ilya Enkovich ilya.enkov...@intel.com

* ipa-split.c: Include tree-chkp.h.
(consider_split): Do not split when return bounds.
(split_function): Propagate Pointer Bounds Checker instrumentation marks.

It's a hack. There's no reason we can't support this. So I'll approve on the condition that you do look at removing this limitation in the future.

jeff

I did some work for function splitting and now the patch covers more cases. Returned bounds are now supported, but it is not allowed to split producers of the returned pointer and its bounds. Is it OK?

Thanks,
Ilya
--
2014-08-15 Ilya Enkovich ilya.enkov...@intel.com

* ipa-split.c: Include tree-chkp.h.
(find_retbnd): New.
(consider_split): Do not split retbnd and retval producers.
(split_function): Propagate Pointer Bounds Checker instrumentation marks and handle returned bounds.

I don't think it's sufficient to just look at the SSA_NAME_DEF_STMT and verify that it's not in the header. You could easily have the SSA_NAME_DEF_STMT be a PHI which is in the same partition as the RETURN statement. One of the PHI arguments might be fed from a statement in the header, right?

Don't you have to look at the entire set of definitions which directly and indirectly feed the return statement and verify that each and every one is in the same partition as the return statement? And if so, that makes me start to think the original hack wasn't so bad after all :-)

jeff
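The check Jeff is asking about, following every definition that directly or indirectly feeds the returned value rather than only the immediate defining statement, might be sketched as a small worklist walk. This is a toy model under stated assumptions, not ipa-split.c code: `toy_def`, its operand indices and the `in_header` flag are hypothetical stand-ins for SSA definitions and the split-point partition.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_OPERANDS 4
#define TOY_MAX_DEFS 64

/* Hypothetical stand-in for an SSA definition.  OPERANDS holds indices
   of other definitions in the same array (e.g. PHI arguments).  */
struct toy_def
{
  bool in_header;                 /* Defined in the header partition?  */
  size_t num_operands;
  int operands[TOY_MAX_OPERANDS];
};

/* Return true only if START and every definition reachable from it
   through operands lie outside the header partition.  */
bool
toy_all_feeding_defs_outside_header (const struct toy_def *defs,
                                     size_t ndefs, int start)
{
  bool visited[TOY_MAX_DEFS] = { false };
  int worklist[TOY_MAX_DEFS];
  size_t top = 0;

  if (ndefs > TOY_MAX_DEFS || start < 0 || (size_t) start >= ndefs)
    return false;

  worklist[top++] = start;
  visited[start] = true;

  while (top > 0)
    {
      const struct toy_def *d = &defs[worklist[--top]];

      if (d->in_header)
        return false;

      for (size_t i = 0; i < d->num_operands; i++)
        {
          int op = d->operands[i];

          if (op < 0 || (size_t) op >= ndefs)
            return false;
          /* Each definition is pushed at most once, so the worklist
             never holds more than NDEFS entries.  */
          if (!visited[op])
            {
              visited[op] = true;
              worklist[top++] = op;
            }
        }
    }

  return true;
}
```

In this model a PHI outside the header with one argument defined in the header is rejected, which is the case a check on the immediate defining statement alone would miss.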
Re: [PATCH, Pointer Bounds Checker 36/x] IPA pure const
On 08/18/14 08:47, Ilya Enkovich wrote: Hi, This small patch adds support for new reference type for IPA pure const analysis. Thanks, Ilya -- 2014-08-15 Ilya Enkovich ilya.enkov...@intel.com * ipa-pure-const.c (propagate_pure_const): Support IPA_REF_CHKP. OK. jeff
Re: [PATCH, Pointer Bounds Checker 37/x] Support va_arg_pack and va_arg_pack_len
On 08/18/14 09:03, Ilya Enkovich wrote: Hi, This patch adds support for va_arg_pack and va_arg_pack_len for instrumented functions into inliner. There are two things to do: 1) ignore bounds args when computing va_arg_pack_len 2) remove bounds args when expanding va_arg_pack in not instrumented call. Thanks, Ilya -- 2014-08-15 Ilya Enkovich ilya.enkov...@intel.com * tree-inline.c (copy_bb): Properly handle bounds in va_arg_pack and va_arg_pack_len. OK. Jeff
Re: [gomp4] Add tables generation
Hi!

On Mon, 18 Aug 2014 20:07:59 +0400, Ilya Verbin iver...@gmail.com wrote:

I discovered an issue in the LTO streaming out for target - currently any file (even without any pragma) compiled with -fopenmp/-fopenacc contains .gnu.target_lto_* sections. This increases the size of an object file and makes lto-wrapper run mkoffload. Therefore, I propose to replace the condition before ipa_write_summaries:

-  if (flag_openacc || flag_openmp)
+  if ((flag_openacc || flag_openmp)
+      && !(vec_safe_is_empty (offload_funcs)
+           && vec_safe_is_empty (offload_vars)))

But to do this, the offload_vars must be filled before the check (offload_funcs is already filled in expand_omp_target).

Here is the updated patch. Bootstrap passed. OK for gomp-4_0-branch?

On 13 Aug 20:19, Ilya Verbin wrote:

Here is the updated patch. offload_funcs/vars are now declared in omp-low.h, the functions have a comment. Also it fixes the issue of offload_funcs/vars corruption by the garbage collector. OK for gomp-4_0-branch?

--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -8906,6 +8909,9 @@ expand_omp_target (struct omp_region *region)
   DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
   cgraph_add_new_function (child_fn, true);
 
+  /* Add the new function to the offload table.  */
+  vec_safe_push (offload_funcs, child_fn);
+
   /* Fix the callgraph edges for child_cfun.  Those for cfun will be
      fixed in a following pass.  */
   push_cfun (child_cfun);

The same change needs to be done for OpenACC offloading; addressed in r214892:

commit 9fb900482bd3bca9bfa89301e417174caabd7176
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Wed Sep 3 19:10:43 2014 +

Restore OpenACC offloading.

gcc/
* omp-low.c (expand_oacc_offload): Add child_fn to offload_funcs.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@214892 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp | 4 ++++
 gcc/omp-low.c      | 3 +++
 2 files changed, 7 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 40688df..0c55814 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2014-09-03 Thomas Schwinge tho...@codesourcery.com
+
+	* omp-low.c (expand_oacc_offload): Add child_fn to offload_funcs.
+
 2014-08-19 Ilya Verbin ilya.ver...@intel.com
 
 	* Makefile.in (GTFILES): Add omp-low.h.
diff --git gcc/omp-low.c gcc/omp-low.c
index 1ad98ab..6ed8239 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -5351,6 +5351,9 @@ expand_oacc_offload (struct omp_region *region)
   DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
   cgraph_add_new_function (child_fn, true);
 
+  /* Add the new function to the offload table.  */
+  vec_safe_push (offload_funcs, child_fn);
+
   /* Fix the callgraph edges for child_cfun.  Those for cfun will be
      fixed in a following pass.  */
   push_cfun (child_cfun);

Regards,
Thomas
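The condition Ilya proposes reads as "offloading is enabled and at least one of the two offload tables is non-empty". A minimal sketch of that predicate follows, with the vectors reduced to element counts. The function name and parameters are assumptions for illustration and do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decide whether offload LTO sections are worth streaming out.
   NUM_OFFLOAD_FUNCS and NUM_OFFLOAD_VARS stand in for the lengths of
   the offload_funcs and offload_vars vectors.  */
bool
toy_should_write_offload_summaries (bool flag_openacc, bool flag_openmp,
                                    size_t num_offload_funcs,
                                    size_t num_offload_vars)
{
  bool offloading_enabled = flag_openacc || flag_openmp;
  bool tables_empty = num_offload_funcs == 0 && num_offload_vars == 0;

  return offloading_enabled && !tables_empty;
}
```

A file compiled with -fopenmp but containing no target regions or offloaded variables would then skip the extra sections, which is the object-size saving the message describes.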
Re: [PATCH, Pointer Bounds Checker 24/x] PRE
On 08/18/14 07:02, Ilya Enkovich wrote:

On 03 Jun 11:33, Richard Biener wrote:

On Tue, Jun 3, 2014 at 9:13 AM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

This patch preserves CALL_WITH_BOUNDS flag for calls during PRE.

Ok.

Richard.

Merging with the trunk I found that op2 field of vn_reference_op_struct is now used to pass EH context for calls and there is no more free field to store with_bounds flag. So I added one. Does it look OK?

Thanks,
Ilya
--
2014-08-14 Ilya Enkovich ilya.enkov...@intel.com

* tree-ssa-sccvn.h (vn_reference_op_struct): Transform opcode into bit field and add with_bounds field.
* tree-ssa-sccvn.c (copy_reference_ops_from_call): Set with_bounds field for instrumented calls.
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Restore CALL_WITH_BOUNDS_P flag for calls.

For consistency, make the bitfield 16 bits (see ipa-inline.h, tree-core.h and tree-ssa-sccvn.h). OK with that change.

jeff
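The layout change under discussion, narrowing the opcode to a 16-bit bitfield so that a one-bit flag fits alongside it, might look roughly like the sketch below. The struct and function names are hypothetical, and the real vn_reference_op_struct has more members than this toy version.

```c
/* Toy version of an operation record: a 16-bit opcode followed by a
   single-bit flag packed into the same storage unit.  */
struct toy_reference_op
{
  unsigned int opcode : 16;
  unsigned int with_bounds : 1;
};

/* Build a record for a call; INSTRUMENTED is nonzero for calls that
   carry bounds.  OPCODE is assumed to fit in 16 bits.  */
struct toy_reference_op
toy_make_call_op (unsigned int opcode, int instrumented)
{
  struct toy_reference_op op = { 0, 0 };

  op.opcode = opcode & 0xffffu;
  op.with_bounds = instrumented ? 1 : 0;
  return op;
}
```

The two members are independent, so setting the flag does not disturb the stored opcode.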
Re: [PATCH, Pointer Bounds Checker 9/x] Cgraph extension
On 07/24/14 03:59, Ilya Enkovich wrote: -- 2014-07-24 Ilya Enkovich ilya.enkov...@intel.com * cgraph.h (cgraph_thunk_info): Add add_pointer_bounds_args field. (cgraph_node): Add instrumented_version, orig_decl and instrumentation_clone fields. (symtab_alias_target): Allow IPA_REF_CHKP reference. * cgraph.c (cgraph_remove_node): Fix instrumented_version of the referenced node if any. (dump_cgraph_node): Dump instrumentation_clone and instrumented_version fields. (verify_cgraph_node): Check correctness of IPA_REF_CHKP references and instrumentation thunks. * cgraphbuild.c (rebuild_cgraph_edges): Rebuild IPA_REF_CHKP reference. (cgraph_rebuild_references): Likewise. * cgraphunit.c (assemble_thunks_and_aliases): Skip thunks calling instrumneted function version. * ipa-ref.h (ipa_ref_use): Add IPA_REF_CHKP. (ipa_ref): increase size of use field. * ipa-ref.c (ipa_ref_use_name): Add element for IPA_REF_CHKP. * lto-cgraph.c (lto_output_node): Output instrumentation_clone, thunk.add_pointer_bounds_args and orig_decl field. (lto_output_ref): Adjust to new ipa_ref::use field size. (input_overwrite_node): Read instrumentation_clone field. (input_node): Read thunk.add_pointer_bounds_args and orig_decl fields. (input_ref): Adjust to new ipa_ref::use field size. (input_cgraph_1): Compute instrumented_version fields and restore IDENTIFIER_TRANSPARENT_ALIAS chains. * lto-streamer.h (LTO_minor_version): Change minor version from 0 to 1. * ipa.c (symtab_remove_unreachable_nodes): Consider instrumented clone as address taken if the original one is address taken. (cgraph_externally_visible_p): Mark instrumented 'main' as externally visible. (function_and_variable_visibility): Filter instrumentation thunks. Thanks for adding the additional checking. 
@@ -513,6 +517,11 @@ cgraph_rebuild_references (void)
 	ipa_record_stmt_references (node, gsi_stmt (gsi));
     }
   record_eh_tables (node, cfun);
+
+
+  if (node->instrumented_version
+      && !node->instrumentation_clone)
+    ipa_record_reference (node, node->instrumented_version, IPA_REF_CHKP, NULL);
 }

Trivial nit here -- just one vertical space here. With that nit fixed, this is OK.

jeff
Re: [PATCH] Enable -Wlogical-not-parentheses by -Wall
On Wed, Sep 03, 2014 at 12:53:21PM -0600, Jeff Law wrote: On 09/02/14 09:53, Marek Polacek wrote: Now that PR61271 and PR62270 have been fixed, we can enable -Wlogical-not-parentheses by -Wall. I think this warning proved useful. Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk? 2014-08-26 Marek Polacek pola...@redhat.com * doc/invoke.texi: Document that -Wlogical-not-parentheses is enabled by -Wall. c-family/ * c.opt (Wlogical-not-parentheses): Enable by -Wall. OK. I'm sure this is going to trip in someone's code and they'll complain. But I've always stated that warnings, particularly from -Wall, are not consistent from release to release. Thus someone using -Wall has to be prepared to fix their code at each release because we can and will add things to -Wall over time. Yeah. Fortunately the warning shouldn't have (m)any false positives. Marek
Re: Enable EBX for x86 in 32bits PIC code
On 2014-08-29 2:47 AM, Ilya Enkovich wrote: Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) The copy is created by a newer IRA optimization for function prologues. The patch in the attachment should solve the problem. I also added the code to prevent spilling the pic pseudo in LRA which could happen before theoretically. After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. 
(insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Index: ira.c === --- ira.c (revision 214576) +++ ira.c (working copy) @@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void) FOR_BB_INSNS (first, insn) { rtx dest = interesting_dest_for_shprep (insn, call_dom); - if (!dest) + if (!dest || dest == pic_offset_table_rtx) continue; rtx newreg = NULL_RTX; Index: lra-assigns.c === --- lra-assigns.c (revision 214576) +++ lra-assigns.c (working copy) @@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse } /* Spill pseudos. */ EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi) - if ((int) spill_regno = lra_constraint_new_regno_start -! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) -! bitmap_bit_p (lra_split_regs, spill_regno) -! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) -! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)) + if ((pic_offset_table_rtx != NULL + spill_regno == REGNO (pic_offset_table_rtx)) + || ((int) spill_regno = lra_constraint_new_regno_start +! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) +! bitmap_bit_p (lra_split_regs, spill_regno) +! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) +! 
bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))) goto fail; insn_pseudos_num = 0; if (lra_dump_file != NULL) @@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky return; } for (n = 0, i = FIRST_PSEUDO_REGISTER; i max_regno; i++) -if (reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) +if ((pic_offset_table_rtx == NULL_RTX +|| i != (int) REGNO (pic_offset_table_rtx)) +reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) sorted_pseudos[n++] = i; qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func); for (i = n - 1; i = 0; i--) @@ -1360,6 +1364,8 @@ assign_by_spills (void) } EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) { + gcc_assert (pic_offset_table_rtx == NULL + || conflict_regno != REGNO (pic_offset_table_rtx)); if ((int) conflict_regno = lra_constraint_new_regno_start) sorted_pseudos[nfails++] = conflict_regno; if (lra_dump_file != NULL)
Re: [C++ Patch] PR 58102 aka DR 1405
On 09/03/2014 06:53 AM, Paolo Carlini wrote: The issue, AFAICS, boils down to the difference itself between cxx_eval_outermost_constant_expr and cxx_eval_constant_expression: changing constant_value_1 means that in principle all the calls of the latter (for VAR_DECLs) are impacted. Oh, right. Thus, for example, for the call at the beginning of cxx_eval_component_reference: struct A { int i; mutable int j; }; constexpr A a = { 0, 1 }; constexpr int i = a.i; how do we avoid emitting a wrong error for the a of a.i? Perhaps when we get the value of a we replace the mutable initializer with a magic mutable value and then make sure what we return from _outermost_ doesn't contain it? Jason
Re: RFC: Patch for switch elimination (PR 54742)
On 08/13/14 03:44, Richard Biener wrote: I don't see that this pass should scrog a loop beyond repair. Btw, the proper way of just fixing loops up (assuming that all loop headers are still at their appropriate place) is to _just_ do loops_set_state (LOOPS_NEED_FIXUP). This pass can quite easily create new irreducible loops and non-nested loops, the pass may take a previously well defined natural loop and make it irreducible. When I worked on it, I didn't see any reasonable way to update the loop structures. But still this is in theory very bad as you cause important annotations to be lost. If the loop is truly gone, ok, but if it just re-materializes then you've done a bad job here. Consider the case where a loop becomes a loop nest - you'd want to preserve the loop header as the header of the outer loop (which you'd have to identify its header in some way - dominator checks to the rescue!) and let fixup discover the new inner loop. While I'd love to be able to be able to update and DTRT here, I just couldn't see a way to do it. And while I hate losing the loop structure and missed-optimizations that it may lead to later, I judged the benefit of removing the multi-way branch to be so beneficial that it outweighed the losses elsewhere. Yes, we may have little utility for dealing with the more complex cases and I've been hesitant to enforce not dropping loops on the floor an ICE (well, mainly because we can't even bootstrap with such check ...), but in the end we should arrive there. It'd certainly be nice. I really don't like the idea of dropping loops on the floor. If we can get there, great, but I suspect you'll find its harder than expected. jeff
Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.
On 03-09-14 20:12, Ulrich Weigand wrote:

Just for my curiosity, why is the second condition (after &&) needed in this clause in the first place?

  if (ira_hard_reg_set_intersection_p (regno, mode,
+				       *crossed_calls_clobber_regs)
+      && (ira_hard_reg_set_intersection_p (regno, mode,
					   call_used_reg_set)
-	  || HARD_REGNO_CALL_PART_CLOBBERED (regno, mode))

If a register is in crossed_calls_clobber_regs, can it ever *not* be a call-clobbered register?

I *think* you're right that the second condition is not needed. But I'll leave that for a follow-up patch.

Thanks,
- Tom
Re: [PATCH libstdc++ v5] - Add xmethods for std::vector and std::unique_ptr
On 03/09/14 11:01 -0700, Siva Chandra wrote: Ping. I am not sure if the OK to ping weekly applies to GCC patches as well. I apologize if it has to be longer. I was waiting to see which version of the patch actually works, so that users can use the xmethods. There's no point committing the patch if they aren't installed and can't be used! Also, Tom had objections to the most recent patch, which haven't been addressed. I suppose we can fix that later.
Re: Make many more options use CPP()
On Sat, 30 Aug 2014, Manuel López-Ibáñez wrote:

gcc/ChangeLog:

2014-08-30 Manuel López-Ibáñez m...@gcc.gnu.org

* doc/options.texi: Document that Var and Init are required if CPP is given.
* optc-gen.awk: Require Var and Init if CPP is given.
* common.opt (Wpedantic): Use Init.

libcpp/ChangeLog:

2014-08-30 Manuel López-Ibáñez m...@gcc.gnu.org

* macro.c (replace_args): Use cpp_pedwarning, cpp_warning and CPP_W flags.
* include/cpplib.h: Add CPP_W_C90_C99_COMPAT and CPP_W_PEDANTIC.
* init.c (cpp_create_reader): Do not init to -1 here.
* expr.c (num_binary_op): Use cpp_pedwarning.

gcc/c-family/ChangeLog:

2014-08-30 Manuel López-Ibáñez m...@gcc.gnu.org

* c.opt (Wc90-c99-compat,Wc++-compat,Wcomment,Wendif-labels, Winvalid-pch,Wlong-long,Wmissing-include-dirs,Wmultichar,Wpedantic, (Wdate-time,Wtraditional,Wundef,Wvariadic-macros): Add CPP, Var and Init.
* c-opts.c (c_common_handle_option): Do not handle here.
(sanitize_cpp_opts): Likewise.
* c-common.c (struct reason_option_codes_t): Handle CPP_W_C90_C99_COMPAT and CPP_W_PEDANTIC.

gcc/testsuite/ChangeLog:

2014-08-30 Manuel López-Ibáñez m...@gcc.gnu.org

* gcc.dg/cpp/endif-pedantic2.c: More general options do not override specific ones, but specific ones do.

OK.

--
Joseph S. Myers
jos...@codesourcery.com
PATCH for Re: New GCC mirror
On Fri, 29 Aug 2014, ConcertPass Mirrors Admin wrote:

we set up a new GCC mirror for the community.

URL: http://mirrors.concertpass.com/gcc/
Organization/Contact: ConcertPass (ad...@mirrors.concertpass.com)
Location: United States, Michigan

Please, add it to your mirror list page.

Done thusly. Note that your page claims this to be a Sudo mirror; you may want to make that read GCC mirror, or better gcc.gnu.org mirror. (Out of curiosity, this is not for the sake of search engine optimization, is it?)

Gerald

Index: mirrors.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/mirrors.html,v
retrieving revision 1.226
diff -u -r1.226 mirrors.html
--- mirrors.html 28 Jul 2014 23:02:56 - 1.226
+++ mirrors.html 31 Aug 2014 11:14:17 -
@@ -57,6 +57,7 @@
 <a href="http://mirrors-usa.go-parts.com/gcc/">http://mirrors-usa.go-parts.com/gcc</a> |
 <a href="ftp://mirrors-usa.go-parts.com/gcc">ftp://mirrors-usa.go-parts.com/gcc</a> |
 <a href="rsync://mirrors-usa.go-parts.com/gcc">rsync://mirrors-usa.go-parts.com/gcc</a></li>
+<li>US, Michigan: <a href="http://mirrors.concertpass.com/gcc/">http://mirrors.concertpass.com/gcc/</a>, thanks to ad...@mirrors.concertpass.com.</li>
 </ul>
 
 <p>The archives there will be signed by one of the following GnuPG keys:</p>
Re: [PATCH libstdc++ v5] - Add xmethods for std::vector and std::unique_ptr
On Wed, Sep 3, 2014 at 3:35 PM, Jonathan Wakely jwak...@redhat.com wrote: I was waiting to see which version of the patch actually works, so that users can use the xmethods. There's no point committing the patch if they aren't installed and can't be used! Doesn't the latest version of the patch I posted work for you: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02516.html If not, what errors or missing things do you see? Also, Tom had objections to the most recent patch, which haven't been addressed. I am a bit confused about his suggestion and hence I asked if it can be addressed as a follow up. It seems to me like his suggestion is a code cleanup, but I am not really sure what the aim of the cleanup is. To avoid mixing up discussing that versus the aim the patch in question, I am asking if it can addressed as a follow up. Thanks, Siva Chandra
Re: [RFC] Tweak gcc.c-torture/execute/pr39228.c
Jeff Law l...@redhat.com wrote: * gcc.c-torture/execute/pr39228.c: Use dg-additional-options instead of dg-options and remove sh*-*-* from its target list. Add inline keyword to test functions. Wouldn't we be better off moving this into execute/ieee? I've tried it and found that all execute/ieee tests are compiled with -fno-inline. Perhaps it's a reason for not having moved that test into there, though I could be wrong about that. Regards, kaz
RE: [Patch ARM] Fix PR target/56846
Ping?

-----Original Message-----
From: Tony Wang [mailto:tony.w...@arm.com]
Sent: Monday, August 25, 2014 6:33 PM
To: 'gcc-patches@gcc.gnu.org'; 'd...@debian.org'; 'aph-...@littlepinkcloud.com'; Richard Earnshaw; Ramana Radhakrishnan
Subject: [Patch ARM] Fix PR target/56846

Hi all,

The bug is reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56846. The problem is that when an exception handler is involved in the function, the _Unwind_Backtrace function runs into an infinite loop on the ARM target.

Cmd line: arm-none-eabi-g++ -mthumb -mcpu=cortex-m3 -O0 -g -std=c++11 -specs=rdimon.specs main.c -o main.exe

#include <unwind.h>
#include <stdio.h>

_Unwind_Reason_Code trace_func(struct _Unwind_Context * context, void* arg)
{
  void *ip = (void *)_Unwind_GetIP(context);
  printf("Address: %p\n", ip);
  return _URC_NO_REASON;
}

void bar()
{
  puts("This is in bar");
  _Unwind_Backtrace((_Unwind_Trace_Fn)trace_func, 0);
}

void foo()
{
  try
  {
    bar();
  }
  catch (...)
  {
    puts("Exception");
  }
}

The potential for such a bug was discussed a long time ago in this mail: https://gcc.gnu.org/ml/gcc/2007-08/msg00235.html. Basically, as the ARM EHABI does not define how to implement _Unwind_Backtrace, Andrew gives control to the personality routine to unwind the stack, and uses the unwind state combination “_US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND” to indicate that the caller is asking the personality routine only to unwind the stack for it.

However, the personality routine (pr) in libstdc++-v3 doesn't handle this unwind state pattern correctly. When the backtrace function passes such a pattern to it, it will still return _URC_HANDLER_FOUND to the caller in some cases. This is because the pr thinks that _Unwind_Backtrace is raising a none type exception to it, so if the exception handler in the current stack frame can catch anything (like catch(…)), the pr returns _URC_HANDLER_FOUND to the caller and asks for the next step. But the unwind backtrace function doesn't know what to do when the pr returns an exception handler to it.

So this patch just evaluates this unwind state pattern at the beginning of the personality routine in libstdc++-v3: if we meet “_US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND”, we directly invoke the macro CONTINUE_UNWINDING to unwind the stack and return.

Is this a reasonable fix?

gcc/libstdc++-v3/ChangeLog:

2014-8-25 Tony Wang tony.w...@arm.com

PR target/56846
* libsupc++/eh_personality.cc: Return with CONTINUE_UNWINDING when the unwind state pattern is _US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND.

diff --git a/libstdc++-v3/libsupc++/eh_personality.cc b/libstdc++-v3/libsupc++/eh_personality.cc
index f315a83..c2b30e9 100644
--- a/libstdc++-v3/libsupc++/eh_personality.cc
+++ b/libstdc++-v3/libsupc++/eh_personality.cc
@@ -378,6 +378,11 @@ PERSONALITY_FUNCTION (int version,
   switch (state & _US_ACTION_MASK)
     {
     case _US_VIRTUAL_UNWIND_FRAME:
+      // If the unwind state pattern is _US_VIRTUAL_UNWIND_FRAME |
+      // _US_FORCE_UNWIND, we don't need to search for any handler
+      // as it is not a real exception. Just unwind the stack.
+      if (state & _US_FORCE_UNWIND)
+	CONTINUE_UNWINDING;
       actions = _UA_SEARCH_PHASE;
       break;
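The state test added by the patch can be tried in isolation with a small sketch. The `TOY_US_*` constants below are assumptions modelled on the ARM EHABI `_Unwind_State` encoding (action in the low two bits, force-unwind as a separate flag); they are defined locally for illustration, so this is not the libsupc++ code itself.

```c
/* Illustrative constants, assumed to mirror the ARM EHABI state
   encoding; defined here so the example is self-contained.  */
enum
{
  TOY_US_VIRTUAL_UNWIND_FRAME = 0,
  TOY_US_UNWIND_FRAME_STARTING = 1,
  TOY_US_UNWIND_FRAME_RESUME = 2,
  TOY_US_ACTION_MASK = 3,
  TOY_US_FORCE_UNWIND = 8
};

/* Return nonzero when STATE is the "virtual unwind + force unwind"
   combination that _Unwind_Backtrace uses to ask the personality
   routine to unwind a frame without searching for handlers.  */
int
toy_is_backtrace_only_request (int state)
{
  return (state & TOY_US_ACTION_MASK) == TOY_US_VIRTUAL_UNWIND_FRAME
         && (state & TOY_US_FORCE_UNWIND) != 0;
}
```

A plain phase-1 search (virtual unwind without the force flag) is not matched, so ordinary exception handling would still go on to look for a handler.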
RE: [PATCH 1/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
Ping 2?

-----Original Message-----
From: Tony Wang [mailto:tony.w...@arm.com]
Sent: Thursday, August 28, 2014 2:02 PM
To: 'gcc-patches@gcc.gnu.org'
Cc: Richard Earnshaw; Ramana Radhakrishnan
Subject: RE: [PATCH 1/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

Ping?

-----Original Message-----
From: Tony Wang [mailto:tony.w...@arm.com]
Sent: Thursday, August 21, 2014 2:15 PM
To: 'gcc-patches@gcc.gnu.org'
Subject: [PATCH 1/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

Hi there,

In libgcc, the files ieee754-sf.S and ieee754-df.S contain some function pairs that are bundled into one .o file and share the same .text section. For example, the libgcc makefile builds fmul and fdiv into one .o file, which is archived into libgcc.a. So when a user only calls the single-precision floating-point multiply function, the fdiv function is also linked, and because fmul and fdiv share the same .text section, the linker option --gc-sections or -flto can't remove the dead code.

So this optimization separates the function pairs (fmul/fdiv and dmul/ddiv) into different sections, following the naming pattern of -ffunction-sections (.text.__functionname), so that the unused fdiv/ddiv sections can be eliminated with --gc-sections when users only use fmul/dmul. The solution is to add a conditional statement to the macro FUNC_START, which conditionally changes the section of a function from .text to .text.__\name when compiling with the L_arm_muldivsf3 or L_arm_muldivdf3 macro.

GCC regression testing has been done on QEMU for Cortex-M3. No new regressions with this patch applied.

The code size reduction for Thumb-2 on Cortex-M3 is:
1. When the user only uses single-precision floating-point multiply: fmul+fdiv => fmul gives a code size reduction of 318 bytes.
2. When the user only uses double-precision floating-point multiply: dmul+ddiv => dmul gives a code size reduction of 474 bytes.

Ok for trunk?

BR,
Tony

Step 1: Provide another option, sp_section, to control whether to split the sections of a function pair into two parts.

gcc/libgcc/ChangeLog:

2014-08-21 Tony Wang tony.w...@arm.com

* config/arm/lib1funcs.S (FUNC_START): Add conditional section redefinition for macros L_arm_muldivsf3 and L_arm_muldivdf3.
(SYM_END, ARM_SYM_START): Add macros used to expose function symbols.

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index b617137..0f87111 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -418,8 +418,12 @@ SYM (\name):
 #define THUMB_SYNTAX
 #endif
 
-.macro FUNC_START name
+.macro FUNC_START name sp_section=
+	.ifc \sp_section, function_section
+	.section	.text.__\name,"ax",%progbits
+	.else
 	.text
+	.endif
 	.globl SYM (__\name)
 	TYPE (__\name)
 	.align 0
@@ -429,14 +433,24 @@ SYM (\name):
 SYM (__\name):
 .endm
 
+.macro ARM_SYM_START name
+	TYPE (\name)
+	.align 0
+SYM (\name):
+.endm
+
+.macro SYM_END name
+	SIZE (\name)
+.endm
+
 /* Special function that will always be coded in ARM assembly, even if
    in Thumb-only compilation.  */
 
 #if defined(__thumb2__)
 
 /* For Thumb-2 we build everything in thumb mode.  */
-.macro ARM_FUNC_START name
-	FUNC_START \name
+.macro ARM_FUNC_START name sp_section=
+	FUNC_START \name \sp_section
 	.syntax unified
 .endm
 #define EQUIV .thumb_set
@@ -467,8 +481,12 @@ _L__\name:
 #ifdef __ARM_ARCH_6M__
 #define EQUIV .thumb_set
 #else
-.macro ARM_FUNC_START name
+.macro ARM_FUNC_START name sp_section=
+	.ifc \sp_section, function_section
+	.section	.text.__\name,"ax",%progbits
+	.else
 	.text
+	.endif
 	.globl SYM (__\name)
 	TYPE (__\name)
 	.align 0
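The same idea, giving each function of a pair its own section so the linker can discard the one that is never referenced, can be illustrated at the C level with the GCC/Clang `section` attribute. This is only an analogy to the assembler macro change, assuming an ELF target; `toy_mul`, `toy_div` and the section names are made up for the example.

```c
/* Each function is placed in its own named section, similar in spirit
   to what -ffunction-sections does automatically.  On an ELF target,
   linking with --gc-sections can then drop whichever section is never
   referenced.  */
__attribute__ ((section (".text.toy_mul"), noinline))
float
toy_mul (float a, float b)
{
  return a * b;
}

__attribute__ ((section (".text.toy_div"), noinline))
float
toy_div (float a, float b)
{
  return a / b;
}
```

If a program only ever calls `toy_mul`, a link with `-Wl,--gc-sections` is then free to drop `.text.toy_div`, whereas two functions emitted into one shared `.text` section would be kept or discarded together.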
RE: [PATCH 2/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
Ping 2?

-----Original Message-----
From: Tony Wang [mailto:tony.w...@arm.com]
Sent: Thursday, August 28, 2014 2:02 PM
To: 'gcc-patches@gcc.gnu.org'
Cc: Richard Earnshaw; Ramana Radhakrishnan
Subject: RE: [PATCH 2/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

Ping?

-----Original Message-----
From: Tony Wang [mailto:tony.w...@arm.com]
Sent: Thursday, August 21, 2014 2:15 PM
To: 'gcc-patches@gcc.gnu.org'
Subject: [PATCH 2/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

Step 2: Mark all the symbols around the fragment boundaries as function symbols, so that the linker generates veneers when the two sections are too far away from each other. I have also verified, both manually and with some test cases, that IP and PSR are not live at those points.

gcc/libgcc/ChangeLog:

2014-08-21  Tony Wang  tony.w...@arm.com

	* config/arm/ieee754-sf.S: Expose symbols around fragment
	boundaries as function symbols.
	* config/arm/ieee754-df.S: Likewise.

BR,
Tony

libgcc_mul_div_code_size_reduction_2.diff
Description: Binary data
RE: [PATCH 3/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
Ping 2?

-----Original Message-----
From: Tony Wang [mailto:tony.w...@arm.com]
Sent: Thursday, August 28, 2014 2:02 PM
To: 'gcc-patches@gcc.gnu.org'
Cc: Richard Earnshaw; Ramana Radhakrishnan
Subject: RE: [PATCH 3/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

Ping?

-----Original Message-----
From: Tony Wang [mailto:tony.w...@arm.com]
Sent: Thursday, August 21, 2014 2:15 PM
To: 'gcc-patches@gcc.gnu.org'
Subject: [PATCH 3/3,ARM,libgcc]Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc

Step 3: Test cases to verify the code size reduction.

gcc/gcc/testsuite/ChangeLog:

2014-08-21  Tony Wang  tony.w...@arm.com

	* gcc.target/arm/size-optimization-ieee-1.c: New test case
	* gcc.target/arm/size-optimization-ieee-2.c: New test case
	* lib/gcc-dg.exp: Add new function scan-symbol-common, scan-symbol-yes,
	scan-symbol-no to scan a user defined symbol in final elf file

BR,
Tony

diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
new file mode 100644
index 000..46e9cdf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
@@ -0,0 +1,30 @@
+/* { dg-do link { target { arm_thumb2_ok } } } */
+/* { dg-options -Wl,--gc-sections } */
+int
+foo ()
+{
+  volatile float a;
+  volatile float b;
+  volatile float c = a * b;
+  return 0;
+}
+
+int
+bar ()
+{
+  volatile double a;
+  volatile double b;
+  volatile double c = a * b;
+  return 0;
+}
+
+int
+main ()
+{
+  foo ();
+  bar ();
+  return 0;
+}
+/* { dg-final { scan-symbol-no __aeabi_fdiv } } */
+/* { dg-final { scan-symbol-no __aeabi_ddiv } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
new file mode 100644
index 000..5007d62
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
@@ -0,0 +1,30 @@
+/* { dg-do link { target { arm_thumb2_ok } } } */
+/* { dg-options -Wl,--gc-sections } */
+int
+foo ()
+{
+  volatile float a;
+  volatile float b;
+  volatile float c = a / b;
+  return 0;
+}
+
+int
+bar ()
+{
+  volatile double a;
+  volatile double b;
+  volatile double c = a / b;
+  return 0;
+}
+
+int
+main ()
+{
+  foo ();
+  bar ();
+  return 0;
+}
+/* { dg-final { scan-symbol-yes __aeabi_fmul } } */
+/* { dg-final { scan-symbol-yes __aeabi_dmul } } */
+
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 3390caa..0d52e95 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -880,5 +880,57 @@ proc gdb-exists { args } {
     return 0;
 }
 
+# Scan the OUTPUT_FILE for a symbol. Return 1 if it present, or
+# return 0 if it doesn't present
+
+proc scan-symbol-common { args } {
+    global nm
+    global base_dir
+
+    set testcase [testname-for-summary]
+    set output_file "[file rootname [file tail $testcase]].exe"
+
+    # Find nm like we find g++ in g++.exp.
+    if ![info exists nm] {
+        set nm [findfile $base_dir/../../../binutils/nm \
+                $base_dir/../../../binutils/nm \
+                [findfile $base_dir/../../nm $base_dir/../../nm \
+                 [findfile $base_dir/nm $base_dir/nm \
+                  [transform nm]]]]
+        verbose -log "nm is $nm"
+    }
+
+    if { $output_file == "" } {
+        fail "scan-symbol-not $args: dump file does not exist"
+        return
+    }
+
+    set fd [open "| $nm $output_file" r]
+    set text [read $fd]
+    close $fd
+
+    if [regexp -- [lindex $args 0] $text] {
+        return 1
+    } else {
+        return 0
+    }
+}
+
+proc scan-symbol-yes { args } {
+    if { [scan-symbol-common $args] == 1 } {
+        pass "scan-symbol-yes $args exists"
+    } else {
+        fail "scan-symbol-yes $args does not exist"
+    }
+}
+
+proc scan-symbol-no { args } {
+    if { [scan-symbol-common $args] != 1 } {
+        pass "scan-symbol-no $args does not exist"
+    } else {
+        fail "scan-symbol-no $args exists"
+    }
+}
+
 set additional_prunes ""
 set dg_runtest_extra_prunes ""
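The symbol check performed by the new scan-symbol procs can be sketched in C as follows. This is an illustrative stand-in under stated assumptions, not the testsuite code: the function name symbol_listed is hypothetical, the nm output is passed in as a string rather than read from a pipe, and it matches whole whitespace-delimited tokens instead of applying a regular expression as the Tcl code does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if SYMBOL appears as a whole whitespace-delimited token in
   NM_OUTPUT (text in the style printed by nm), otherwise 0.  */
int symbol_listed (const char *nm_output, const char *symbol)
{
  assert (nm_output != NULL && symbol != NULL);

  size_t len = strlen (symbol);
  const char *p = nm_output;

  if (len == 0)
    return 0;

  while ((p = strstr (p, symbol)) != NULL)
    {
      int starts_token = (p == nm_output
                          || isspace ((unsigned char) p[-1]));
      int ends_token = (p[len] == '\0'
                        || isspace ((unsigned char) p[len]));
      if (starts_token && ends_token)
        return 1;
      p += 1;   /* keep searching after a partial match */
    }
  return 0;
}
```

Matching whole tokens avoids reporting __aeabi_fmul as present when only a longer name containing it is listed; a plain regular-expression search on the symbol name would accept such a substring.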
[Patch, gcc, testsuite]Disable xordi3-opt.c/iordi3-opt.c on thumb1 target
Hi there,

This is a test case cleanup patch. The orr/eor instructions for Thumb-1 have only two variants:

ORRS <Rdn>, <Rm>
ORR<c> <Rdn>, <Rm>

No shifted operand is available in the Thumb-1 encoding, so the test cases xordi3-opt.c and iordi3-opt.c are invalid for Thumb-1 targets. This patch disables them for Thumb-1 targets.

Ok for the trunk?

gcc/gcc/testsuite/ChangeLog:

2014-09-04  Tony Wang  tony.w...@arm.com

	* gcc.target/arm/xordi3-opt.c: Disable this test case for thumb1 target.
	* gcc.target/arm/iordi3-opt.c: Ditto.

diff --git a/gcc/testsuite/gcc.target/arm/iordi3-opt.c b/gcc/testsuite/gcc.target/arm/iordi3-opt.c
index b3f465b..63fbe0b 100644
--- a/gcc/testsuite/gcc.target/arm/iordi3-opt.c
+++ b/gcc/testsuite/gcc.target/arm/iordi3-opt.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { arm_arm_ok || arm_thumb2_ok} } } */
 /* { dg-options -O1 } */
 
 unsigned long long or64 (unsigned long long input)
diff --git a/gcc/testsuite/gcc.target/arm/xordi3-opt.c b/gcc/testsuite/gcc.target/arm/xordi3-opt.c
index 7e031c3..53b2bab 100644
--- a/gcc/testsuite/gcc.target/arm/xordi3-opt.c
+++ b/gcc/testsuite/gcc.target/arm/xordi3-opt.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { arm_arm_ok || arm_thumb2_ok} } } */
 /* { dg-options -O1 } */
 
 unsigned long long xor64 (unsigned long long input)
Re: [PATCH 2/2] Enable elimination of zext/sext
>> I added this part of the code (in cfgexpand.c) to handle binary/unary/.. gimple operations and used the LHS value range to infer the assigned value range. I will revert this part of the code as this is wrong. I don't think checking promoted_mode for temp will be necessary here, as convert_move will handle it correctly if promoted_mode is set for temp. Thus, I will reimplement setting promoted_mode to temp (in expand_expr_real_2) based on the gimple statement content on the RHS, i.e. by looking at the RHS operands and their value ranges and by calculating the resulting value range. Does this sound OK to you?
>
> No, this sounds backward again and won't work because those operands again could be just truncated - thus you can't rely on their value-range. What you would need is VRP computing value-ranges in the promoted mode from the start (and it doesn't do that).

Hi Richard,

Here is an attempt to do the value range computation in promoted_mode's type when it is overflowing. Bootstrapped on x86-64. Based on your feedback, I will do more testing on this.

Thanks for your time,
Kugan

gcc/ChangeLog:

2014-09-04  Kugan Vivekanandarajah  kug...@linaro.org

	* tree-ssa-ccp.c (ccp_finalize): Adjust the nonzero_bits precision to
	the type.
	(evaluate_stmt): Likewise.
	* tree-ssanames.c (set_range_info): Adjust if the precision of stored
	value range is different.
	* tree-vrp.c (normalize_int_cst_precision): New function.
	(set_value_range): Add assert to check precision.
	(set_and_canonicalize_value_range): Call normalize_int_cst_precision
	on min and max.
	(promoted_type): New function.
	(promote_unary_vr): Likewise.
	(promote_binary_vr): Likewise.
	(extract_range_from_binary_expr_1): Adjust type to match value range.
	Store value ranges in promoted type if they overflow.
	(extract_range_from_unary_expr_1): Likewise.
	(adjust_range_with_scev): Call normalize_int_cst_precision
	on min and max.
	(vrp_visit_assignment_or_call): Likewise.
	(simplify_bit_ops_using_ranges): Adjust the value range precision.
	(test_for_singularity): Likewise.
	(simplify_stmt_for_jump_threading): Likewise.
	(extract_range_from_assert): Likewise.

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index a90f708..1733073 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -916,7 +916,11 @@ ccp_finalize (void)
           unsigned int precision = TYPE_PRECISION (TREE_TYPE (val->value));
           wide_int nonzero_bits = wide_int::from (val->mask, precision,
                                                   UNSIGNED) | val->value;
-          nonzero_bits &= get_nonzero_bits (name);
+          wide_int nonzero_bits_name = get_nonzero_bits (name);
+          if (precision != nonzero_bits_name.get_precision ())
+            nonzero_bits = wi::shwi (*nonzero_bits.get_val (),
+                                     nonzero_bits_name.get_precision ());
+          nonzero_bits &= nonzero_bits_name;
           set_nonzero_bits (name, nonzero_bits);
         }
     }
@@ -1852,6 +1856,8 @@ evaluate_stmt (gimple stmt)
     {
       tree lhs = gimple_get_lhs (stmt);
       wide_int nonzero_bits = get_nonzero_bits (lhs);
+      if (TYPE_PRECISION (TREE_TYPE (lhs)) != nonzero_bits.get_precision ())
+        nonzero_bits = wide_int_to_tree (TREE_TYPE (lhs), nonzero_bits);
       if (nonzero_bits != -1)
         {
           if (!is_constant)
diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 3af80a0..459c669 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -192,7 +192,7 @@ set_range_info (tree name, enum value_range_type range_type,
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
   gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
-  unsigned int precision = TYPE_PRECISION (TREE_TYPE (name));
+  unsigned int precision = min.get_precision ();
 
   /* Allocate if not available.  */
   if (ri == NULL)
@@ -204,6 +204,15 @@ set_range_info (tree name, enum value_range_type range_type,
       SSA_NAME_RANGE_INFO (name) = ri;
       ri->set_nonzero_bits (wi::shwi (-1, precision));
     }
+  else if (ri->get_min ().get_precision () != precision)
+    {
+      size_t size = (sizeof (range_info_def)
+                     + trailing_wide_ints <3>::extra_size (precision));
+      ri = static_cast<range_info_def *> (ggc_realloc (ri, size));
+      ri->ints.set_precision (precision);
+      SSA_NAME_RANGE_INFO (name) = ri;
+      ri->set_nonzero_bits (wi::shwi (-1, precision));
+    }
 
   /* Record the range type.  */
   if (SSA_NAME_RANGE_TYPE (name) != range_type)
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index d16fd8a..772676a 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "optabs.h"
 #include "tree-ssa-threadedge.h"
 #include "wide-int.h"
+#include
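The idea of keeping an overflowing range in the promoted type instead of letting it wrap can be illustrated with a toy model. This is a sketch under stated assumptions and has no connection to GCC's actual value_range or wide_int data structures: struct demo_range and demo_range_add are hypothetical, ranges are unsigned, and the promoted precision is supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical closed interval [min, max], held in a wide type together
   with the precision (in bits) the bounds are meant to fit in.  */
struct demo_range
{
  int64_t min;
  int64_t max;
  unsigned precision;
};

/* Add two unsigned ranges of the same precision.  If the sum no longer
   fits in that precision, record the result in PROMOTED_PRECISION
   instead of wrapping the bounds.  */
struct demo_range
demo_range_add (struct demo_range a, struct demo_range b,
                unsigned promoted_precision)
{
  assert (a.precision == b.precision);
  assert (a.precision < promoted_precision && promoted_precision < 63);

  struct demo_range r;
  r.min = a.min + b.min;
  r.max = a.max + b.max;

  /* Largest value representable in the original precision.  */
  int64_t limit = ((int64_t) 1 << a.precision) - 1;
  r.precision = (r.max > limit) ? promoted_precision : a.precision;
  return r;
}
```

With 8-bit inputs [200, 255] and [100, 100] and a 32-bit promoted precision, the toy model reports [300, 355] at precision 32, whereas wrapping to 8 bits would lose the information that the sum exceeded 255.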
Re: [PATCH] Force rtl templates to be inlined
Anyway, removing !optimize checks in favor of flag_no_inline checks and initializing that properly is a cleanup as well. Patch looks good to me. -Andi
[PATCH, rs6000] Correct optimization of VSX extract-load for little endian
Hi,

The *vsx_extract_<mode>_load pattern performs a scalar load of memory when possible, rather than a vector load followed by an extract. The assembly for the pattern always loads the 0th memory doubleword element, but the pattern match selects the 0th for big-endian and the 1st for little-endian, leading to wrong results for LE. This patch changes the pattern match to look for the 0th element regardless of endianness.

I ran across this when working on another patch, which provides more test coverage for this scenario and will be submitted shortly. For this patch, I'm just correcting the now-failing vsx-extract-1.c test.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Ok for trunk? (This should eventually be backported to 4.8 and 4.9 as well...)

Thanks,
Bill

[gcc]

2014-09-03  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* config/rs6000/vsx.md (*vsx_extract_<mode>_load): Always match
	selection of 0th memory doubleword, regardless of endianness.

[gcc/testsuite]

2014-09-03  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* gcc.target/powerpc/vsx-extract-1.c: Test 0th doubleword
	regardless of endianness.

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 214897)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1835,7 +1835,7 @@
   [(set (match_operand:<VS_scalar> 0 "register_operand" "=d,wv,wr")
         (vec_select:<VS_scalar>
          (match_operand:VSX_D 1 "memory_operand" "m,Z,m")
-         (parallel [(match_operand:QI 2 "vsx_scalar_64bit" "wD,wD,wD")])))]
+         (parallel [(const_int 0)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
   "@
    lfd%U1%X1 %0,%1
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-1.c	(revision 214897)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-1.c	(working copy)
@@ -7,10 +7,4 @@
 
 #include <altivec.h>
 
-#if __LITTLE_ENDIAN__
-#define OFFSET 1
-#else
-#define OFFSET 0
-#endif
-
-double get_value (vector double *p) { return vec_extract (*p, OFFSET); }
+double get_value (vector double *p) { return vec_extract (*p, 0); }
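The relationship the fix relies on, that extracting element 0 of a vector in memory is the same as a scalar load of the doubleword at offset 0, can be checked with a small portable sketch. It uses the generic GCC/Clang vector extension in place of altivec.h so that it builds on any target, and the names demo_v2df, demo_extract0 and demo_load0 are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Generic GCC/Clang vector extension, used here in place of altivec.h so
   the sketch builds on any target.  */
typedef double demo_v2df __attribute__ ((vector_size (16)));

/* Extract element 0 through the vector type.  */
double demo_extract0 (const demo_v2df *p)
{
  assert (p != NULL);
  return (*p)[0];
}

/* Read the doubleword at byte offset 0 of the same memory directly.  */
double demo_load0 (const demo_v2df *p)
{
  double d;
  assert (p != NULL);
  memcpy (&d, p, sizeof d);
  return d;
}
```

Both functions return the same value for any input vector, independent of byte order, which is why a test that extracts element 0 needs no endian-specific OFFSET.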
[PATCH, rs6000] Handle vec_extract and splat patterns in analyze_swaps
Hi,

This patch adds more special handling to analyze_swaps to allow us to improve more computations. Previously I had disallowed VEC_SELECT in all cases. This is now changed to allow a select of a single lane, either for an extract operation or for a splat operation. If a computation containing such operations is optimized, the selected lane is changed to count from the other end of the vector.

Several new tests are added to check these opportunities are now exploited.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk?

Thanks,
Bill

[gcc]

2014-09-03  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* config/rs6000/rs6000.c (special_handling_values): Add SH_EXTRACT.
	(rtx_is_swappable_p): Look for patterns with a VEC_SELECT, perhaps
	wrapped in a VEC_DUPLICATE, representing an extract.  Mark these
	as swappable with special handling SH_EXTRACT.  Remove
	UNSPEC_VSX_XXSPLTW from the list of disallowed unspecs for the
	optimization.
	(adjust_extract): New function.
	(handle_special_swappables): Add default to case statement; add
	case for SH_EXTRACT that calls adjust_extract.
	(dump_swap_insn_table): Handle SH_EXTRACT.

[gcc/testsuite]

2014-09-03  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* gcc.target/powerpc/swaps-p8-13.c: New test.
	* gcc.target/powerpc/swaps-p8-14.c: New test.
	* gcc.target/powerpc/swaps-p8-15.c: New test.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 214879)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -33562,7 +33562,8 @@ enum special_handling_values {
   SH_CONST_VECTOR,
   SH_SUBREG,
   SH_NOSWAP_LD,
-  SH_NOSWAP_ST
+  SH_NOSWAP_ST,
+  SH_EXTRACT
 };
 
 /* Union INSN with all insns containing definitions that reach USE.
@@ -33704,6 +33705,7 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
 {
   enum rtx_code code = GET_CODE (op);
   int i, j;
+  rtx parallel;
 
   switch (code)
     {
@@ -33714,7 +33716,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
       return 1;
 
     case VEC_CONCAT:
-    case VEC_SELECT:
     case ASM_INPUT:
     case ASM_OPERANDS:
       return 0;
@@ -33732,9 +33733,31 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
          handling.  */
       if (GET_CODE (XEXP (op, 0)) == CONST_INT)
         return 1;
+      else if (GET_CODE (XEXP (op, 0)) == REG
+               && GET_MODE_INNER (GET_MODE (op)) == GET_MODE (XEXP (op, 0)))
+        /* This catches V2DF and V2DI splat, at a minimum.  */
+        return 1;
+      else if (GET_CODE (XEXP (op, 0)) == VEC_SELECT)
+        /* If the duplicated item is from a select, defer to the select
+           processing to see if we can change the lane for the splat.  */
+        return rtx_is_swappable_p (XEXP (op, 0), special);
       else
         return 0;
 
+    case VEC_SELECT:
+      /* A vec_extract operation is ok if we change the lane.  */
+      if (GET_CODE (XEXP (op, 0)) == REG
+          && GET_MODE_INNER (GET_MODE (XEXP (op, 0))) == GET_MODE (op)
+          && GET_CODE ((parallel = XEXP (op, 1))) == PARALLEL
+          && XVECLEN (parallel, 0) == 1
+          && GET_CODE (XVECEXP (parallel, 0, 0)) == CONST_INT)
+        {
+          *special = SH_EXTRACT;
+          return 1;
+        }
+      else
+        return 0;
+
     case UNSPEC:
       {
         /* Various operations are unsafe for this optimization, at least
@@ -33777,7 +33800,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
             || val == UNSPEC_VSX_CVSPDPN
             || val == UNSPEC_VSX_SET
             || val == UNSPEC_VSX_SLDWI
-            || val == UNSPEC_VSX_XXSPLTW
             || val == UNSPEC_VUNPACK_HI_SIGN
             || val == UNSPEC_VUNPACK_HI_SIGN_DIRECT
             || val == UNSPEC_VUNPACK_LO_SIGN
@@ -34115,6 +34137,27 @@ permute_store (rtx_insn *insn)
              INSN_UID (insn));
 }
 
+/* Given OP that contains a vector extract operation, change the index
+   of the extracted lane to count from the other side of the vector.  */
+static void
+adjust_extract (rtx_insn *insn)
+{
+  rtx body = PATTERN (insn);
+  /* The vec_select may be wrapped in a vec_duplicate for a splat, so
+     account for that.  */
+  rtx sel = (GET_CODE (body) == VEC_DUPLICATE
+             ? XEXP (XEXP (body, 0), 1)
+             : XEXP (body, 1));
+  rtx par = XEXP (sel, 1);
+  int nunits = GET_MODE_NUNITS (GET_MODE (XEXP (sel, 0)));
+  XVECEXP (par, 0, 0) = GEN_INT (nunits - 1 - INTVAL (XVECEXP (par, 0, 0)));
+  INSN_CODE (insn) = -1; /* Force re-recognition.  */
+  df_insn_rescan (insn);
+
+  if (dump_file)
+    fprintf (dump_file, "Changing lane for extract %d\n", INSN_UID (insn));
+}
+
 /* The insn described by INSN_ENTRY[I] can be swapped, but only
    with special handling.  Take care of that here.  */
 static void
@@ -34125,6 +34168,8 @@ handle_special_swappables (swap_web_entry *insn_en
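The lane adjustment described in the message, counting the selected lane from the other end of the vector, reduces to one line of arithmetic. The helper below is a standalone sketch of that arithmetic only; demo_mirror_lane is a hypothetical name and the function does not touch any RTL.

```c
#include <assert.h>

/* Map LANE to the lane counted from the opposite end of a vector with
   NUNITS elements, mirroring the "nunits - 1 - lane" arithmetic used in
   adjust_extract.  */
int demo_mirror_lane (int nunits, int lane)
{
  assert (nunits > 0 && lane >= 0 && lane < nunits);
  return nunits - 1 - lane;
}
```

For a two-element vector such as V2DF, lane 0 maps to lane 1 and vice versa, and applying the mapping twice gives back the original lane.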
[wwwdocs] Buildstat update for 4.9
Hi,

Please find an update of test results for 4.9 as below:

Test Results for 4.9.1: aarch64-linux-gnu

Best Regards,
Raghunath Lolur.

Index: buildstat.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/buildstat.html,v
retrieving revision 1.6
diff -u -r1.6 buildstat.html
--- buildstat.html 24 Aug 2014 10:15:58 - 1.6
+++ buildstat.html 4 Sep 2014 04:46:38 -
@@ -23,6 +23,14 @@
 <table>
 
 <tr>
+<td>aarch64-linux-gnu</td>
+<td>&nbsp;</td>
+<td>Test results:
+<a href="https://gcc.gnu.org/ml/gcc-testresults/2014-09/msg00328.html">4.9.1</a>
+</td>
+</tr>
+
+<tr>
 <td>arm-unknown-linux-gnueabi</td>
 <td>&nbsp;</td>
 <td>Test results:
[PATCH 01/18, nds32] Define PIC_OFFSET_TABLE_REGNUM to $gp register.
Hi, all, Committed as Rev. 214849: https://gcc.gnu.org/r214849 gcc/ChangeLog 2014-09-03 Chung-Ju Wu jasonw...@gmail.com * config/nds32/nds32.h (PIC_OFFSET_TABLE_REGNUM): Define. Best regards, jasonwucj 0001-PATCH-01-Define-PIC_OFFSET_TABLE_REGNUM-to-gp-regist.patch Description: Binary data
[PATCH 02/18, nds32] Refine the implementation and consider CFA restore information for stack push/pop multiple.
Hi, all, Committed as Rev. 214851: https://gcc.gnu.org/r214851 gcc/ChangeLog 2014-09-03 Chung-Ju Wu jasonw...@gmail.com * config/nds32/nds32.c (nds32_gen_stack_push_multiple): Rename to ... (nds32_emit_stack_push_multiple): ... this. (nds32_gen_stack_pop_multiple): Rename to ... (nds32_emit_stack_pop_multiple): ... this and consider CFA restore information. Best regards, jasonwucj 0002-PATCH-02-Refine-the-implementation-and-consider-CFA-.patch Description: Binary data
[PATCH 03/18, nds32] Refine the implementation and consider CFA restore information for stack v3push/v3pop.
Hi, all, Committed as Rev. 214852: https://gcc.gnu.org/r214852 gcc/ChangeLog 2014-09-03 Chung-Ju Wu jasonw...@gmail.com * config/nds32/nds32.c (nds32_gen_stack_v3push): Rename to ... (nds32_emit_stack_v3push): ... this. (nds32_gen_stack_v3pop): Rename to ... (nds32_emit_stack_v3pop): ... this and consider CFA restore information. Best regards, jasonwucj 0003-PATCH-03-Refine-the-implementation-and-consider-CFA-.patch Description: Binary data
[PATCH 04/18, nds32] In nds32_valid_stack_push_pop_p(), we look into OP rtx to see if we indeed save $fp/$gp/$lp registers.
Hi, all, Committed as Rev. 214853: https://gcc.gnu.org/r214853 gcc/ChangeLog 2014-09-03 Chung-Ju Wu jasonw...@gmail.com * config/nds32/nds32-predicates.c (nds32_valid_stack_push_pop): Rename to ... (nds32_valid_stack_push_pop_p): ... this. * config/nds32/nds32-protos.h: Likewise. * config/nds32/predicates.md: Likewise. Best regards, jasonwucj 0004-PATCH-04-In-nds32_valid_stack_push_pop_p-we-look-int.patch Description: Binary data
[PATCH 05/18, nds32] Preparation in nds32.h of using registers to save varargs.
Hi, all, Committed as Rev. 214854: https://gcc.gnu.org/r214854 gcc/ChangeLog 2014-09-03 Chung-Ju Wu jasonw...@gmail.com * config/nds32/nds32.h (machine_function): Add some fields for variadic arguments implementation. Best regards, jasonwucj 0005-PATCH-05-Preparation-in-nds32.h-of-using-registers-t.patch Description: Binary data