Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing
On Wed, 9 Dec 2015, Jan Hubicka wrote:
> Hi
> this patch implements the trik for punting if we get too many nested pointers.
> This fixes the ada tstcases. Curiously enough I would like to replace safe_push
> by quick_push but doing so I get weird error about freeing non-heap object
> in the auto_vec desructor...

That's odd but a sign of you doing sth wrong ;)  Like somehow clobbering
m_using_auto_storage of the vec.

But of course if you use a maximum length of 8 there is no need to use a
vec<>, just use a fixed bool[8] array (or an integer mask).

> Bootstraping/regtesting x86_64-linux. Ok if it passes?

Ok.

Thanks,
Richard.

> Honza
>
> 	* alias.c (get_alias_set): Punt after getting 8 nested pointers.
>
> Index: alias.c
> ===
> --- alias.c	(revision 231439)
> +++ alias.c	(working copy)
> @@ -990,6 +990,14 @@ get_alias_set (tree t)
> 	   || TREE_CODE (p) == VECTOR_TYPE;
> 	   p = TREE_TYPE (p))
> 	{
> +	  /* Ada supports recusive pointers.  Instead of doing recrusion check
> +	     just give up once the preallocated space of 8 elements is up.
> +	     In this case just punt to void * alias set.  */
> +	  if (reference.length () == 8)
> +	    {
> +	      p = ptr_type_node;
> +	      break;
> +	    }
> 	  if (TREE_CODE (p) == REFERENCE_TYPE)
> 	    /* In LTO we want languages that use references to be compatible
> 	       with languages that use pointers.  */
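The depth-limit idea (and the suggestion that a fixed cap removes the need for a growable vector) can be sketched outside of GCC like this. Everything here — `toy_type`, `toy_get_alias_set`, the alias-set encoding — is invented for illustration and is not GCC's data structure:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NESTED_PTRS 8
#define VOID_PTR_ALIAS_SET 0

/* Toy stand-in for GCC's type tree: a pointer type has a non-NULL
   pointee; anything else carries its own alias set.  */
struct toy_type
{
  struct toy_type *pointee;
  int alias_set;
};

/* Peel nested pointer types the way get_alias_set walks them, but give
   up once more than MAX_NESTED_PTRS levels have been seen and punt to
   the catch-all "void *" alias set.  Because the cap bounds the walk,
   this also terminates on the recursive (cyclic) pointer types Ada can
   build, without any explicit cycle detection.  */
static int
toy_get_alias_set (struct toy_type *t)
{
  int depth = 0;
  struct toy_type *p = t;
  while (p->pointee != NULL)
    {
      if (++depth > MAX_NESTED_PTRS)
	return VOID_PTR_ALIAS_SET;	/* punt */
      p = p->pointee;
    }
  return p->alias_set;
}
```

Note that the cap plays the role of the `reference.length () == 8` test in the patch: on a self-referential type the walk stops after eight levels instead of recursing forever.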
Re: [patch] Fix PR middle-end/68291 & 68292
On 8 December 2015 at 19:48, Eric Botcazou wrote:
>> I think that is ok if the testing passes.
>
> Thanks.  It did on the 3 architectures so I have applied the patch.

Thanks, I confirm it also fixes the regressions I noticed on ARM.

Christophe.

> --
> Eric Botcazou
Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing
On Wed, 9 Dec 2015, Arnaud Charlet wrote: > > Hi > > this patch implements the trik for punting if we get too many nested > > pointers. > > This fixes the ada tstcases. Curiously enough I would like to replace > > safe_push > > by quick_push but doing so I get weird error about freeing non-heap object > > in the auto_vec desructor... > > > > Bootstraping/regtesting x86_64-linux. Ok if it passes? > > > > Honza > > > > * alias.c (get_alias_set): Punt after getting 8 nested pointers. > > > > Index: alias.c > > === > > > > --- alias.c (revision 231439) > > +++ alias.c (working copy) > > @@ -990,6 +990,14 @@ get_alias_set (tree t) > >|| TREE_CODE (p) == VECTOR_TYPE; > >p = TREE_TYPE (p)) > > { > > + /* Ada supports recusive pointers. Instead of doing recrusion check > > typo above: recrusion -> recursion > > > +just give up once the preallocated space of 8 elements is up. > > +In this case just punt to void * alias set. */ > > + if (reference.length () == 8) > > We don't use magic numbers in general, can you please replace by a named > constant instead? Instead of a magic constant you could also use sth like TREE_VISITED (or a pointer-map). Of course that's a lot more expensive. > > + { > > + p = ptr_type_node; > > + break; > > + } > > if (TREE_CODE (p) == REFERENCE_TYPE) > > /* In LTO we want languages that use references to be compatible > >with languages that use pointers. */ > > I'll let others comment on the general idea. > > -- Richard BienerSUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets
c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets due to
-fcilkplus needing -pthread which is not available for those targets. This
patch solves this issue by adding a condition to the cilkplus effective
target that compiling with -fcilkplus succeeds and requires cilkplus as an
effective target for attr-simd-3.c testcase.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2015-12-08  Thomas Preud'homme

	PR testsuite/68629
	* lib/target-supports.exp (check_effective_target_cilkplus): Also
	check that compiling with -fcilkplus does not give an error.
	* c-c++-common/attr-simd-3.c: Require cilkplus effective target.

diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c-c++-common/attr-simd-3.c
index d61ba82..1970c67 100644
--- a/gcc/testsuite/c-c++-common/attr-simd-3.c
+++ b/gcc/testsuite/c-c++-common/attr-simd-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target "cilkplus" } */
 /* { dg-options "-fcilkplus" } */
 /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not declared in this scope" } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4e349e9..95b903c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } {
     if { [istarget avr-*-*] } {
 	return 0;
     }
-    return 1
+    return [ check_no_compiler_messages_nocache fcilkplus_available executable {
+	#ifdef __cplusplus
+	extern "C"
+	#endif
+	int dummy;
+	} "-fcilkplus" ]
 }

 proc check_linker_plugin_available { } {

Testsuite shows no regression when run with
+ an arm-none-eabi GCC cross-compiler targeting Cortex-M3
+ a bootstrapped x86_64-linux-gnu GCC native compiler

Is this ok for trunk?

Best regards,

Thomas
Re: [PATCH, PR68716] Fix GOMP/GOACC_parallel handling in find_func_clobbers
On Wed, Dec 09, 2015 at 11:01:31AM +0100, Tom de Vries wrote:
> 	PR tree-optimization/68716
> 	* tree-ssa-structalias.c (find_func_clobbers): Fix handling of
> 	BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOMP_PARALLEL.

Pasto in ChangeLog entry?

	Jakub
[PATCH] Fix vectorizer memory leak
This fixes the observed memory leak in the correct way, asserting that overwriting of a vinfo doesn't happen. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-12-09 Richard Biener* tree-vect-stmts.c (vectorizable_load): Set new vinfo only if it was not yet set. * tree-vectorizer.h (set_vinfo_for_stmt): Assert we don't overwrite an existing entry. Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 231355) +++ gcc/tree-vect-stmts.c (working copy) @@ -7276,6 +7337,9 @@ vectorizable_load (gimple *stmt, gimple_ unshare_expr (gimple_assign_rhs1 (stmt; new_temp = vect_init_vector (stmt, tem, vectype, NULL); + new_stmt = SSA_NAME_DEF_STMT (new_temp); + set_vinfo_for_stmt (new_stmt, + new_stmt_vec_info (new_stmt, vinfo)); } else { @@ -7283,10 +7347,8 @@ vectorizable_load (gimple *stmt, gimple_ gsi_next (); new_temp = vect_init_vector (stmt, scalar_dest, vectype, ); + new_stmt = SSA_NAME_DEF_STMT (new_temp); } - new_stmt = SSA_NAME_DEF_STMT (new_temp); - set_vinfo_for_stmt (new_stmt, - new_stmt_vec_info (new_stmt, vinfo)); } if (negative) Index: gcc/tree-vectorizer.h === --- gcc/tree-vectorizer.h (revision 231355) +++ gcc/tree-vectorizer.h (working copy) @@ -715,7 +715,10 @@ set_vinfo_for_stmt (gimple *stmt, stmt_v stmt_vec_info_vec.safe_push (info); } else -stmt_vec_info_vec[uid - 1] = info; +{ + gcc_checking_assert (info == NULL); + stmt_vec_info_vec[uid - 1] = info; +} } /* Return the earlier statement between STMT1 and STMT2. */
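The set-once discipline the new `gcc_checking_assert` enforces can be shown with a toy map; the names below (`set_vinfo`, `vinfo_slots`) are invented for the sketch and are not GCC's:

```c
#include <assert.h>
#include <stddef.h>

#define N_SLOTS 16

/* Toy stand-in for stmt_vec_info_vec: one info slot per statement uid.  */
static void *vinfo_slots[N_SLOTS];

/* Mirror of the discipline the patch adds to set_vinfo_for_stmt: an
   occupied slot may only be cleared, never silently overwritten,
   because overwriting would leak the old entry (the leak being fixed).  */
static void
set_vinfo (unsigned uid, void *info)
{
  assert (uid < N_SLOTS);
  if (vinfo_slots[uid] != NULL)
    assert (info == NULL);	/* only clearing an existing entry is OK */
  vinfo_slots[uid] = info;
}
```

With this in place, the `vectorizable_load` change above becomes the only legal shape: the caller must create the vinfo exactly once, on the path where the statement is new.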
Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [1/3]
On 12/08/2015 07:27 AM, Richard Biener wrote:
>> I wonder if it makes more sense to integrate this with the domwalker
>> itself.  Add a constructor flag to it and do everything in itself.  By
>> letting the before_dom_children return a taken edge (or NULL if unknown)
>> it can drive the outgoing edge marking.  And the domwalk worker would
>> simply not recurse to dom children for unreachable blocks.
>
> So interface-wise do
[ ... ]
Close :-)

If skip_unreachable_blocks is true, then we want the walker to initialize
EDGE_EXECUTABLE automatically.  So we drop the member initialization and
constructor body from domwalk.h and instead have a ctor in domwalk.c where
we can initialize the private members and set EDGE_EXECUTABLE as needed.

My first iteration let the clients clear EDGE_EXECUTABLE as they found
conditionals that could be optimized.  That was pretty clean and localized
in sccvn & dom.

If we have the before_dom_children return the taken edge, then we have to
twiddle all the clients due to the api change in before_dom_children.
There's ~18 in total, so it's not too bad.

2 of the 18 clearly need to use the skip_unreachable_blocks capability
(dom and sccvn).  2 others might be able to use it (tree-ssa-pre.c and
tree-ssa-propagate.c).  I converted dom and sccvn, but not pre and the
generic propagation engine.

I can submit the iteration which lets clients clear EDGE_EXECUTABLE, or
the iteration where the clients return the taken edge (or NULL) from the
before_dom_children callback.

Either is fine with me, so if you have a preference, let me know.

jeff
Re: [RFA] Transparent alias suport part 10: Fix base+offset alias analysis oracle WRT aliases
On Wed, 9 Dec 2015, Jan Hubicka wrote: > Hi, > this patch fixes base+offset alias oracles to consider variable aliases. > With this patch and one extra hack I can LTO bootstrap with variale symbol > merging disabled in lto-symtab.c > > The basic idea is simple - there is new comparer compare_base_decls which > knows > how to handle aliases via symbol table's address_equal_to predicate. > I went through the sources and tried to find all places where we compare bases > for equality and replace the copmarsion by this new predicate. > > Bootstrapped/regtested x86_64-linux, looks sane? > > Honza > > * tree-ssa-alias.c (ptr_deref_may_alias_decl_p): Use compare_base_decls > (nonoverlapping_component_refs_of_decl_p): Update sanity check. > (decl_refs_may_alias_p): Use compare_base_decls. > * alias.c: Include cgraph.h > (rtx_equal_for_memref_p): Use rtx_equal_for_memref_p. > (compare_base_decls): New function. > (base_alias_check): Likewise. > (memrefs_conflict_p): Likewise. > (nonoverlapping_memrefs_p): Likewise. > * alias.h (compare_base_decls): Declare. > > * gcc.c-torture/execute/alias-2.c: New testcase. > Index: tree-ssa-alias.c > === > --- tree-ssa-alias.c (revision 231438) > +++ tree-ssa-alias.c (working copy) > @@ -194,7 +194,7 @@ ptr_deref_may_alias_decl_p (tree ptr, tr > ptr = TREE_OPERAND (base, 0); >else if (base > && DECL_P (base)) > - return base == decl; > + return compare_base_decls (base, decl) != 0; >else if (base > && CONSTANT_CLASS_P (base)) > return false; > @@ -805,8 +805,10 @@ nonoverlapping_component_refs_of_decl_p >ref2 = TREE_OPERAND (TREE_OPERAND (ref2, 0), 0); > } > > - /* We must have the same base DECL. */ > - gcc_assert (ref1 == ref2); > + /* Bases must be either same or uncomparable. */ > + gcc_checking_assert (ref1 == ref2 > +|| (DECL_P (ref1) && DECL_P (ref2) > +&& compare_base_decls (ref1, ref2))); I prefer explicit != 0 here. > >/* Pop the stacks in parallel and examine the COMPONENT_REFs of the same > rank. 
This is sufficient because we start from the same DECL and you > @@ -989,7 +991,7 @@ decl_refs_may_alias_p (tree ref1, tree b >gcc_checking_assert (DECL_P (base1) && DECL_P (base2)); > >/* If both references are based on different variables, they cannot alias. > */ > - if (base1 != base2) > + if (compare_base_decls (base1, base2) == 0) > return false; > >/* If both references are based on the same variable, they cannot alias if > Index: testsuite/gcc.c-torture/execute/alias-2.c > === > --- testsuite/gcc.c-torture/execute/alias-2.c (revision 0) > +++ testsuite/gcc.c-torture/execute/alias-2.c (revision 0) > @@ -0,0 +1,12 @@ > +/* { dg-require-alias "" } */ > +int a[10]={}; > +extern int b[10] __attribute__ ((alias("a"))); > +int off; > +main() > +{ > + b[off]=1; > + a[off]=2; > + if (b[off]!=2) > + __builtin_abort (); > + return 0; > +} > Index: alias.c > === > --- alias.c (revision 231438) > +++ alias.c (working copy) > @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3. > #include "langhooks.h" > #include "cfganal.h" > #include "rtl-iter.h" > +#include "cgraph.h" > > /* The aliasing API provided here solves related but different problems: > > @@ -1747,7 +1748,15 @@ rtx_equal_for_memref_p (const_rtx x, con >return LABEL_REF_LABEL (x) == LABEL_REF_LABEL (y); > > case SYMBOL_REF: > - return XSTR (x, 0) == XSTR (y, 0); > + { > + tree x_decl = SYMBOL_REF_DECL (x); > + tree y_decl = SYMBOL_REF_DECL (y); > + > + if (!x_decl || !y_decl) > + return XSTR (x, 0) == XSTR (y, 0); > + else > + return compare_base_decls (x_decl, y_decl) == 1; > + } > > case ENTRY_VALUE: >/* This is magic, don't go through canonicalization et al. */ > @@ -2010,6 +2019,31 @@ may_be_sp_based_p (rtx x) >return !base || base == static_reg_base_value[STACK_POINTER_REGNUM]; > } > > +/* BASE1 and BASE2 are decls. Return 1 if they refer to same object, 0 > + if they refer to different objects and -1 if we can not decide. 
*/ > + > +int > +compare_base_decls (tree base1, tree base2) > +{ > + int ret; > + gcc_checking_assert (DECL_P (base1) && DECL_P (base2)); > + if (base1 == base2) > +return 1; > + > + bool in_symtab1 = decl_in_symtab_p (base1); > + bool in_symtab2 = decl_in_symtab_p (base2); > + > + /* Declarations of non-automatic variables may have aliases. All other > + decls are unique. */ > + if (in_symtab1 != in_symtab2 || !in_symtab1) > +return 0; > + ret = symtab_node::get_create (base1)->equal_address_to > + (symtab_node::get_create
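The tri-state contract of the new `compare_base_decls` (1 = same object, 0 = provably different, -1 = cannot decide because symtab aliases may be involved) is what makes the `!= 0` versus `== 0` tests in the patch meaningful. A standalone sketch of how conservative consumers must interpret each result (all names here are invented for illustration):

```c
#include <assert.h>

/* Result convention of compare_base_decls as described in the patch:
   1 if both decls refer to the same object, 0 if provably different,
   -1 if we cannot decide (e.g. one decl may be an alias of the other).  */
enum base_cmp { BASES_DIFFERENT = 0, BASES_SAME = 1, BASES_UNKNOWN = -1 };

/* A "may these alias?" query must treat the undecidable case as a
   possible alias, hence the `!= 0' tests used in
   ptr_deref_may_alias_decl_p.  */
static int
may_alias (enum base_cmp c)
{
  return c != BASES_DIFFERENT;
}

/* A "do these certainly not alias?" query may only answer yes on a
   proven BASES_DIFFERENT result, hence the `== 0' test used in
   decl_refs_may_alias_p.  */
static int
cannot_alias (enum base_cmp c)
{
  return c == BASES_DIFFERENT;
}
```

Both helpers collapse the unknown case toward "may alias", which is the only safe direction for an alias oracle.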
Re: [PATCH, PR68716] Fix GOMP/GOACC_parallel handling in find_func_clobbers
On 09/12/15 11:03, Jakub Jelinek wrote: On Wed, Dec 09, 2015 at 11:01:31AM +0100, Tom de Vries wrote: PR tree-optimization/68716 * tree-ssa-structalias.c (find_func_clobbers): Fix handling of BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOMP_PARALLEL. Pasto in ChangeLog entry? Indeed, thanks for noticing. - Tom Fix GOMP/GOACC_parallel handling in find_func_clobbers 2015-12-08 Tom de VriesPR tree-optimization/68716 * tree-ssa-structalias.c (find_func_clobbers): Fix handling of BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOACC_PARALLEL. * testsuite/libgomp.c/omp-nested-2.c: New test. --- gcc/tree-ssa-structalias.c | 47 +- libgomp/testsuite/libgomp.c/omp-nested-2.c | 4 +++ 2 files changed, 50 insertions(+), 1 deletion(-) diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c index 060ff3e..15e351e 100644 --- a/gcc/tree-ssa-structalias.c +++ b/gcc/tree-ssa-structalias.c @@ -5082,7 +5082,52 @@ find_func_clobbers (struct function *fn, gimple *origt) return; case BUILT_IN_GOMP_PARALLEL: case BUILT_IN_GOACC_PARALLEL: - return; + { + unsigned int fnpos, argpos; + switch (DECL_FUNCTION_CODE (decl)) + { + case BUILT_IN_GOMP_PARALLEL: + /* __builtin_GOMP_parallel (fn, data, num_threads, flags). */ + fnpos = 0; + argpos = 1; + break; + case BUILT_IN_GOACC_PARALLEL: + /* __builtin_GOACC_parallel (device, fn, mapnum, hostaddrs, + sizes, kinds, ...). */ + fnpos = 1; + argpos = 3; + break; + default: + gcc_unreachable (); + } + + tree fnarg = gimple_call_arg (t, fnpos); + gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR); + tree fndecl = TREE_OPERAND (fnarg, 0); + varinfo_t cfi = get_vi_for_tree (fndecl); + + tree arg = gimple_call_arg (t, argpos); + + /* Parameter passed by value is used. */ + lhs = get_function_part_constraint (fi, fi_uses); + struct constraint_expr *rhsp; + get_constraint_for (arg, ); + FOR_EACH_VEC_ELT (rhsc, j, rhsp) + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.truncate (0); + + /* The caller clobbers what the callee does. 
*/ + lhs = get_function_part_constraint (fi, fi_clobbers); + rhs = get_function_part_constraint (cfi, fi_clobbers); + process_constraint (new_constraint (lhs, rhs)); + + /* The caller uses what the callee does. */ + lhs = get_function_part_constraint (fi, fi_uses); + rhs = get_function_part_constraint (cfi, fi_uses); + process_constraint (new_constraint (lhs, rhs)); + + return; + } /* printf-style functions may have hooks to set pointers to point to somewhere into the generated string. Leave them for a later exercise... */ diff --git a/libgomp/testsuite/libgomp.c/omp-nested-2.c b/libgomp/testsuite/libgomp.c/omp-nested-2.c new file mode 100644 index 000..7495afb --- /dev/null +++ b/libgomp/testsuite/libgomp.c/omp-nested-2.c @@ -0,0 +1,4 @@ +// { dg-do run } +// { dg-additional-options "-fipa-pta" } + +#include "omp-nested-1.c"
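The position dispatch at the heart of the fix is easy to see in isolation: the outlined function and its data argument sit at different indices in the two builtin calls. A minimal sketch, with the enum and helper names invented for illustration:

```c
#include <assert.h>

enum toy_builtin { TOY_GOMP_PARALLEL, TOY_GOACC_PARALLEL };

/* Where the outlined function and its data argument live in the call,
   mirroring the switch the patch adds to find_func_clobbers.  */
static void
builtin_fn_arg_pos (enum toy_builtin b, unsigned *fnpos, unsigned *argpos)
{
  switch (b)
    {
    case TOY_GOMP_PARALLEL:
      /* __builtin_GOMP_parallel (fn, data, num_threads, flags).  */
      *fnpos = 0;
      *argpos = 1;
      break;
    case TOY_GOACC_PARALLEL:
      /* __builtin_GOACC_parallel (device, fn, mapnum, hostaddrs,
	 sizes, kinds, ...).  */
      *fnpos = 1;
      *argpos = 3;
      break;
    }
}
```

Once the positions are known, both builtins can be handled by one body that pretends the call is `fn (data)` and wires the caller's uses/clobbers to the outlined callee's.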
Re: Wrap fewer refs for LTO
On Wed, 9 Dec 2015, Jan Hubicka wrote: > Hi, while debugging an renaming issue I run into ADDR_EXPR of MEM_REF of > ADDR_EXPR which seemed somewhat odd + we don't really get the CONSTANT > and other flags right because recompute_tree_invariant_for_addr_expr > punts on ADDR_EXPR inside ADDR_EXPR. That needs to be fixed then - [ + CST] is a valid invariant address (we actually create this form quite often during propagation). > The expression is created by wrap_refs. While looking at the code I > noitced that we wrap all VAR_DECLs while I think we only need to wrap > ones that can be merged by lto-symtab. I think this comes before the times of "sane" type merging thus we had to prepare for the type to get munged to sth incompatible here. > I wonder if the function could not stop the recursive walk on ADDR_EXPR > and MEM_REFS? Is there a way we can use the inner MEM_REF for something > useful? No, but the walkign would also never reach a handled-component inside a MEM_REF. A MEM_REF[] is not valid gimple (see is_gimple_mem_ref_addr). So we don't ever re-write those "recursively". But yeah, we'd save a few recursions for adding an extra test against MEM_REF. Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 231355) +++ lto-streamer-out.c (working copy) @@ -2244,7 +2244,8 @@ wrap_refs (tree *tp, int *ws, void *) } else if (TREE_CODE (t) == CONSTRUCTOR) ; - else if (!EXPR_P (t)) + else if (!EXPR_P (t) + || TREE_CODE (t) == MEM_REF) *ws = 0; return NULL_TREE; } > Bootstrapped/regtested x86_64-linux, OK? Ok. Thanks, Richard. > > Honza > > * lto-streamer-out.c (wrap_refs): Only wrap public variables. 
> Index: lto-streamer-out.c > === > --- lto-streamer-out.c(revision 231438) > +++ lto-streamer-out.c(working copy) > @@ -2236,7 +2237,8 @@ wrap_refs (tree *tp, int *ws, void *) > { >tree t = *tp; >if (handled_component_p (t) > - && TREE_CODE (TREE_OPERAND (t, 0)) == VAR_DECL) > + && TREE_CODE (TREE_OPERAND (t, 0)) == VAR_DECL > + && TREE_PUBLIC (TREE_OPERAND (t, 0))) > { >tree decl = TREE_OPERAND (t, 0); >tree ptrtype = build_pointer_type (TREE_TYPE (decl)); > > -- Richard BienerSUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: Wrap fewer refs for LTO
On Wed, 9 Dec 2015, Jan Hubicka wrote: > > Hi, > > while debugging an renaming issue I run into ADDR_EXPR of MEM_REF of > > ADDR_EXPR > > which seemed somewhat odd + we don't really get the CONSTANT and other flags > > right because recompute_tree_invariant_for_addr_expr punts on ADDR_EXPR > > inside > > ADDR_EXPR. > > > > The expression is created by wrap_refs. While looking at the code I noitced > > that > > we wrap all VAR_DECLs while I think we only need to wrap ones that can be > > merged > > by lto-symtab. > > > > I wonder if the function could not stop the recursive walk on ADDR_EXPR and > > MEM_REFS? > > Is there a way we can use the inner MEM_REF for something useful? > > > > Bootstrapped/regtested x86_64-linux, OK? > > Uh, sorry, actually I forgot about DECL_EXTERNAL. Here is updated version I > am re-testing. OK. > * lto-streamer-out.c (wrap_refs): Only wrap public variables. > Index: lto-streamer-out.c > === > --- lto-streamer-out.c(revision 231438) > +++ lto-streamer-out.c(working copy) > @@ -2236,7 +2237,9 @@ wrap_refs (tree *tp, int *ws, void *) > { >tree t = *tp; >if (handled_component_p (t) > - && TREE_CODE (TREE_OPERAND (t, 0)) == VAR_DECL) > + && TREE_CODE (TREE_OPERAND (t, 0)) == VAR_DECL > + && (TREE_PUBLIC (TREE_OPERAND (t, 0)) > +|| DECL_EXTERNAL (TREE_OPERAND (t, 0 > { >tree decl = TREE_OPERAND (t, 0); >tree ptrtype = build_pointer_type (TREE_TYPE (decl)); > > -- Richard BienerSUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH, testsuite] Fix PR68632: gcc.target/arm/lto/pr65837 failure on M profile ARM targets
gcc.target/arm/lto/pr65837 fails on M profile ARM targets because of lack
of neon instructions. This patch adds the necessary arm_neon_ok effective
target requirement to avoid running this test for such targets.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2015-12-08  Thomas Preud'homme

	PR testsuite/68632
	* gcc.target/arm/lto/pr65837_0.c: Require arm_neon_ok effective
	target.

diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
index 000fc2a..fcc26a1 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
@@ -1,4 +1,5 @@
 /* { dg-lto-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
 /* { dg-lto-options {{-flto -mfpu=neon}} } */
 /* { dg-suppress-ld-options {-mfpu=neon} } */

Testcase fails without the patch and succeeds with. Is this ok for trunk?

Best regards,

Thomas
Re: [PATCH, PR68716] Fix GOMP/GOACC_parallel handling in find_func_clobbers
On Wed, 9 Dec 2015, Jakub Jelinek wrote:
> On Wed, Dec 09, 2015 at 11:01:31AM +0100, Tom de Vries wrote:
> > 	PR tree-optimization/68716
> > 	* tree-ssa-structalias.c (find_func_clobbers): Fix handling of
> > 	BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOMP_PARALLEL.
>
> Pasto in ChangeLog entry?

Ok with that fixed.

Richard.
Re: [DOC, PATCH] Mention --enable-valgrind-annotations in install.texi
On 12/09/2015 12:47 AM, Gerald Pfeifer wrote:
> On Tue, 8 Dec 2015, Martin Liška wrote:
>> I would like to add a missing configure option.
>
> I saw that Jeff approved while I was mulling over the patch,
> but still have two questons:
>
> 1. Why did you sort this in where you did?  (It's not alphabetic.)

Hi.

Well, it's not easy to find a place as the options are not well ordered :)
I tried to pick up a better place in v2.

> +Specify that the compiler should interact with valgrind runtime, where
> +selected invalid memory reads are marked as false positives and
> +garbage collected memory is properly marked for proper interaction.
>
> 2. "interact...for proper interaction" and "properly marked for
> proper" feels a little confusing to me.  That is, I had to think
> twice.  Any chance you can make this a little simpler to understand?

Sure, please read v2.

Martin

> Gerald

From 3531cab4de82bc0899c44d5cac19f3541027be8b Mon Sep 17 00:00:00 2001
From: marxin
Date: Tue, 8 Dec 2015 16:54:43 +0100
Subject: [PATCH] Mention --enable-valgrind-annotations in install.texi

gcc/ChangeLog:

2015-12-08  Martin Liska

	* doc/install.texi (--enable-valgrind-annotations): Mention the
	configure option in configure page.
---
 gcc/doc/install.texi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 0b71bef..652b936 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1748,6 +1748,12 @@ When this option is specified more detailed information on memory allocation
 is gathered. This information is printed when using @option{-fmem-report}.

+@item --enable-valgrind-annotations
+Specify that the compiler should interact with valgrind runtime,
+where selected memory-related operations are marked as valid.
+There operations are safe and should not become a candidate
+of an undefined behavior.
+
 @item --enable-nls
 @itemx --disable-nls
 The @option{--enable-nls} option enables Native Language Support (NLS),
--
2.6.3
Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [1/3]
On Wed, Dec 9, 2015 at 9:31 AM, Jeff Law wrote:
> On 12/08/2015 07:27 AM, Richard Biener wrote:
>>> I wonder if it makes more sense to integrate this with the
>>> domwalker itself.  Add a constructor flag to it and do everything
>>> in itself.  By letting the before_dom_children return a taken edge
>>> (or NULL if unknown) it can drive the outgoing edge marking.  And
>>> the domwalk worker would simply not recurse to dom children for
>>> unreachable blocks.
>>
>> So interface-wise do
>
> [ ... ]
> Close :-)
>
> If skip_unreachable_blocks is true, then we want the walker to initialize
> EDGE_EXECUTABLE automatically.  So we drop the member initialization and
> constructor body from domwalk.h and instead have a ctor in domwalk.c where
> we can initialize the private members and set EDGE_EXECUTABLE as needed.
>
> My first iteration let the clients clear EDGE_EXECUTABLE as they found
> conditionals that could be optimized.  That was pretty clean and localized
> in sccvn & dom.
>
> If we have the before_dom_children return the taken edge, then we have to
> twiddle all the clients due to the api change in before_dom_children.
> There's ~18 in total, so it's not too bad.
>
> 2 of the 18 clearly need to use the skip_unreachable_blocks capability
> (dom and sccvn).  2 others might be able to use it (tree-ssa-pre.c and
> tree-ssa-propagate.c).  I converted dom and sccvn, but not pre and the
> generic propagation engine.
>
> I can submit the iteration which lets clients clear EDGE_EXECUTABLE, or
> the iteration where the clients return the taken edge (or NULL) from the
> before_dom_children callback.
>
> Either is fine with me, so if you have a preference, let me know.

Return the taken edge.

Richard.

> jeff
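The interface being settled on — the per-block callback returns the single taken outgoing edge (or NULL if unknown), and the walker both clears executability on the other out-edges and skips subtrees reachable only through cleared edges — can be modeled in miniature. Everything below (the CFG structs, `toy_walk`, `demo`) is invented for the sketch and is not GCC's domwalk API:

```c
#include <assert.h>
#include <stddef.h>

struct toy_block;

struct toy_edge
{
  int executable;
  struct toy_block *dest;
};

struct toy_block
{
  struct toy_edge *succs[2]; int n_succs;
  struct toy_edge *preds[2]; int n_preds;
  struct toy_block *dom_children[3]; int n_children;
  int visited;
};

static int
block_reachable (struct toy_block *b)
{
  if (b->n_preds == 0)
    return 1;			/* entry block */
  for (int i = 0; i < b->n_preds; i++)
    if (b->preds[i]->executable)
      return 1;
  return 0;
}

static void
toy_walk (struct toy_block *b,
	  struct toy_edge *(*before_dom_children) (struct toy_block *))
{
  if (!block_reachable (b))
    return;			/* do not recurse into unreachable parts */
  b->visited = 1;
  struct toy_edge *taken = before_dom_children (b);
  if (taken != NULL)
    for (int i = 0; i < b->n_succs; i++)
      if (b->succs[i] != taken)
	b->succs[i]->executable = 0;	/* provably not taken */
  for (int i = 0; i < b->n_children; i++)
    toy_walk (b->dom_children[i], before_dom_children);
}

/* Diamond CFG: A -> {B, C}, B -> D, C -> D; A dominates B, C and D.  */
static struct toy_block A, B, C, D;
static struct toy_edge eAB, eAC, eBD, eCD;

static struct toy_edge *
take_then_branch (struct toy_block *b)
{
  /* Pretend A's condition folds: only the edge to B can be taken.  */
  return b == &A ? &eAB : NULL;
}

/* Returns a bitmask of visited blocks: bit 3 = A, 2 = B, 1 = C, 0 = D.  */
static int
demo (void)
{
  eAB.executable = 1; eAB.dest = &B;
  eAC.executable = 1; eAC.dest = &C;
  eBD.executable = 1; eBD.dest = &D;
  eCD.executable = 1; eCD.dest = &D;
  A.succs[0] = &eAB; A.succs[1] = &eAC; A.n_succs = 2;
  A.dom_children[0] = &B; A.dom_children[1] = &C; A.dom_children[2] = &D;
  A.n_children = 3;
  B.preds[0] = &eAB; B.n_preds = 1; B.succs[0] = &eBD; B.n_succs = 1;
  C.preds[0] = &eAC; C.n_preds = 1; C.succs[0] = &eCD; C.n_succs = 1;
  D.preds[0] = &eBD; D.preds[1] = &eCD; D.n_preds = 2;
  toy_walk (&A, take_then_branch);
  return (A.visited << 3) | (B.visited << 2) | (C.visited << 1) | D.visited;
}
```

The client never touches the flags directly: reporting the taken edge at A is enough for the walker to mark A→C non-executable and skip C entirely.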
[PATCH, PR68716] Fix GOMP/GOACC_parallel handling in find_func_clobbers
[ was: Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta ] On 30/11/15 13:11, Tom de Vries wrote: On 30/11/15 10:16, Richard Biener wrote: On Mon, 30 Nov 2015, Tom de Vries wrote: Hi, this patch fixes PR46032. It handles a call: ... __builtin_GOMP_parallel (fn, data, num_threads, flags) ... as: ... fn (data) ... in ipa-pta. This improves ipa-pta alias analysis in the parallelized function fn, and allows vectorization in the testcase without a runtime alias test. Bootstrapped and reg-tested on x86_64. OK for stage3 trunk? + /* Assign the passed argument to the appropriate incoming +parameter of the function. */ + struct constraint_expr lhs ; + lhs = get_function_part_constraint (fi, fi_parm_base + 0); + auto_vecrhsc; + struct constraint_expr *rhsp; + get_constraint_for_rhs (arg, ); + while (rhsc.length () != 0) + { + rhsp = (); + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.pop (); + } please use style used elsewhere with FOR_EACH_VEC_ELT (rhsc, j, rhsp) process_constraint (new_constraint (lhs, *rhsp)); rhsc.truncate (0); That code was copied from find_func_aliases_for_call. I've factored out the bit that I copied as find_func_aliases_for_call_arg, and fixed the style there (and dropped 'rhsc.truncate (0)' since AFAIU it's redundant at the end of a function). + /* Parameter passed by value is used. */ + lhs = get_function_part_constraint (fi, fi_uses); + struct constraint_expr *rhsp; + get_constraint_for_address_of (arg, ); This isn't correct - you want to use get_constraint_for (arg, ). After all rhs is already an ADDR_EXPR. Can we add an assert somewhere to detect this incorrect usage? + FOR_EACH_VEC_ELT (rhsc, j, rhsp) + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.truncate (0); + + /* The caller clobbers what the callee does. 
*/ + lhs = get_function_part_constraint (fi, fi_clobbers); + rhs = get_function_part_constraint (cfi, fi_clobbers); + process_constraint (new_constraint (lhs, rhs)); + + /* The caller uses what the callee does. */ + lhs = get_function_part_constraint (fi, fi_uses); + rhs = get_function_part_constraint (cfi, fi_uses); + process_constraint (new_constraint (lhs, rhs)); I don't see why you need those. The solver should compute these in even better precision (context sensitive on the call side). The same is true for the function parameter. That is, the only needed part of the patch should be that making sure we see the "direct" call and assign parameters correctly. Dropped this bit. Dropping the find_func_clobbers bit (in particular, the part copying the clobbers) caused PR68716. Basically the find_func_clobbers bit tries to emulate the generic handling of calls in find_func_clobbers, but pretends the call is fn (data) when handling __builtin_GOMP_parallel (fn, data, num_threads, flags) . I used the same comments as in the generic call handling to make it clear what part of the generic call handling is emulated. Compared to the originally posted patch, the changes are: - removed 'gcc_assert (TREE_CODE (arg) == ADDR_EXPR)'. Ran into a case where arg is NULL. - use get_constraint_for instead of get_constraint_for_address_of as requested in a review comment above. - handle GOACC_parallel as well Bootstrapped and reg-tested on x86_64. OK for stage3 trunk? Thanks, - Tom Fix GOMP/GOACC_parallel handling in find_func_clobbers 2015-12-08 Tom de Vries PR tree-optimization/68716 * tree-ssa-structalias.c (find_func_clobbers): Fix handling of BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOMP_PARALLEL. * testsuite/libgomp.c/omp-nested-2.c: New test. 
--- gcc/tree-ssa-structalias.c | 47 +- libgomp/testsuite/libgomp.c/omp-nested-2.c | 4 +++ 2 files changed, 50 insertions(+), 1 deletion(-) diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c index 060ff3e..15e351e 100644 --- a/gcc/tree-ssa-structalias.c +++ b/gcc/tree-ssa-structalias.c @@ -5082,7 +5082,52 @@ find_func_clobbers (struct function *fn, gimple *origt) return; case BUILT_IN_GOMP_PARALLEL: case BUILT_IN_GOACC_PARALLEL: - return; + { + unsigned int fnpos, argpos; + switch (DECL_FUNCTION_CODE (decl)) + { + case BUILT_IN_GOMP_PARALLEL: + /* __builtin_GOMP_parallel (fn, data, num_threads, flags). */ + fnpos = 0; + argpos = 1; + break; + case BUILT_IN_GOACC_PARALLEL: + /* __builtin_GOACC_parallel (device, fn, mapnum, hostaddrs, + sizes, kinds, ...). */ + fnpos = 1; + argpos = 3; +
Re: [PATCH] Fix phiopt ICE in Factor conversion in COND_EXPR (PR tree-optimization/66949)
On Tue, Dec 8, 2015 at 5:21 PM, Marek Polacekwrote: > The following is a conservative fix for this PR. This is an ICE transpiring > in the new "Factor conversion in COND_EXPR" optimization added in r225722. > > Before this optimization kicks in, we have > : > ... > p1_32 = (short unsigned int) _20; > > : > ... > iftmp.0_18 = (short unsigned int) _20; > > : > ... > # iftmp.0_19 = PHI > > after factor_out_conditional_conversion does its work, we end up with those > two > def stmts removed and instead of the PHI we'll have > > # _35 = PHI <_20(3), _20(2)> > iftmp.0_19 = (short unsigned int) _35; > > That itself looks like a fine optimization, but after > factor_out_conditional_conversion > there's > 320 phis = phi_nodes (bb2); > 321 phi = single_non_singleton_phi_for_edges (phis, e1, e2); > 322 gcc_assert (phi); > and phis look like > b.2_38 = PHI > _35 = PHI <_20(3), _20(2)> > so single_non_singleton_phi_for_edges returns NULL and the subsequent assert > triggers. > > With this patch we won't ICE (and PRE should clean this up anyway), but I > don't know, > maybe I should try harder to optimize even this problematical case (not sure > how hard > it would be...)? > > pr66949-2.c only ICEd on powerpc64le and I have verified that this patch > fixes it too. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? Hmm. Yes, there is an opportunity for optimization. But I don't see why we have the asserts at all - they seem to be originated from development. IMHO factor_out_conditonal_conversion should simply return the new PHI it generates instead of relying on signle_non_signelton_phi_for_edges to return it. Richard. > 2015-12-08 Marek Polacek > > PR tree-optimization/66949 > * tree-ssa-phiopt.c (factor_out_conditional_conversion): Return false > if > NEW_ARG0 and NEW_ARG1 are equal. > > * gcc.dg/torture/pr66949-1.c: New test. > * gcc.dg/torture/pr66949-2.c: New test. 
> > diff --git gcc/testsuite/gcc.dg/torture/pr66949-1.c > gcc/testsuite/gcc.dg/torture/pr66949-1.c > index e69de29..1b765bc 100644 > --- gcc/testsuite/gcc.dg/torture/pr66949-1.c > +++ gcc/testsuite/gcc.dg/torture/pr66949-1.c > @@ -0,0 +1,28 @@ > +/* PR tree-optimization/66949 */ > +/* { dg-do compile } */ > + > +int a, *b = , c; > + > +unsigned short > +fn1 (unsigned short p1, unsigned int p2) > +{ > + return p2 > 1 || p1 >> p2 ? p1 : p1 << p2; > +} > + > +void > +fn2 () > +{ > + int *d = > + for (a = 0; a < -1; a = 1) > +; > + if (a < 0) > +c = 0; > + *b = fn1 (*d || c, *d); > +} > + > +int > +main () > +{ > + fn2 (); > + return 0; > +} > diff --git gcc/testsuite/gcc.dg/torture/pr66949-2.c > gcc/testsuite/gcc.dg/torture/pr66949-2.c > index e69de29..e6250a3 100644 > --- gcc/testsuite/gcc.dg/torture/pr66949-2.c > +++ gcc/testsuite/gcc.dg/torture/pr66949-2.c > @@ -0,0 +1,23 @@ > +/* PR tree-optimization/66949 */ > +/* { dg-do compile } */ > + > +char a; > +int b, c, d; > +extern int fn2 (void); > + > +short > +fn1 (short p1, short p2) > +{ > + return p2 == 0 ? p1 : p1 / p2; > +} > + > +int > +main (void) > +{ > + char e = 1; > + int f = 7; > + c = a >> f; > + b = fn1 (c, 0 < d <= e && fn2 ()); > + > + return 0; > +} > diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c > index 344cd2f..caac5d5 100644 > --- gcc/tree-ssa-phiopt.c > +++ gcc/tree-ssa-phiopt.c > @@ -477,6 +477,11 @@ factor_out_conditional_conversion (edge e0, edge e1, > gphi *phi, > return false; > } > > + /* If we were to continue, we'd create a PHI with same arguments for edges > + E0 and E1. That could get us in trouble later, so punt. */ > + if (operand_equal_for_phi_arg_p (new_arg0, new_arg1)) > +return false; > + >/* If arg0/arg1 have > 1 use, then this transformation actually increases >the number of expressions evaluated at runtime. */ >if (!has_single_use (arg0) > > Marek
Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing
> Hi > this patch implements the trik for punting if we get too many nested > pointers. > This fixes the ada tstcases. Curiously enough I would like to replace > safe_push > by quick_push but doing so I get weird error about freeing non-heap object > in the auto_vec desructor... > > Bootstraping/regtesting x86_64-linux. Ok if it passes? > > Honza > > * alias.c (get_alias_set): Punt after getting 8 nested pointers. > > Index: alias.c > === > > --- alias.c (revision 231439) > +++ alias.c (working copy) > @@ -990,6 +990,14 @@ get_alias_set (tree t) > || TREE_CODE (p) == VECTOR_TYPE; > p = TREE_TYPE (p)) > { > + /* Ada supports recusive pointers. Instead of doing recrusion check typo above: recrusion -> recursion > + just give up once the preallocated space of 8 elements is up. > + In this case just punt to void * alias set. */ > + if (reference.length () == 8) We don't use magic numbers in general, can you please replace by a named constant instead? > + { > + p = ptr_type_node; > + break; > + } > if (TREE_CODE (p) == REFERENCE_TYPE) > /* In LTO we want languages that use references to be compatible > with languages that use pointers. */ I'll let others comment on the general idea.
[PATCH] Fix PR68583
Well, not really in the end - the actual testcase would need code hoisting to make the refs equal enough for ifcvt. Making the testcase simpler doesn't seem to capture the issue so for now without one (I'll keep trying). Bootstrapped and tested on x86_64-unknown-linux-gnu. Richard. 2015-12-09 Richard BienerPR tree-optimization/68583 * tree-if-conv.c (ifc_dr): Make flags bool, add w_unconditionally flag and rename predicates to w_predicate, rw_predicate and base_w_predicate. (DR_WRITTEN_AT_LEAST_ONCE): Rename to ... (DR_BASE_W_UNCONDITIONALLY): ... this. (DR_W_UNCONDITIONALLY): Add. (hash_memrefs_baserefs_and_store_DRs_read): Adjust. Compute unconditionally written separately from read or written. (ifcvt_memrefs_wont_trap): Properly treat reads. (ifcvt_could_trap_p): Inline ... (if_convertible_gimple_assign_stmt_p): ... here. Refactor to avoid code duplication. (if_convertible_loop_p_1): Adjust and properly initialize predicates. Index: gcc/tree-if-conv.c === *** gcc/tree-if-conv.c (revision 231396) --- gcc/tree-if-conv.c (working copy) *** if_convertible_phi_p (struct loop *loop, *** 582,601 each DR->aux field. */ struct ifc_dr { ! /* -1 when not initialized, 0 when false, 1 when true. */ ! int written_at_least_once; ! ! /* -1 when not initialized, 0 when false, 1 when true. */ ! int rw_unconditionally; ! ! tree predicate; ! ! tree base_predicate; }; #define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux) ! #define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once) #define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally) /* Iterates over DR's and stores refs, DR and base refs, DR pairs in HASH tables. While storing them in HASH table, it checks if the --- 582,600 each DR->aux field. */ struct ifc_dr { ! bool rw_unconditionally; ! bool w_unconditionally; ! bool written_at_least_once; ! ! tree rw_predicate; ! tree w_predicate; ! tree base_w_predicate; }; #define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux) ! 
#define DR_BASE_W_UNCONDITIONALLY(DR) (IFC_DR (DR)->written_at_least_once) #define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally) + #define DR_W_UNCONDITIONALLY(DR) (IFC_DR (DR)->w_unconditionally) /* Iterates over DR's and stores refs, DR and base refs, DR pairs in HASH tables. While storing them in HASH table, it checks if the *** hash_memrefs_baserefs_and_store_DRs_read *** 623,660 ref = TREE_OPERAND (ref, 0); master_dr = _DR_map->get_or_insert (ref, ); - if (!exist1) ! { ! IFC_DR (a)->predicate = ca; ! *master_dr = a; ! } ! else ! IFC_DR (*master_dr)->predicate ! = fold_or_predicates ! (EXPR_LOCATION (IFC_DR (*master_dr)->predicate), !ca, IFC_DR (*master_dr)->predicate); ! if (is_true_predicate (IFC_DR (*master_dr)->predicate)) ! DR_RW_UNCONDITIONALLY (*master_dr) = 1; if (DR_IS_WRITE (a)) { base_master_dr = _DR_map->get_or_insert (base_ref, ); - if (!exist2) ! { ! IFC_DR (a)->base_predicate = ca; ! *base_master_dr = a; ! } ! else ! IFC_DR (*base_master_dr)->base_predicate ! = fold_or_predicates ! (EXPR_LOCATION (IFC_DR (*base_master_dr)->base_predicate), ! ca, IFC_DR (*base_master_dr)->base_predicate); ! ! if (is_true_predicate (IFC_DR (*base_master_dr)->base_predicate)) ! DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) = 1; } } --- 622,654 ref = TREE_OPERAND (ref, 0); master_dr = _DR_map->get_or_insert (ref, ); if (!exist1) ! *master_dr = a; ! if (DR_IS_WRITE (a)) ! { ! IFC_DR (*master_dr)->w_predicate ! = fold_or_predicates (UNKNOWN_LOCATION, ca, ! IFC_DR (*master_dr)->w_predicate); ! if (is_true_predicate (IFC_DR (*master_dr)->w_predicate)) ! DR_W_UNCONDITIONALLY (*master_dr) = true; ! } ! IFC_DR (*master_dr)->rw_predicate ! = fold_or_predicates (UNKNOWN_LOCATION, ca, ! IFC_DR (*master_dr)->rw_predicate); ! if (is_true_predicate (IFC_DR (*master_dr)->rw_predicate)) ! DR_RW_UNCONDITIONALLY (*master_dr) = true; if (DR_IS_WRITE (a)) { base_master_dr = _DR_map->get_or_insert (base_ref, ); if (!exist2) ! *base_master_dr = a; ! 
IFC_DR (*base_master_dr)->base_w_predicate ! = fold_or_predicates (UNKNOWN_LOCATION, ca, ! IFC_DR (*base_master_dr)->base_w_predicate); ! if (is_true_predicate (IFC_DR (*base_master_dr)->base_w_predicate)) ! DR_BASE_W_UNCONDITIONALLY (*base_master_dr) = true; } } ***
Re: [DOC, PATCH] Mention --enable-valgrind-annotations in install.texi
On 2015.12.09 at 10:33 +0100, Martin Liška wrote: > On 12/09/2015 12:47 AM, Gerald Pfeifer wrote: > > On Tue, 8 Dec 2015, Martin Liška wrote: > >> I would like to add a missing configure option. > > > > +Specify that the compiler should interact with valgrind runtime, where > > +selected invalid memory reads are marked as false positives and > > +garbage collected memory is properly marked for proper interaction. > > > > 2. "interact...for proper interaction" and "properly marked for > > proper" feels a little confusing to me. That is, I had to think > > twice. Any chance you can make this a little simpler to understand? > > Sure, please read v2. > > +@item --enable-valgrind-annotations > +Specify that the compiler should interact with valgrind runtime, > +where selected memory-related operations are marked as valid. > +There operations are safe and should not become a candidate > +of an undefined behavior. Sorry, but this is simply awful ;-). How about: Mark selected memory related operations in the compiler when run under valgrind to suppress false positives. -- Markus
Re: [RFC] Request for comments on ivopts patch
On Tue, Dec 8, 2015 at 7:56 PM, Steve Ellceywrote: > I have an ivopts optimization question/proposal. When compiling the > attached program the ivopts pass prefers the original ivs over new ivs > and that causes us to generate less efficient code on MIPS. It may > affect other platforms too. > > The Source code is a C strcmp: > > int strcmp (const char *p1, const char *p2) > { > const unsigned char *s1 = (const unsigned char *) p1; > const unsigned char *s2 = (const unsigned char *) p2; > unsigned char c1, c2; > do { > c1 = (unsigned char) *s1++; > c2 = (unsigned char) *s2++; > if (c1 == '\0') return c1 - c2; > } while (c1 == c2); > return c1 - c2; > } > > Currently the code prefers the original ivs and so it generates > code that increments s1 and s2 before doing the loads (and uses > a -1 offset): > > : > # s1_1 = PHI > # s2_2 = PHI > s1_6 = s1_1 + 1; > c1_8 = MEM[base: s1_6, offset: 4294967295B]; > s2_9 = s2_2 + 1; > c2_10 = MEM[base: s2_9, offset: 4294967295B]; > if (c1_8 == 0) > goto ; > else > goto ; > > If I remove the cost increment for non-original ivs then GCC > does the loads before the increments: > > : > # ivtmp.6_17 = PHI > # ivtmp.7_21 = PHI > _25 = (void *) ivtmp.6_17; > c1_8 = MEM[base: _25, offset: 0B]; > _26 = (void *) ivtmp.7_21; > c2_10 = MEM[base: _26, offset: 0B]; > if (c1_8 == 0) > goto ; > else > goto ; > . > . > : > ivtmp.6_14 = ivtmp.6_17 + 1; > ivtmp.7_23 = ivtmp.7_21 + 1; > if (c1_8 == c2_10) > goto ; > else > goto ; > > > This second case (without the preference for the original IV) > generates better code on MIPS because the final assembly > has the increment instructions between the loads and the tests > of the values being loaded and so there is no delay (or less delay) > between the load and use. It seems like this could easily be > the case for other platforms too so I was wondering what people > thought of this patch: You don't comment on the comment you remove ... debugging programs is also important! 
So if then the cost of both cases should be distinguished somewhere else, like granting a bonus for increment before exit test or so. Richard. > 2015-12-08 Steve Ellcey > > * tree-ssa-loop-ivopts.c (determine_iv_cost): Remove preference for > original ivs. > > > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c > index 98dc451..26daabc 100644 > --- a/gcc/tree-ssa-loop-ivopts.c > +++ b/gcc/tree-ssa-loop-ivopts.c > @@ -5818,14 +5818,6 @@ determine_iv_cost (struct ivopts_data *data, struct > iv_cand *cand) > >cost = cost_step + adjust_setup_cost (data, cost_base.cost); > > - /* Prefer the original ivs unless we may gain something by replacing it. > - The reason is to make debugging simpler; so this is not relevant for > - artificial ivs created by other optimization passes. */ > - if (cand->pos != IP_ORIGINAL > - || !SSA_NAME_VAR (cand->var_before) > - || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before))) > -cost++; > - >/* Prefer not to insert statements into latch unless there are some > already (so that we do not create unnecessary jumps). */ >if (cand->pos == IP_END
Re: [PATCH] Encode alignment info in MASK_{LOAD,STORE} ifns (PR tree-optimization/68786)
On Tue, 8 Dec 2015, Jakub Jelinek wrote: > Hi! > > As written in the PR, with MASK_{LOAD,STORE} ifns we lose info on the > alignment of the point, unless the referenced SSA_NAME has pointer info > with alignment mask, but that is just an optimization issue rather than > requirement. > > As the second argument to these builtins is always 0 (used just to hold > the pointer type for aliasing purposes), this patch uses the value of > the second arg to hold alignment info. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. > 2015-12-08 Jakub Jelinek> > PR tree-optimization/68786 > * tree-if-conv.c: Include builtins.h. > (predicate_mem_writes): Put result of get_object_alignment (ref) > into second argument's value. > * tree-vect-stmts.c (vectorizable_mask_load_store): Put minimum > pointer alignment into second argument's value. > * tree-data-ref.c (get_references_in_stmt): Use value of second > argument for build_aligned_type, and only the type to build > a zero second argument for MEM_REF. > * internal-fn.c (expand_mask_load_optab_fn, > expand_mask_store_optab_fn): Likewise. > > --- gcc/tree-if-conv.c.jj 2015-11-30 13:40:38.0 +0100 > +++ gcc/tree-if-conv.c2015-12-08 17:29:18.139186787 +0100 > @@ -111,6 +111,7 @@ along with GCC; see the file COPYING3. > #include "dbgcnt.h" > #include "tree-hash-traits.h" > #include "varasm.h" > +#include "builtins.h" > > /* List of basic blocks in if-conversion-suitable order. */ > static basic_block *ifc_bbs; > @@ -2093,7 +2094,8 @@ predicate_mem_writes (loop_p loop) > vect_sizes.safe_push (bitsize); > vect_masks.safe_push (mask); > } > - ptr = build_int_cst (reference_alias_ptr_type (ref), 0); > + ptr = build_int_cst (reference_alias_ptr_type (ref), > + get_object_alignment (ref)); > /* Copy points-to info if possible. 
*/ > if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr)) > copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr), > --- gcc/tree-vect-stmts.c.jj 2015-12-02 20:26:59.0 +0100 > +++ gcc/tree-vect-stmts.c 2015-12-08 17:10:47.226768560 +0100 > @@ -2058,10 +2058,11 @@ vectorizable_mask_load_store (gimple *st > misalign = DR_MISALIGNMENT (dr); > set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, > misalign); > + tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), > + misalign ? misalign & -misalign : align); > new_stmt > = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr, > - gimple_call_arg (stmt, 1), > - vec_mask, vec_rhs); > + ptr, vec_mask, vec_rhs); > vect_finish_stmt_generation (stmt, new_stmt, gsi); > if (i == 0) > STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; > @@ -2107,10 +2108,11 @@ vectorizable_mask_load_store (gimple *st > misalign = DR_MISALIGNMENT (dr); > set_ptr_info_alignment (get_ptr_info (dataref_ptr), align, > misalign); > + tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), > + misalign ? misalign & -misalign : align); > new_stmt > = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr, > - gimple_call_arg (stmt, 1), > - vec_mask); > + ptr, vec_mask); > gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest)); > vect_finish_stmt_generation (stmt, new_stmt, gsi); > if (i == 0) > --- gcc/tree-data-ref.c.jj2015-11-09 13:39:32.0 +0100 > +++ gcc/tree-data-ref.c 2015-12-08 17:38:55.199094723 +0100 > @@ -3872,6 +3872,8 @@ get_references_in_stmt (gimple *stmt, ve >else if (stmt_code == GIMPLE_CALL) > { >unsigned i, n; > + tree ptr, type; > + unsigned int align; > >ref.is_read = false; >if (gimple_call_internal_p (stmt)) > @@ -3882,12 +3884,16 @@ get_references_in_stmt (gimple *stmt, ve > break; > ref.is_read = true; > case IFN_MASK_STORE: > - ref.ref = fold_build2 (MEM_REF, > -ref.is_read > -? 
TREE_TYPE (gimple_call_lhs (stmt)) > -: TREE_TYPE (gimple_call_arg (stmt, 3)), > -gimple_call_arg (stmt, 0), > -gimple_call_arg (stmt, 1)); > + ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), 0); > + align = tree_to_shwi
Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwalwrote: > Based on the comments on RFC patch this patch incorporates all the comments > from Jeff. Thanks Jeff for the valuable feedback. > > This patch enables the better register pressure estimate for Loop Invariant > code motion. This patch Calculate the Loop Liveness used for regs_used > used to calculate the register pressure used in the cost estimation. > > Loop Liveness is based on the following properties. we only require to > calculate the set of objects that are live at the birth or the header of the > loop. We > don't need to calculate the live through the Loop considering Live in and > Live out of all the basic blocks of the Loop. This is because the set of > objects. That > are live-in at the birth or header of the loop will be live-in at every node > in the Loop. > > If a v live out at the header of the loop then the variable is live-in at > every node in the Loop. To prove this, Consider a Loop L with header h such > that The > variable v defined at d is live-in at h. Since v is live at h, d is not part > of L. This follows from the dominance property, i.e. h is strictly dominated > by d. > Furthermore, there exists a path from h to a use of v which does not go > through d. For every node of the loop, p, since the loop is strongly connected > Component of the CFG, there exists a path, consisting only of nodes of L from > p to h. Concatenating those two paths prove that v is live-in and live-out Of > p. > > Also Calculate the Live-Out and Live-In for the exit edge of the loop. This > considers liveness for not only the Loop latch but also the liveness outside > the Loops. > > Bootstrapped and Regtested for x86_64 and microblaze target. > > There is an extra failure for the testcase gcc.dg/loop-invariant.c with the > change that looks correct to me. > > This is because the available_regs = 6 and the regs_needed = 1 and new_regs = > 0 and the regs_used = 10. 
As the reg_used that are based on the Liveness > given above is greater than the available_regs, then it's candidate of spill > and the estimate_register_pressure calculates the spill cost. This spill cost > is greater > than inv_cost and gain comes to be negative. The disables the loop invariant > for the above testcase. > > Disabling of the Loop invariant for the testcase loop-invariant.c with this > patch looks correct to me considering the calculation of available_regs in > cfgloopanal.c > is correct. You keep a lot of data (bitmaps) live just to use the number of bits set in the end. I'll note that this also increases the complexity of the pass which is enabled at -O1+ where -O1 should limit itself to O(n*log n) algorithms but live is quadratic. So I don't think doing this unconditionally is a good idea (and we have -fira-loop-pressure after all). Please watch not making -O1 worse again after we spent so much time making it suitable for all the weird autogenerated code. Richard. > ChangeLog: > 2015-12-09 Ajit Agarwal > > * loop-invariant.c > (calculate_loop_liveness): New. > (move_loop_invariants): Use of calculate_loop_liveness. > > Signed-off-by:Ajit Agarwal ajit...@xilinx.com > > --- > gcc/loop-invariant.c | 77 > ++ > 1 files changed, 65 insertions(+), 12 deletions(-) > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > index 53d1377..ac08594 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1464,18 +1464,7 @@ find_invariants_to_move (bool speed, bool call_p) > registers used; we put some initial bound here to stand for > induction variables etc. that we do not detect. */ > { > - unsigned int n_regs = DF_REG_SIZE (df); > - > - regs_used = 2; > - > - for (i = 0; i < n_regs; i++) > - { > - if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i)) > - { > - /* This is a value that is used but not changed inside loop. */ > - regs_used++; > - } > - } > + regs_used = bitmap_count_bits (_DATA (curr_loop)->regs_live) + 2; > } > >if (! 
flag_ira_loop_pressure) > @@ -1966,7 +1955,63 @@ mark_ref_regs (rtx x) >} > } > > +/* Calculate the Loop Liveness used for regs_used used in > + heuristics to calculate the register pressure. */ > + > +static void > +calculate_loop_liveness (void) > +{ > + struct loop *loop; > + > + FOR_EACH_LOOP (loop, 0) > +if (loop->aux == NULL) > + { > +loop->aux = xcalloc (1, sizeof (struct loop_data)); > +bitmap_initialize (_DATA (loop)->regs_live, _obstack); > + } > + > + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) > + { > +int i; > +edge e; > +vec edges; > +edges = get_loop_exit_edges (loop); > + > +/* Loop Liveness is based on the following proprties. > + we only require to calculate the
Re: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609
On Fri, Nov 27, 2015 at 01:01:01PM +, James Greenhalgh wrote: > This patch follow Richard Henderson's advice to tighten up > CANNOT_CHANGE_MODE_CLASS for AArch64 to avoid a simplification bug in > the middle-end. > > There is nothing AArch64-specific about the testcase which triggers this, > so I'll put it in the testcase for other targets. If you see a regression, > the explanation in the PR is much more thorough and correct than I can > reproduce here, so I'd recommend starting there. In short, target > maintainers need to: > > > forbid BITS_PER_WORD (64-bit) subregs of hard registers > > > BITS_PER_WORD. See the verbiage I added to the i386 backend for this. > > We removed the CANNOT_CHANGE_MODE_CLASS macro back in January 2015. Before > then, we used it to workaround bugs in big-endian vector support > ( https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01216.html ). Ideally, > we'd not need to bring this macro back, but if we can't fix the middle-end > bug this exposes, we need the workaround. > > For AArch64, doing this runs in to some trouble with two of our > instruction patterns - we end up with: > > (truncate:DI (reg:TF)) > > Which fails if it ever make it through to the simplify routines with > something nasty like: > > (and:DI (truncate:DI (reg:TF 32 v0 [ a ])) > (const_int 2305843009213693951 [0x1fff])) > > The simplify routines want to turn this around to look like: > > (truncate:DI (and:TF (reg:TF 32 v0 [ a ]) > (const_int 2305843009213693951 [0x1fff]))) > > Which then wants to further simplify the expression by first building > the constant in TF mode, and trunc_int_for_mode barfs: > > 0x7a38a5 trunc_int_for_mode(long, machine_mode) > .../gcc/explow.c:53 > > We can fix that by changing the patterns to use a zero_extract, which seems > more in line with what they actually express (extracting the two 64-bit > halves of a 128-bit value). 
> > Bootstrapped on aarch64-none-linux-gnu, and tested on aarch64-none-elf and > aarch64_be-none-elf without seeing any correctness regressions. > > OK? *ping* Thanks, James > > If so, we ought to get this backported to the release branches, the gcc-5 > backport applies clean (testing ongoing but looks OK so far) if the release > managers and AArch64 maintainers agree this is something that should be > backported this late in the 5.3 release cycle. > > Thanks, > James > > --- > 2015-11-27 James Greenhalgh> > * config/aarch64/aarch64-protos.h > (aarch64_cannot_change_mode_class): Bring back. > * config/aarch64/aarch64.c > (aarch64_cannot_change_mode_class): Likewise. > * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Likewise. > * config/aarch64/aarch64.md (aarch64_movdi_low): Use > zero_extract rather than truncate. > (aarch64_movdi_high): Likewise. > > 2015-11-27 James Greenhalgh > > * gcc.dg/torture/pr67609.c: New. > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index e0a050c..59d3da4 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -269,6 +269,9 @@ int aarch64_get_condition_code (rtx); > bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode); > int aarch64_branch_cost (bool, bool); > enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx); > +bool aarch64_cannot_change_mode_class (machine_mode, > +machine_mode, > +enum reg_class); > bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); > bool aarch64_constant_address_p (rtx); > bool aarch64_expand_movmem (rtx *); > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 3fe2f0f..fadb716 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -12408,6 +12408,24 @@ aarch64_vectorize_vec_perm_const_ok (machine_mode > vmode, >return ret; > } > > +/* Implement target hook CANNOT_CHANGE_MODE_CLASS. 
*/ > +bool > +aarch64_cannot_change_mode_class (machine_mode from, > + machine_mode to, > + enum reg_class rclass) > +{ > + /* We cannot allow word_mode subregs of full vector modes. > + Otherwise the middle-end will assume it's ok to store to > + (subreg:DI (reg:TI 100) 0) in order to modify only the low 64 bits > + of the 128-bit register. However, after reload the subreg will > + be dropped leaving a plain DImode store. See PR67609 for a more > + detailed dicussion. In all other cases, we want to be premissive > + and return false. */ > + return (reg_classes_intersect_p (FP_REGS, rclass) > + && GET_MODE_SIZE (to) == UNITS_PER_WORD > + && GET_MODE_SIZE (from) > UNITS_PER_WORD); > +} > + > rtx > aarch64_reverse_mask (enum machine_mode mode) > { > diff --git
Re: [hsa 4/10] Merge of HSA branch
On Mon, Dec 07, 2015 at 12:21:22PM +0100, Martin Jambor wrote: > Subject: Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses > > Hi, > > this is https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00477.html with > the early return requested by Jakub. Please refer to that previous > email for an explanation of why it is necessary. > > Thanks, > > 2015-12-03 Martin Jambor > > * tree-inline.c (duplicate_remap_omp_clause_seq): New function. > (replace_locals_op): Duplicate gimple sequences in OMP clauses. Ok, thanks. Jakub
[PATCH] Fix PR68806
This fixes vectorization re-start without SLP when SLP analysis removed an SLP instance - I was relying on the reduction chains still being in the SLP instances. The following properly detects this case and also handles SLP reductions properly (they can be vectorized non-SLP as well). Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2015-12-09 Richard Biener  PR tree-optimization/68806 * tree-vect-loop.c (vect_analyze_loop_2): Properly detect reduction chains and ignore SLP reductions. * gcc.dg/torture/pr68806.c: New testcase. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c (revision 231451) +++ gcc/tree-vect-loop.c (working copy) @@ -2123,9 +2123,12 @@ again: if (!slp) return false; + /* If there are reduction chains re-trying will fail anyway. */ + if (! LOOP_VINFO_REDUCTION_CHAINS (loop_vinfo).is_empty ()) +return false; + /* Likewise if the grouped loads or stores in the SLP cannot be handled - via interleaving or lane instructions or if there were any SLP - reductions. */ + via interleaving or lane instructions. */ slp_instance instance; slp_tree node; unsigned i, j; @@ -2135,7 +2138,7 @@ again: vinfo = vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (SLP_INSTANCE_TREE (instance))[0]); if (! STMT_VINFO_GROUPED_ACCESS (vinfo)) - return false; + continue; vinfo = vinfo_for_stmt (STMT_VINFO_GROUP_FIRST_ELEMENT (vinfo)); unsigned int size = STMT_VINFO_GROUP_SIZE (vinfo); tree vectype = STMT_VINFO_VECTYPE (vinfo); Index: gcc/testsuite/gcc.dg/torture/pr68806.c === --- gcc/testsuite/gcc.dg/torture/pr68806.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr68806.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ + +int sad(const unsigned char *p1, long p2) +{ + int a = 0; + for (int y = 0; y < 16; y++) +{ + for (int x = 0; x < 12; x++) + a += p1[x]; + p1 += p2; +} + return a; +}
Re: [PATCH] Do not sanitize left shifts for -fwrapv (PR68418)
On 04/12/2015 23:48, Jeff Law wrote: >> >> Why would pointer types be shifted at all (at the ubsan level, >> which is basically the AST)? > BTW, if you argument is that we can never get into this code with a > shift of a pointer object, I'd like to see some kind of analysis to back > up that assertion -- which could be as simple as pointing to FE code > that issues an error if the user tried to do something like shift a > pointer object. You're right, I should have qualified that better. And actually there is an issue with this patch, though it is not pointers. There are only two call sites for ubsan_instrument_shift. In c/c-typeck.c: /* Remember whether we're doing << or >>. */ bool doing_shift = false; /* The expression codes of the data types of the arguments tell us whether the arguments are integers, floating, pointers, etc. */ code0 = TREE_CODE (type0); code1 = TREE_CODE (type1); switch (code) { ... case RSHIFT_EXPR: ... else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE) && code1 == INTEGER_TYPE) { doing_shift = true; ... } ... case LSHIFT_EXPR: ... else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE) && code1 == INTEGER_TYPE) { doing_shift = true; ... } ... } ... if ((flag_sanitize & (SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_FLOAT_DIVIDE)) && do_ubsan_in_current_function () && (doing_div_or_mod || doing_shift) && !require_constant_value) { /* OP0 and/or OP1 might have side-effects. */ op0 = c_save_expr (op0); op1 = c_save_expr (op1); op0 = c_fully_fold (op0, false, NULL); op1 = c_fully_fold (op1, false, NULL); ... else if (doing_shift && (flag_sanitize & SANITIZE_SHIFT)) instrument_expr = ubsan_instrument_shift (location, code, op0, op1); } cp/typeck.c is the same but it doesn't handle code0 == FIXED_POINT_TYPE. But FIXED_POINT_TYPE is not an integral type, and thus it would fail the TYPE_OVERFLOW_WRAPS check with my patch. 
I'll post an updated patch that also removes all instrumentation in the case of fixed point types, similar to instrument_si_overflow. Thanks for the careful review! Paolo
Re: [PATCH] Make basic asm implicitly clobber memory, pr24414
On 12/09/2015 03:18 AM, Bernd Edlinger wrote: Furthermore there is a documented use for asm(""): The empty assembler string is used to make a function volatile, thus calls can not be optimized away. But I think it is not necessary to make this clobber anything, nor should it be an instruction scheduling barrier, as it used to be in the past. Making that change seems risky; best not to make assumptions about how these are used. In any case, ... or should I wait for the next stage1? I think that would be better. Bernd
Fix PR ada/66526
This is about the 3 -Wuninitialized warnings for the build of the Ada library: g-expect.adb: In function 'GNAT.EXPECT.SET_UP_CHILD_COMMUNICATIONS': g-expect.adb:1356:7: warning: 'INPUT' is used uninitialized in this function [-Wuninitialized] g-expect.adb:1357:7: warning: 'OUTPUT' is used uninitialized in this function [-Wuninitialized] g-expect.adb:1358:7: warning: 'ERROR' is used uninitialized in this function [-Wuninitialized] They are actually false positives, but the compiler will never be able to understand that because the uses are past a fork-like function call on Unix systems. Tested on x86-64/Linux, applied on the mainline. 2015-12-09 Eric BotcazouPR ada/66526 * g-expect.adb (Set_Up_Child_Communications): Add matching condition for uses of Input, Ouput and Error variables after the Execvp call. -- Eric BotcazouIndex: g-expect.adb === --- g-expect.adb (revision 231440) +++ g-expect.adb (working copy) @@ -1348,17 +1348,22 @@ package body GNAT.Expect is Portable_Execvp (Pid.Pid'Access, Cmd & ASCII.NUL, Args); - -- The following commands are not executed on Unix systems, and are only - -- required for Windows systems. We are now in the parent process. + -- The following lines are only required for Windows systems and will + -- not be executed on Unix systems, but we use the same condition as + -- above to avoid warnings on uninitialized variables on Unix systems. + -- We are now in the parent process. - -- Restore the old descriptors + if No_Fork_On_Target then - Dup2 (Input, GNAT.OS_Lib.Standin); - Dup2 (Output, GNAT.OS_Lib.Standout); - Dup2 (Error, GNAT.OS_Lib.Standerr); - Close (Input); - Close (Output); - Close (Error); + -- Restore the old descriptors + + Dup2 (Input, GNAT.OS_Lib.Standin); + Dup2 (Output, GNAT.OS_Lib.Standout); + Dup2 (Error, GNAT.OS_Lib.Standerr); + Close (Input); + Close (Output); + Close (Error); + end if; end Set_Up_Child_Communications; ---
[PATCH] Fix memory leaks in tree-vect-data-refs.c
Hi. This is a simple follow-up to the previous patch that fixes the last remaining leak in the vectorizer. The patch survives regbootstrap on x86_64-linux-gnu. Ready for trunk? Martin >From 0d61420eb49dec0f5d14108373a546a8f1b52571 Mon Sep 17 00:00:00 2001 From: marxin Date: Wed, 9 Dec 2015 10:14:00 +0100 Subject: [PATCH] Fix memory leaks in tree-vect-data-refs.c gcc/ChangeLog: 2015-12-09 Martin Liska * tree-vect-data-refs.c: Free an overwritten dataref. --- gcc/tree-vect-data-refs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 8810af1..4c566c8 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -3847,6 +3847,7 @@ again: return false; } + free_data_ref (datarefs[i]); datarefs[i] = dr; STMT_VINFO_GATHER_SCATTER_P (stmt_info) = gatherscatter; } -- 2.6.3
[PATCH] GCC-5 backport of PR lto/65948
Hi. I would like to backport a forgotten patch to the GCC-5 branch. Bootstrap and regression testing is running; OK if it passes? Thanks, Martin >From 626df2a92db87f7c0668c7f2902f314ed6a9bc4c Mon Sep 17 00:00:00 2001 From: marxin Date: Wed, 9 Dec 2015 12:49:22 +0100 Subject: [PATCH] Backport PR lto/65948 --- gcc/ChangeLog | 9 + gcc/ipa-devirt.c | 1 + 2 files changed, 10 insertions(+) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ce6f865..067f261 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2015-12-09 Martin Liska + + Backport from mainline + 2015-04-30 Jan Hubicka + + PR lto/65948 + * ipa-devirt.c (odr_types_equivalent_p): NULLPTR_TYPE is equivalent + to itself. + 2015-12-07 Jakub Jelinek Backport from mainline diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c index e319785..c165bce 100644 --- a/gcc/ipa-devirt.c +++ b/gcc/ipa-devirt.c @@ -1536,6 +1536,7 @@ odr_types_equivalent_p (tree t1, tree t2, bool warn, bool *warned, break; } case VOID_TYPE: +case NULLPTR_TYPE: break; default: -- 2.6.3
[PATCH] Fix PR68583 some more
This time with a simplified testcase - the remaining issue with the original one is if-conversion looking at DR_REF with nothing hoisting the address computation of a[i+1] out of both arms, which would make it see that they are the same (yeah, SCEV to the rescue, but not in this patch).

This patch re-organizes things so that we apply if-conversion of stores when it is safe (and not only when -ftree-loop-if-convert-stores is given). And when it is not safe (aka store data races) we rely on -ftree-loop-if-convert-stores - which I'm going to change in a followup patch to use --param allow-store-data-races like we do for LIM (and incidentally enable for -Ofast).

The patch changes the flow throughout if-conversion to remember if we if-converted a store (because that needs to trigger some different code-paths) and re-uses any_mask_load_store for that, which also means we apply versioning if we if-convert a store. IMHO a good thing to avoid spurious store data races (if requested) in non-vectorized code. If-conversion of stores in scalar code should be left to RTL if-conversion.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. I've verified that SPEC CPU 2006 still builds and runs properly on an AVX2 machine with flag_tree_loop_if_convert_stores changed to PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES) and -Ofast -march=native.

AFAICS if-conversion still has some unguarded store data races with respect to the handled components it strips from the accesses before determining them as "same". So I suppose that, strictly speaking, we need to split the hash map into two when we don't allow store data races (one for reads doing the stripping and one for stores not doing it). We could also simply keep a flag "all DR_REFs same". That said, I'd like to get rid of the separate if-conversion flag for stores.

Richard.
2015-12-09 Richard BienerPR tree-optimization/68583 * tree-if-conv.c (if_convertible_phi_p): Drop flag_tree_loop_if_convert_stores check in favor of the existing any_mask_load_store check. (insert_gimplified_predicates): Likewise. (combine_blocks): Likewise. (tree_if_conversion): Likewise. (ifcvt_memrefs_wont_trap): Properly check flag_tree_loop_if_convert_stores in all places that can end up introducing store-data-races. (if_convertible_gimple_assign_stmt_p): Remove restriction on flag_tree_loop_if_convert_stores for stores we can if-convert without introducing store-data-races. Force versioning for all if-converted stores. * gcc.dg/tree-ssa/ifc-pr68583.c: New testcase. * gcc.dg/vect/vect-72.c: Adjust. * gcc.dg/vect/vect-cselim-2.c: Likewise. * gcc.dg/vect/vect-strided-store-a-u8-i2.c: Likewise. Index: gcc/tree-if-conv.c === *** gcc/tree-if-conv.c (revision 231444) --- gcc/tree-if-conv.c (working copy) *** bb_with_exit_edge_p (struct loop *loop, *** 517,523 PHI is not if-convertible if: - it has more than 2 arguments. !When the flag_tree_loop_if_convert_stores is not set, PHI is not if-convertible if: - a virtual PHI is immediately used in another PHI node, - there is a virtual PHI in a BB other than the loop->header. --- 517,523 PHI is not if-convertible if: - it has more than 2 arguments. !When we didn't see if-convertible stores, PHI is not if-convertible if: - a virtual PHI is immediately used in another PHI node, - there is a virtual PHI in a BB other than the loop->header. *** if_convertible_phi_p (struct loop *loop, *** 545,554 } } ! if (flag_tree_loop_if_convert_stores || any_mask_load_store) return true; ! /* When the flag_tree_loop_if_convert_stores is not set, check that there are no memory writes in the branches of the loop to be if-converted. */ if (virtual_operand_p (gimple_phi_result (phi))) --- 545,554 } } ! if (any_mask_load_store) return true; ! 
/* When there were no if-convertible stores, check that there are no memory writes in the branches of the loop to be if-converted. */ if (virtual_operand_p (gimple_phi_result (phi))) *** ifcvt_memrefs_wont_trap (gimple *stmt, v *** 713,728 to unconditionally. */ if (base_master_dr && DR_BASE_W_UNCONDITIONALLY (*base_master_dr)) ! return true; else { /* or the base is know to be not readonly. */ tree base_tree = get_base_address (DR_REF (a)); if (DECL_P (base_tree) && decl_binds_to_current_def_p (base_tree) ! && flag_tree_loop_if_convert_stores ! && !TREE_READONLY (base_tree)) ! return true; } } return false; --- 713,727 to unconditionally.
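At source level, if-converting a store amounts to the transformation sketched below (a hypothetical example, not code from the patch). The unconditional form is what makes vectorization possible, but it writes a[i] even on iterations where the original program did not - a store data race if another thread could be touching those elements, which is why the pass guards this behind -ftree-loop-if-convert-stores and loop versioning:

```c
#include <assert.h>

/* Before if-conversion: the store is conditional.  */
void
cond_store (int *a, const int *c, int n)
{
  for (int i = 0; i < n; i++)
    if (c[i] > 0)
      a[i] = c[i];
}

/* After if-conversion: a store happens on every iteration (possibly
   rewriting the old value).  A vectorizer can turn this into vector
   stores, but the extra writes are observable by other threads.  */
void
cond_store_ifcvt (int *a, const int *c, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = c[i] > 0 ? c[i] : a[i];
}
```

Single-threaded, the two functions compute identical results; the difference is only visible under concurrent access.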
Re: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609
On 27 November 2015 at 13:01, James Greenhalghwrote: > 2015-11-27 James Greenhalgh > > * config/aarch64/aarch64-protos.h > (aarch64_cannot_change_mode_class): Bring back. > * config/aarch64/aarch64.c > (aarch64_cannot_change_mode_class): Likewise. > * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Likewise. > * config/aarch64/aarch64.md (aarch64_movdi_low): Use > zero_extract rather than truncate. > (aarch64_movdi_high): Likewise. > > 2015-11-27 James Greenhalgh > > * gcc.dg/torture/pr67609.c: New. > + detailed dicussion. In all other cases, we want to be premissive s/premissive/permissive/ OK /Marcus
Re: [hsa 5/10] OpenMP lowering/expansion changes (gridification)
On Mon, Dec 07, 2015 at 12:22:43PM +0100, Martin Jambor wrote: > it creates a copy of the entire target body and expands it slightly > differently for concurrent execution on a GPU. Note that both teams > and distribute constructs are mandatory. Moreover, currently the > distribute has to be in a combined statement with the inner for > construct. And there are quite a few other restrictions which I hope The standard calls those composite constructs, and I bet for gridification you want that restriction always, without composite distribute parallel for there are two different unrelated loops. > * builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New. > (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed. > (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New. > * fortran/types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New. > (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed. > (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New. Fortran has its own ChangeLog file. > @@ -556,9 +558,9 @@ DEF_FUNCTION_TYPE_9 > (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT, >BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG, >BT_BOOL, BT_UINT, BT_PTR, BT_INT) > > -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT, > - BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR, > - BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT) > +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR, > + BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR, > + BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR) There shouldn't be an empty line in between this DEF_FUNCTION_TYPE_9 and the previous one. 
> @@ -221,9 +223,9 @@ DEF_FUNCTION_TYPE_9 > (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT, >BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG, >BT_BOOL, BT_UINT, BT_PTR, BT_INT) > > -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT, > +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR, > BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR, > - BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT) > + BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR) > > DEF_FUNCTION_TYPE_11 > (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG, > BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, Ditto. > --- a/gcc/gimple.def > +++ b/gcc/gimple.def > @@ -369,13 +369,17 @@ DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", > GSS_OMP_PARALLEL_LAYOUT) > /* GIMPLE_OMP_TEAMS represents #pragma omp teams > BODY is the sequence of statements inside the single section. > CLAUSES is an OMP_CLAUSE chain holding the associated clauses. */ > -DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_SINGLE_LAYOUT) > +DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_TEAMS_LAYOUT) Why? > +/* GIMPLE_OMP_GPUKERNEL represents a parallel loop lowered for > execution > + on a GPU. It is an artificial statement created by omp lowering. */ > +DEFGSCODE(GIMPLE_OMP_GPUKERNEL, "gimple_omp_gpukernel", GSS_OMP) Why do you call it GPUKERNEL or KERNEL_BODY when you really mean gridified body and gridified loop? I mean, what is GPU specific about it? PTX is unlikely going to use that. And kernel is a wide term. > @@ -622,8 +623,14 @@ struct GTY((tag("GSS_OMP_FOR"))) >/* [ WORD 11 ] > Pre-body evaluated before the loop body begins. */ >gimple_seq pre_body; > + > + /* [ WORD 12 ] > + If set, this statement is part of a gridified kernel, its clauses need > to > + be scanned and lowered but the statement should be discarded after > + lowering. */ > + bool kernel_phony; Ugh no, flags should go into GF_OMP_*. 
> @@ -643,6 +660,12 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT"))) >/* [ WORD 10 ] > Shared data argument. */ >tree data_arg; > + > + /* [ WORD 11 ] */ > + /* If set, this statement is part of a gridified kernel, its clauses need > to > + be scanned and lowered but the statement should be discarded after > + lowering. */ > + bool kernel_phony; > }; Likewise. As for omp-low.c changes, the file is already large enough that it would be nice if it is easy to find out what routines are for gridification purposes only, use some special prefix (grid_*, ompgrid_*, ...) for all such functions? > @@ -1761,6 +1786,8 @@ fixup_child_record_type (omp_context *ctx) > { >tree f, type = ctx->record_type; > > + if (!ctx->receiver_decl) > +return; So when is receiver_decl NULL? > @@ -2113,6 +2140,14 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) > } > break; > > + case OMP_CLAUSE__GRIDDIM_: > + if (ctx->outer) > + { > + scan_omp_op
RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Wednesday, December 09, 2015 4:06 PM To: Ajit Kumar Agarwal Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion. On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwalwrote: > Based on the comments on RFC patch this patch incorporates all the comments > from Jeff. Thanks Jeff for the valuable feedback. > > This patch enables the better register pressure estimate for Loop > Invariant code motion. This patch Calculate the Loop Liveness used for > regs_used used to calculate the register pressure used in the cost > estimation. > > Loop Liveness is based on the following properties. we only require > to calculate the set of objects that are live at the birth or the > header of the loop. We don't need to calculate the live through the Loop > considering Live in and Live out of all the basic blocks of the Loop. This is > because the set of objects. That are live-in at the birth or header of the > loop will be live-in at every node in the Loop. > > If a v live out at the header of the loop then the variable is live-in > at every node in the Loop. To prove this, Consider a Loop L with header h > such that The variable v defined at d is live-in at h. Since v is live at h, > d is not part of L. This follows from the dominance property, i.e. h is > strictly dominated by d. > Furthermore, there exists a path from h to a use of v which does not > go through d. For every node of the loop, p, since the loop is strongly > connected Component of the CFG, there exists a path, consisting only of nodes > of L from p to h. Concatenating those two paths prove that v is live-in and > live-out Of p. > > Also Calculate the Live-Out and Live-In for the exit edge of the loop. 
This > considers liveness for not only the Loop latch but also the liveness outside > the Loops. > > Bootstrapped and Regtested for x86_64 and microblaze target. > > There is an extra failure for the testcase gcc.dg/loop-invariant.c with the > change that looks correct to me. > > This is because the available_regs = 6 and the regs_needed = 1 and > new_regs = 0 and the regs_used = 10. As the reg_used that are based > on the Liveness given above is greater than the available_regs, then it's > candidate of spill and the estimate_register_pressure calculates the spill > cost. This spill cost is greater than inv_cost and gain comes to be negative. > The disables the loop invariant for the above testcase. > > Disabling of the Loop invariant for the testcase loop-invariant.c with > this patch looks correct to me considering the calculation of available_regs > in cfgloopanal.c is correct. >>You keep a lot of data (bitmaps) live just to use the number of bits set in >>the end. >>I'll note that this also increases the complexity of the pass which is >>enabled at -O1+ where >>-O1 should limit itself to O(n*log n) algorithms but live is quadratic. >.So I don't think doing this unconditionally is a good idea (and we have >-fira-loop-pressure after all). >>Please watch not making -O1 worse again after we spent so much time making it >>suitable for all the weird autogenerated code. I can also implement without keeping the bitmaps live data and would not require to set the bits in the end. I am interested in the Liveness in the loop header and the loop exit which I can calculate where I am setting the regs_used for the curr_loop. This will not require to set and keep the bitmaps for liveness and also make it available for the curr_loop that are candidates of loop Invariant. Thanks & Regards Ajit Richard. > ChangeLog: > 2015-12-09 Ajit Agarwal > > * loop-invariant.c > (calculate_loop_liveness): New. > (move_loop_invariants): Use of calculate_loop_liveness. 
> > Signed-off-by:Ajit Agarwal ajit...@xilinx.com > > --- > gcc/loop-invariant.c | 77 > ++ > 1 files changed, 65 insertions(+), 12 deletions(-) > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index > 53d1377..ac08594 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1464,18 +1464,7 @@ find_invariants_to_move (bool speed, bool call_p) > registers used; we put some initial bound here to stand for > induction variables etc. that we do not detect. */ > { > - unsigned int n_regs = DF_REG_SIZE (df); > - > - regs_used = 2; > - > - for (i = 0; i < n_regs; i++) > - { > - if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i)) > - { > - /* This is a value that is used but not changed inside loop. */ > - regs_used++; > - } > - } > + regs_used = bitmap_count_bits
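The property the patch leans on - anything live into the loop header is live at every node of the loop, so the header's live sets alone bound the register pressure - reduces regs_used to a population count over a union of two sets. A toy model using plain bit masks in place of GCC's sparse bitmaps (illustrative only; the real code uses DF_LR_IN/DF_LR_OUT and bitmap_count_bits):

```c
#include <assert.h>

/* Union the live-in and live-out sets at the loop header, then count
   the members; this mirrors bitmap_ior_into followed by
   bitmap_count_bits in the patch.  */
unsigned
loop_regs_used (unsigned live_in_header, unsigned live_out_header)
{
  unsigned live = live_in_header | live_out_header;
  unsigned n = 0;
  while (live)
    {
      live &= live - 1;   /* clear the lowest set bit */
      n++;
    }
  return n;
}
```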
Re: [PATCH] Fix PR68583
On Wed, 9 Dec 2015, Richard Biener wrote: > > Well, not really in the end - the actual testcase would need code > hoisting to make the refs equal enough for ifcvt. Making the testcase > simpler doesn't seem to capture the issue so for now without one > (I'll keep trying). > > Bootstrapped and tested on x86_64-unknown-linux-gnu. This fixed PR68417, I committed the testcase below. Richard. 2015-12-09 Richard BienerPR tree-optimization/68417 * gcc.dg/vect/pr68417.c: New testcase. Index: gcc/testsuite/gcc.dg/vect/pr68417.c === --- gcc/testsuite/gcc.dg/vect/pr68417.c (revision 0) +++ gcc/testsuite/gcc.dg/vect/pr68417.c (working copy) @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_condition } */ + +#define N 16 + +typedef struct { + int x; + int y; +} Point; + +void +foo(Point *p1, Point *p2, Point *p3, int *data) +{ + int m, m1, m2; + int i; + + for (i = 0; i < N; i++) { +m = *data++; + +m1 = p1->x - m; +m2 = p2->x + m; + +p3->y = (m1 >= m2) ? p1->y : p2->y; + +p1++; +p2++; +p3++; + } +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
Re: [PATCH] GCC-5 backport of PR lto/65948
On 12/09/2015 12:50 PM, Martin Liška wrote: I would like to backport forgotten patch to GCC-5 branch. Bootstrap and regression tests have been running, ready after it finishes? Ok. Bernd
Re: [hsa 6/10] Pass manager changes
On Mon, Dec 07, 2015 at 12:24:05PM +0100, Martin Jambor wrote: > the pass manager changes required for HSA have already been committed > to trunk so all that remains are these additions to the pass pipeline. > > Thanks, > > Martin > > > 2015-12-04 Martin Jambor> Martin Liska > > * passes.def: Schedule pass_ipa_hsa and pass_gen_hsail. > * tree-pass.h (make_pass_gen_hsail): Declare. > (make_pass_ipa_hsa): Likewise. I'll defer this to richi. Jakub
Re: [hsa 2/10] Modifications to libgomp proper
On Mon, Dec 07, 2015 at 12:19:57PM +0100, Martin Jambor wrote: > +/* Flag set when the subsequent element in the device-specific argument > + values. */ > +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7) > + > +/* Bitmask to apply to a target argument to find out the value identifier. > */ > +#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8) > +/* Target argument index of NUM_TEAMS. */ > +#define GOMP_TARGET_ARG_NUM_TEAMS(1 << 8) > +/* Target argument index of THREAD_LIMIT. */ > +#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8) I meant that these two would be just special, passed as the first two pointers in the array, without the markup. Because, otherwise you either need to use GOMP_TARGET_ARG_SUBSEQUENT_PARAM for these always, or for 32-bit arches and for 64-bit ones shift often at runtime. Having the markup even for them is perhaps cleaner, but less efficient, so if you really want to go that way, please make sure you handle it properly for 32-bit pointers architectures though. num_teams or thread_limit could be > 32767 or > 65535. > -static void > -gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum, > -void **hostaddrs, size_t *sizes, > -unsigned short *kinds) > +static void * > +gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs, > + size_t *sizes, unsigned short *kinds) > { >size_t i, tgt_align = 0, tgt_size = 0; >char *tgt = NULL; > @@ -1281,7 +1282,7 @@ gomp_target_fallback_firstprivate (void (*fn) (void *), > size_t mapnum, >} >if (tgt_align) > { > - tgt = gomp_alloca (tgt_size + tgt_align - 1); > + tgt = gomp_malloc (tgt_size + tgt_align - 1); I don't like using gomp_malloc here, either copy/paste the function, or create separate inline functions for the two loops, one for the first loop which returns you tgt_align and tgt_size, and another for the stuff after the allocation. 
Then you can use those two inline functions to implement both gomp_target_fallback_firstprivate which will use alloca, and gomp_target_unshare_firstprivate which will use gomp_malloc instead. > @@ -1356,6 +1377,11 @@ GOMP_target (int device, void (*fn) (void *), const > void *unused, > and several arguments have been added: > FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h. > DEPEND is array of dependencies, see GOMP_task for details. > + ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and a > + variable number of device-specific arguments, which always take two > elements > + where the first specifies the type and the second the actual value. The > + last element of the array is a single NULL. Note, here you document NUM_TEAMS and THREAD_LIMIT as special values, not encoded. > @@ -1473,6 +1508,7 @@ GOMP_target_data (int device, const void *unused, > size_t mapnum, >struct gomp_device_descr *devicep = resolve_device (device); > >if (devicep == NULL > + || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) >|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) Would be nice to have some consistency in the order of capabilities checks. Usually you check SHARED_MEM after OPENMP_400, so perhaps do it this way here too. > @@ -1741,23 +1784,38 @@ gomp_target_task_fn (void *data) > >if (ttask->state == GOMP_TARGET_TASK_FINISHED) > { > - gomp_unmap_vars (ttask->tgt, true); > + if (ttask->tgt) > + gomp_unmap_vars (ttask->tgt, true); > return false; > } This doesn't make sense. For the GOMP_OFFLOAD_CAP_SHARED_MEM case, unless you want to run the free (ttask->firstprivate_copies); as a task, you shouldn't be queing the target task again for further execution, instead it should just be handled like a finished task at that point. The reason why for XeonPhi or PTX gomp_target_task_fn is run with GOMP_TARGET_TASK_FINISHED state is that gomp_unmap_vars can perform IO actions and doing it with the tasking lock held is highly undesirable. 
> > - void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn); > - ttask->tgt > - = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL, > - ttask->sizes, ttask->kinds, true, > - GOMP_MAP_VARS_TARGET); > + bool shared_mem; > + if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) > + { > + shared_mem = true; > + ttask->tgt = NULL; > + ttask->firstprivate_copies > + = gomp_target_unshare_firstprivate (ttask->mapnum, ttask->hostaddrs, > + ttask->sizes, ttask->kinds); > + } > + else > + { > + shared_mem = false; > + ttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, > + NULL, ttask->sizes, ttask->kinds,
Re: [hsa 3/10] HSA libgomp plugin
On Mon, Dec 07, 2015 at 12:20:49PM +0100, Martin Jambor wrote: > +#include > +#include > +#include > +#include > +#include "libgomp-plugin.h" > +#include "gomp-constants.h" > +#include "hsa.h" > +#include "hsa_ext_finalize.h" If these 2 headers are coming from outside of gcc, better use <> for them instead of "", and better place them above the libgomp specific ones. > +#include "dlfcn.h" Ditto. > +/* Flag to decide whether print to stderr information about what is going on. > + Set in init_debug depending on environment variables. */ > + > +static bool debug; > + > +/* Flag to decide if the runtime should suppress a possible fallback to host > + execution. */ > + > +static bool suppress_host_fallback; > + > +/* Initialize debug and suppress_host_fallback according to the environment. > */ > + > +static void > +init_enviroment_variables (void) > +{ > + if (getenv ("HSA_DEBUG")) > +debug = true; > + else > +debug = false; > + > + if (getenv ("HSA_SUPPRESS_HOST_FALLBACK")) > +suppress_host_fallback = true; > + else > +suppress_host_fallback = false; Wonder whether we want these custom env vars in suid programs, allowing users to change behavior might be undesirable. So perhaps use secure_getenv instead of getenv if found by configure? > + > + const char* hsa_error; Space before * instead of after it (multiple spots). > + hsa_status_string (status, _error); > + > + unsigned l = strlen (hsa_error); > + > + char *err = GOMP_PLUGIN_malloc (sizeof (char) * l); sizeof (char) is 1, please don't multiply by it, that is just confusing. It makes sense in macros where you don't know what the type really is, but not here. > + memcpy (err, hsa_error, l - 1); > + err[l] = '\0'; > + > + fprintf (stderr, "HSA warning: %s (%s)\n", str, err); Why do you allocate err at all, if you free it right away? Can't you use hsa_error directly instead? 
> + > + free (err); > +static void > +hsa_fatal (const char *str, hsa_status_t status) > +{ > + const char* hsa_error; > + hsa_status_string (status, _error); > + GOMP_PLUGIN_fatal ("HSA fatal error: %s (%s)", str, hsa_error); Like you do e.g. here? > +/* Callback of dispatch queues to report errors. */ > + > +static void > +queue_callback(hsa_status_t status, hsa_queue_t* queue __attribute__ > ((unused)), Missing space before (. > +/* Callback of hsa_agent_iterate_regions. Determine if a memory REGION can > be > + used for kernarg allocations and if so write it to the memory pointed to > by > + DATA and break the query. */ > + > +static hsa_status_t get_kernarg_memory_region (hsa_region_t region, void* > data) Missing newline between return type and function name. > + hsa_region_t* ret = (hsa_region_t*) data; Two spots with wrong formatting above. > +{ > + if (agent->first_module) > + agent->first_module->prev = module; Wrong indentation. > + unsigned size = end - start; > + char *buf = (char *) malloc (size); > + memcpy (buf, start, size); malloc could return NULL and you crash. Should this use GOMP_PLUGIN_malloc instead? > + struct GOMP_hsa_kernel_dispatch *shadow = GOMP_PLUGIN_malloc_cleared > +(sizeof (struct GOMP_hsa_kernel_dispatch)); Formatting, = should go on the next line, and if even that doesn't help, maybe use sizeof (*shadow)? > + > + shadow->queue = agent->command_q; > + shadow->omp_data_memory = omp_data_size > 0 > +? GOMP_PLUGIN_malloc (omp_data_size) : NULL; = should go on the next line. > + unsigned dispatch_count = kernel->dependencies_count; > + shadow->kernel_dispatch_count = dispatch_count; > + > + shadow->children_dispatches = GOMP_PLUGIN_malloc > +(dispatch_count * sizeof (struct GOMP_hsa_kernel_dispatch *)); Likewise. > +indent_stream (FILE *f, unsigned indent) > +{ > + for (int i = 0; i < indent; i++) > +fputc (' ', f); Perhaps fprintf (f, "%*s", indent, ""); instead? 
> + struct GOMP_hsa_kernel_dispatch *shadow = create_single_kernel_dispatch > +(kernel, omp_data_size); Formatting issues again. > + shadow->omp_num_threads = 64; > + shadow->debug = 0; > + shadow->omp_level = kernel->gridified_kernel_p ? 1 : 0; > + > + /* Create kernel dispatch data structures. We do not allow to have > + a kernel dispatch with depth bigger than one. */ > + for (unsigned i = 0; i < kernel->dependencies_count; i++) > +{ > + struct kernel_info *dependency = get_kernel_for_agent > + (kernel->agent, kernel->dependencies[i]); > + shadow->children_dispatches[i] = create_single_kernel_dispatch > + (dependency, omp_data_size); > + shadow->children_dispatches[i]->queue = Never put = at the end of line. > + kernel->agent->kernel_dispatch_command_q; Jakub
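One of the smaller review points is easy to demonstrate: the fputc loop in indent_stream and the suggested `fprintf (f, "%*s", indent, "")` emit exactly the same output - `indent` spaces - because a `%*s` field width pads the empty string with spaces. A sketch using buffers so the equivalence is checkable (names are illustrative, not the plugin's code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The loop form currently in the plugin, writing into a buffer.  */
void
indent_loop (char *buf, unsigned indent)
{
  unsigned i;
  for (i = 0; i < indent; i++)
    buf[i] = ' ';
  buf[indent] = '\0';
}

/* The suggested form: let the format string's field width pad.  */
void
indent_width (char *buf, size_t size, unsigned indent)
{
  snprintf (buf, size, "%*s", (int) indent, "");
}
```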
Re: [PATCH] Fix memory leaks in tree-vect-data-refs.c
On Wed, Dec 9, 2015 at 12:14 PM, Martin Liškawrote: > Hi. > > This is simple follow-up of the previous patch that fixes last remaining > leak in vectorizer. > > Patch can regbootstrap on x64_64-linux-gnu. > > Ready for trunk? Ok. Richard. > Martin
Re: [C++ Patch] PR 60218
OK
RE: [PATCH] [ARC] Add support for atomic memory built-in.
I will add this text before "*memory_barrier" pattern: ;; For ARCHS, we use a hardware data memory barrier that waits for ;; completion of current data memory operations before initiating ;; similar data memory operations. Once done, I will commit it. Thanks, Claudiu > > Tested with dg.exp (when passing -matomic to gcc compiler line, the atomic > tests are also successfully executed). The comment before "*memory_barrier" could use some elaboration on what it does for TARGET_HS. Otherwise, this is OK.
Re: [PATCH] Fix phiopt ICE in Factor conversion in COND_EXPR (PR tree-optimization/66949)
On Wed, Dec 9, 2015 at 3:22 PM, Marek Polacekwrote: > On Wed, Dec 09, 2015 at 11:41:44AM +0100, Richard Biener wrote: >> But I don't see why we have the asserts at all - they seem to be originated >> from >> development. IMHO factor_out_conditonal_conversion should simply return >> the new PHI it generates instead of relying on >> signle_non_signelton_phi_for_edges >> to return it. > > Ok, that seems to work. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? Ok. Thanks, Richard. > 2015-12-09 Marek Polacek > > PR tree-optimization/66949 > * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Don't call > single_non_singleton_phi_for_edges to get the PHI from > factor_out_conditional_conversion. Use NULL_TREE instead of NULL. > (factor_out_conditional_conversion): Adjust declaration. Make it > return the newly-created PHI. > > * gcc.dg/torture/pr66949-1.c: New test. > * gcc.dg/torture/pr66949-2.c: New test. > > diff --git gcc/testsuite/gcc.dg/torture/pr66949-1.c > gcc/testsuite/gcc.dg/torture/pr66949-1.c > index e69de29..1b765bc 100644 > --- gcc/testsuite/gcc.dg/torture/pr66949-1.c > +++ gcc/testsuite/gcc.dg/torture/pr66949-1.c > @@ -0,0 +1,28 @@ > +/* PR tree-optimization/66949 */ > +/* { dg-do compile } */ > + > +int a, *b = , c; > + > +unsigned short > +fn1 (unsigned short p1, unsigned int p2) > +{ > + return p2 > 1 || p1 >> p2 ? p1 : p1 << p2; > +} > + > +void > +fn2 () > +{ > + int *d = > + for (a = 0; a < -1; a = 1) > +; > + if (a < 0) > +c = 0; > + *b = fn1 (*d || c, *d); > +} > + > +int > +main () > +{ > + fn2 (); > + return 0; > +} > diff --git gcc/testsuite/gcc.dg/torture/pr66949-2.c > gcc/testsuite/gcc.dg/torture/pr66949-2.c > index e69de29..e6250a3 100644 > --- gcc/testsuite/gcc.dg/torture/pr66949-2.c > +++ gcc/testsuite/gcc.dg/torture/pr66949-2.c > @@ -0,0 +1,23 @@ > +/* PR tree-optimization/66949 */ > +/* { dg-do compile } */ > + > +char a; > +int b, c, d; > +extern int fn2 (void); > + > +short > +fn1 (short p1, short p2) > +{ > + return p2 == 0 ? 
p1 : p1 / p2; > +} > + > +int > +main (void) > +{ > + char e = 1; > + int f = 7; > + c = a >> f; > + b = fn1 (c, 0 < d <= e && fn2 ()); > + > + return 0; > +} > diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c > index 344cd2f..d7e8aa0 100644 > --- gcc/tree-ssa-phiopt.c > +++ gcc/tree-ssa-phiopt.c > @@ -49,7 +49,7 @@ along with GCC; see the file COPYING3. If not see > static unsigned int tree_ssa_phiopt_worker (bool, bool); > static bool conditional_replacement (basic_block, basic_block, > edge, edge, gphi *, tree, tree); > -static bool factor_out_conditional_conversion (edge, edge, gphi *, tree, > tree); > +static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, > tree); > static int value_replacement (basic_block, basic_block, > edge, edge, gimple *, tree, tree); > static bool minmax_replacement (basic_block, basic_block, > @@ -310,19 +310,19 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool > do_hoist_loads) > > /* Something is wrong if we cannot find the arguments in the PHI > node. */ > - gcc_assert (arg0 != NULL && arg1 != NULL); > + gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE); > > - if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1)) > + gphi *newphi = factor_out_conditional_conversion (e1, e2, phi, > + arg0, arg1); > + if (newphi != NULL) > { > + phi = newphi; > /* factor_out_conditional_conversion may create a new PHI in > BB2 and eliminate an existing PHI in BB2. Recompute values > that may be affected by that change. */ > - phis = phi_nodes (bb2); > - phi = single_non_singleton_phi_for_edges (phis, e1, e2); > - gcc_assert (phi); > arg0 = gimple_phi_arg_def (phi, e1->dest_idx); > arg1 = gimple_phi_arg_def (phi, e2->dest_idx); > - gcc_assert (arg0 != NULL && arg1 != NULL); > + gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE); > } > > /* Do the replacement of conditional if it can be done. 
*/ > @@ -402,9 +402,9 @@ replace_phi_edge_with_variable (basic_block cond_block, > > /* PR66726: Factor conversion out of COND_EXPR. If the arguments of the PHI > stmt are CONVERT_STMT, factor out the conversion and perform the > conversion > - to the result of PHI stmt. */ > + to the result of PHI stmt. Return the newly-created PHI, if any. */ > > -static bool > +static gphi * > factor_out_conditional_conversion (edge e0, edge e1, gphi *phi, >tree arg0, tree arg1) > { > @@ -421,7 +421,7 @@ factor_out_conditional_conversion (edge e0, edge e1,
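For context, the optimization whose bookkeeping is being fixed factors a conversion out of the two arms of a conditional. At source level it corresponds to the rewrite below (an illustrative example, not a testcase from the patch):

```c
#include <assert.h>

/* Before: both PHI arguments are the same kind of conversion.  */
long
phi_before (int c, int x, int y)
{
  return c ? (long) x : (long) y;
}

/* After factoring: one conversion applied to the selected value; this
   is the new PHI that factor_out_conditional_conversion now returns
   directly instead of being re-discovered by the caller.  */
long
phi_after (int c, int x, int y)
{
  return (long) (c ? x : y);
}
```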
Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
On 08/12/15 09:25, Tobias Burnus wrote: On Mon, Dec 07, 2015 at 02:09:22PM +, Matthew Wahab wrote: I wonder whether using __asm__ __volatile__ ("":::"memory"); would be sufficient as it has a way lower overhead than __sync_synchronize(). I don't know anything about Fortran or coarrays and I'm curious whether this affects architectures with weak memory models. Is the barrier only needed to stop reordering by the compiler or is does it also need to stop reordering by the hardware? Short answer: I think no mfence is needed as either the communication is local (to the thread/process) - in which case the hardware will act correctly - or the communication is remote (different thread, process, communication to different computer via interlink [ethernet, infiniband, ...]); and in the later case, the communication library has to deal with it. Thanks for explaining this, it made things clear. Based on your description, I agree that hardware reordering shouldn't be a problem. and the (main) program code (slightly trimmed): static void * restrict caf_token.0; static integer(kind=4) * restrict var; void _caf_init.1 (void); *var = 4; desc.3.data = 42; _gfortran_caf_send (caf_token.0, 0B /* offset */ var, _gfortran_caf_this_image (0), , 0B, , 4, 4, 0); __asm__ __volatile__("":::"memory"); // new tmp = *var; The problem is that in that case the compiler does not know that "_gfortran_caf_send (caf_token.0," can modify "*var". Is the restrict attribute on var correct? From what you say, it sounds like *var could be accessed through other pointers (assuming restrict has the same meaning as in C). Matthew
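For reference, the difference between the two barriers discussed here, as a sketch rather than the actual gfortran-generated code: the empty asm is a compiler-level barrier only - it forbids caching *var in a register across it but emits no instruction - whereas __sync_synchronize additionally emits a hardware fence (e.g. mfence on x86). Tobias's point is that the compiler-level part suffices when the communication library handles cross-image ordering. The stand-in function below models _gfortran_caf_send modifying the variable:

```c
#include <assert.h>

int var;

/* Stand-in for _gfortran_caf_send: may modify var behind the caller's
   back (e.g. when the "remote" image is actually this image).  */
void
fake_caf_send (void)
{
  var = 42;
}

int
read_after_send (void)
{
  var = 4;
  fake_caf_send ();
  /* Compiler-only barrier: no fence instruction is emitted, but the
     compiler may not assume var is still 4 and must reload it.  */
  __asm__ __volatile__ ("" ::: "memory");
  return var;
}
```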
RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
-Original Message- From: Bernd Schmidt [mailto:bschm...@redhat.com] Sent: Wednesday, December 09, 2015 7:34 PM To: Ajit Kumar Agarwal; Richard Biener Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion. On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote: > > This is because the available_regs = 6 and the regs_needed = 1 and > new_regs = 0 and the regs_used = 10. As the reg_used that are based > on the Liveness given above is greater than the available_regs, then > it's candidate of spill and the estimate_register_pressure calculates > the > spill cost. This spill cost is greater than inv_cost and gain > comes to be > negative. The disables the loop invariant for the above > testcase. >>As far as I can tell this loop does not lead to a spill. Hence, failure of >>this testcase would suggest there is something wrong with your idea. As far as I can see there is nothing wrong with the idea; the existing implementation of the calculation of available_regs results in 6, which is not the case for this testcase. The estimate of regs_used based on the Liveness (the idea behind this patch) is correct, but the incorrect estimate of the available_regs (6) results in the gain to be negative. >> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) { >>Formatting. >> +/* Loop Liveness is based on the following proprties. >>"properties" I will incorporate the change. >> + we only require to calculate the set of objects that are live at >> + the birth or the header of the loop. >> + We don't need to calculate the live through the Loop considering >> + Live in and Live out of all the basic blocks of the Loop. This is >> + because the set of objects. That are live-in at the birth or header >> + of the loop will be live-in at every node in the Loop. 
>> + If a v live out at the header of the loop then the variable is >> live-in >> + at every node in the Loop. To prove this, Consider a Loop L with >> header >> + h such that The variable v defined at d is live-in at h. Since v is >> live >> + at h, d is not part of L. This follows from the dominance property, >> i.e. >> + h is strictly dominated by d. Furthermore, there exists a path from >> h to >> + a use of v which does not go through d. For every node of the loop, >> p, >> + since the loop is strongly connected Component of the CFG, there >> exists >> + a path, consisting only of nodes of L from p to h. Concatenating >> those >> + two paths prove that v is live-in and live-out Of p. */ >>Please Refrain From Randomly Capitalizing Words, and please also fix the >>grammar ("set of objects. That are live-in"). These problems make this >>comment (and also your emails) >>extremely hard to read. >>Partly for that reason, I can't quite make up my mind whether this patch >>makes things better or just different. The testcase failure makes me think >>it's probably not an improvement. I'll correct it. I am seeing the gains for Mibench/EEMBC benchmarks for Microblaze target with this patch. I will run SPEC CPU 2000 benchmarks for i386 with this patch. >> +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_IN (loop->header)); >> +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_OUT >> + (loop->header)); >>Formatting. I'll incorporate the change. Thanks & Regards Ajit Bernd
Re: [hsa 6/10] Pass manager changes
On Wed, Dec 9, 2015 at 2:20 PM, Jakub Jelinek wrote: > On Mon, Dec 07, 2015 at 12:24:05PM +0100, Martin Jambor wrote: >> the pass manager changes required for HSA have already been committed >> to trunk so all that remains are these additions to the pass pipeline. >> >> Thanks, >> >> Martin >> >> >> 2015-12-04 Martin Jambor >> Martin Liska >> >> * passes.def: Schedule pass_ipa_hsa and pass_gen_hsail. >> * tree-pass.h (make_pass_gen_hsail): Declare. >> (make_pass_ipa_hsa): Likewise. > > I'll defer this to richi. Ok. Richard. > Jakub
[PATCH] Add levels to -Wmisleading-indentation; add level 1 to -Wall
On Mon, 2015-11-02 at 16:41 -0700, Jeff Law wrote: > On 11/02/2015 12:35 PM, David Malcolm wrote: > > > > >> diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c > >> index fff4862..2559a36 100644 > >> --- a/gdb/ada-lang.c > >> +++ b/gdb/ada-lang.c > >> @@ -11359,9 +11359,11 @@ ada_evaluate_subexp (struct type *expect_type, > >> struct expression *exp, > >> return value_zero (ada_aligned_type (type), lval_memory); > >>} > >> else > >> -arg1 = ada_value_struct_elt (arg1, >elts[pc + 2].string, 0); > >> -arg1 = unwrap_value (arg1); > >> -return ada_to_fixed_value (arg1); > >> + { > >> +arg1 = ada_value_struct_elt (arg1, >elts[pc + 2].string, 0); > >> +arg1 = unwrap_value (arg1); > >> +return ada_to_fixed_value (arg1); > >> + } > >> > >>case OP_TYPE: > >> /* The value is not supposed to be used. This is here to make it > > > > Interesting. It's not technically a bug, since the "if true" clause has > > an unconditional return, but it looks like a time-bomb to me. I'm happy > > that we warn for it. > Agreed. > > > > > > These three are all of the form: > > if (record_debug) > > fprint (...multiple lines...); > > return -1; > > > > It seems reasonable to me to complain about the indentation of the > > return -1; > Agreed. > > > > > >> diff --git a/sysdeps/ieee754/flt-32/k_rem_pio2f.c > >> b/sysdeps/ieee754/flt-32/k_rem_pio2f.c > >> index 0c7685c..a0d844c 100644 > >> --- a/sysdeps/ieee754/flt-32/k_rem_pio2f.c > >> +++ b/sysdeps/ieee754/flt-32/k_rem_pio2f.c > >> @@ -65,7 +65,8 @@ int __kernel_rem_pio2f(float *x, float *y, int e0, int > >> nx, int prec, const int32 > >> > >>/* compute q[0],q[1],...q[jk] */ > >> for (i=0;i<=jk;i++) { > >> - for(j=0,fw=0.0;j<=jx;j++) fw += x[j]*f[jx+i-j]; q[i] = fw; > >> + for(j=0,fw=0.0;j<=jx;j++) fw += x[j]*f[jx+i-j]; > >> + q[i] = fw; > >> } > >> > >> jz = jk; > > > > I think it's very reasonable to complain about the above code. > Agreed. 
> > > > > So if I've counted things right the tally is: > > * 5 dubious-looking though correct places, where it's reasonable to > > issue a warning > > * 5 places where there's a blank line between guarded and non-guarded > > stmt, where patch 1 of the kit would have suppressed the warning > > * one bug (PR 68187) > > > > I think we want the first kind of thing at -Wall, but I'm not so keen on > > the second kind at -Wall. Is there precedent for "levels" of a warning? > > (so e.g. pedantry level 1 at -Wall, level 2 at -Wextra, and have patch 1 > > be the difference between levels 1 and 2?) > > > > Sorry for harping on about patch 1; thanks again for posting this > No worries about harping. These are real world cases. > > And yes, there's definitely precedent for different levels of a warning. > If you wanted to add a patch to have different levels and select one > for -Wall, I'd look favorably on that. The attached patch implements this idea. It's a revised version of: "[PATCH 1/4] -Wmisleading-indentation: don't warn in presence of entirely blank lines" (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03225.html) (which wasn't approved, since you *did* want warnings for them), and of: "[PATCH 4/4] Add -Wmisleading-indentation to -Wall" (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03223.html) which you approved, but which Richi was unhappy with (I haven't committed it). The attached patch adds the idea of a strictness level to -Wmisleading-indentation, where -Wmisleading-indentation becomes equivalent to -Wmisleading-indentation=1 The heuristic from patch 1 becomes the difference between -Wmisleading-indentation=1 and -Wmisleading-indentation=2. It adds -Wmisleading-indentation=1 to -Wall, with -Wmisleading-indentation=2 available for users who want extra strictness. I think this gives us a big improvement in the signal:noise ratio of the "-Wall" form of the warning for real code (see e.g. 
the stats for gcc itself above), relative to the earlier form of the -Wall patch. Bootstrapped with x86_64-pc-linux-gnu. Adds 33 PASS results to g++.sum; adds 11 PASS results to gcc.sum. OK for trunk? gcc/c-family/ChangeLog: * c-indentation.c (line_is_blank_p): New function. (separated_by_blank_lines_p): New function. (should_warn_for_misleading_indentation): Add param strictness_level. Move early reject from warn_for_misleading_indentation to here. At strictness level 1, don't warn if the guarded and unguarded code are separated by one or more entirely blank lines. (warn_for_misleading_indentation): Pass value of warn_misleading_indentation to should_warn_for_misleading_indentation and move early rejection case for disabled aka level 0 to there. * c.opt (Wmisleading-indentation): Convert to an alias for... (Wmisleading-indentation=): ...this new version of
[PATCH] Do not compute dependences in if-conversion
This removes the dubious dependence computation in if-conversion just for the sake of testing if that might fail (given the vectorizer itself doesn't fail that easily). It's a quadratic operation after all which we should avoid at all cost. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2015-12-09 Richard Biener* tree-if-conv.c (if_convertible_loop_p_1): Do not compute dependences. (if_convertible_loop_p): Adjust. Index: gcc/tree-if-conv.c === *** gcc/tree-if-conv.c (revision 231453) --- gcc/tree-if-conv.c (working copy) *** predicate_bbs (loop_p loop) *** 1172,1189 static bool if_convertible_loop_p_1 (struct loop *loop, -vec *loop_nest, vec *refs, !vec *ddrs, bool *any_mask_load_store) { - bool res; unsigned int i; basic_block exit_bb = NULL; ! /* Don't if-convert the loop when the data dependences cannot be ! computed: the loop won't be vectorized in that case. */ ! res = compute_data_dependences_for_loop (loop, true, loop_nest, refs, ddrs); ! if (!res) return false; calculate_dominance_info (CDI_DOMINATORS); --- 1172,1184 static bool if_convertible_loop_p_1 (struct loop *loop, vec *refs, !bool *any_mask_load_store) { unsigned int i; basic_block exit_bb = NULL; ! if (find_data_references_in_loop (loop, refs) == chrec_dont_know) return false; calculate_dominance_info (CDI_DOMINATORS); *** if_convertible_loop_p (struct loop *loop *** 1300,1306 edge_iterator ei; bool res = false; vec refs; - vec ddrs; /* Handle only innermost loop. */ if (!loop || loop->inner) --- 1295,1300 *** if_convertible_loop_p (struct loop *loop *** 1333,1342 return false; refs.create (5); ! ddrs.create (25); ! auto_vec loop_nest; ! res = if_convertible_loop_p_1 (loop, _nest, , , !any_mask_load_store); data_reference_p dr; unsigned int i; --- 1327,1333 return false; refs.create (5); ! 
res = if_convertible_loop_p_1 (loop, , any_mask_load_store); data_reference_p dr; unsigned int i; *** if_convertible_loop_p (struct loop *loop *** 1344,1350 free (dr->aux); free_data_refs (refs); - free_dependence_relations (ddrs); delete ref_DR_map; ref_DR_map = NULL; --- 1335,1340
[PATCH, c++] Add testcase from PR68348 to the testsuite
This patch adds the test for the fixed PR. 2015-12-09 Uros Bizjak PR c++/68348 PR c++/68464 * g++.dg/pr68348.C: New test. Tested on x86_64-linux-gnu. OK for mainline? Uros. Index: g++.dg/pr68348.C === --- g++.dg/pr68348.C (revision 0) +++ g++.dg/pr68348.C (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c++11 -O2" } */ + +struct C { + constexpr C ():w (), x (), y () {} + constexpr double fn () const noexcept; + double w; + double x; + double y; +}; + +constexpr double C::fn () const noexcept { return w; } + +C foo () +{ + C c; + c.fn (); + return c; +}
Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
On Wed, 2015-12-09 at 11:35 +0100, Richard Biener wrote: > On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwal >wrote: > > Based on the comments on RFC patch this patch incorporates all the comments > > from Jeff. Thanks Jeff for the valuable feedback. > > > > This patch enables the better register pressure estimate for Loop Invariant > > code motion. This patch Calculate the Loop Liveness used for regs_used > > used to calculate the register pressure used in the cost estimation. > > > > Loop Liveness is based on the following properties. we only require to > > calculate the set of objects that are live at the birth or the header of > > the loop. We > > don't need to calculate the live through the Loop considering Live in and > > Live out of all the basic blocks of the Loop. This is because the set of > > objects. That > > are live-in at the birth or header of the loop will be live-in at every > > node in the Loop. > > > > If a v live out at the header of the loop then the variable is live-in at > > every node in the Loop. To prove this, Consider a Loop L with header h such > > that The > > variable v defined at d is live-in at h. Since v is live at h, d is not > > part of L. This follows from the dominance property, i.e. h is strictly > > dominated by d. > > Furthermore, there exists a path from h to a use of v which does not go > > through d. For every node of the loop, p, since the loop is strongly > > connected > > Component of the CFG, there exists a path, consisting only of nodes of L > > from p to h. Concatenating those two paths prove that v is live-in and > > live-out Of p. > > > > Also Calculate the Live-Out and Live-In for the exit edge of the loop. This > > considers liveness for not only the Loop latch but also the liveness > > outside the Loops. > > > > Bootstrapped and Regtested for x86_64 and microblaze target. > > > > There is an extra failure for the testcase gcc.dg/loop-invariant.c with > > the change that looks correct to me. 
> > > > This is because the available_regs = 6 and the regs_needed = 1 and new_regs > > = 0 and the regs_used = 10. As the reg_used that are based on the Liveness > > given above is greater than the available_regs, then it's candidate of > > spill and the estimate_register_pressure calculates the spill cost. This > > spill cost is greater > > than inv_cost and gain comes to be negative. The disables the loop > > invariant for the above testcase. > > > > Disabling of the Loop invariant for the testcase loop-invariant.c with this > > patch looks correct to me considering the calculation of available_regs in > > cfgloopanal.c > > is correct. > > You keep a lot of data (bitmaps) live just to use the number of bits > set in the end. > > I'll note that this also increases the complexity of the pass which is > enabled at -O1+ where > -O1 should limit itself to O(n*log n) algorithms but live is quadratic. > > So I don't think doing this unconditionally is a good idea (and we > have -fira-loop-pressure > after all). > > Please watch not making -O1 worse again after we spent so much time > making it suitable > for all the weird autogenerated code. > > Richard. > > > ChangeLog: > > 2015-12-09 Ajit Agarwal > > > > * loop-invariant.c > > (calculate_loop_liveness): New. > > (move_loop_invariants): Use of calculate_loop_liveness. > > > > Signed-off-by:Ajit Agarwal ajit...@xilinx.com > > > > --- > > gcc/loop-invariant.c | 77 > > ++ > > 1 files changed, 65 insertions(+), 12 deletions(-) > > > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > > index 53d1377..ac08594 100644 > > --- a/gcc/loop-invariant.c > > +++ b/gcc/loop-invariant.c [...snip...] > > @@ -1966,7 +1955,63 @@ mark_ref_regs (rtx x) > >} > > } > > > > +/* Calculate the Loop Liveness used for regs_used used in > > + heuristics to calculate the register pressure. 
*/ > > + > > +static void > > +calculate_loop_liveness (void) > > +{ > > + struct loop *loop; > > + > > + FOR_EACH_LOOP (loop, 0) > > +if (loop->aux == NULL) > > + { > > +loop->aux = xcalloc (1, sizeof (struct loop_data)); > > +bitmap_initialize (_DATA (loop)->regs_live, _obstack); > > + } > > + > > + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) > > + { > > +int i; > > +edge e; > > +vec edges; > > +edges = get_loop_exit_edges (loop); Doesn't this leak "edges"? (Could/should it be an auto_vec?) > > + > > +/* Loop Liveness is based on the following proprties. > > + we only require to calculate the set of objects that are live at > > + the birth or the header of the loop. > > + We don't need to calculate the live through the Loop considering > > + Live in and Live out of all the basic blocks of the Loop. This is > > + because the set of objects. That are live-in at the birth or header >
Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class
On Tue, Dec 8, 2015 at 5:22 PM, H.J. Luwrote: > On Mon, Nov 23, 2015 at 12:53 PM, H.J. Lu wrote: >> On Mon, Nov 23, 2015 at 1:57 AM, Richard Biener >> wrote: >>> On Sat, Nov 21, 2015 at 12:46 AM, H.J. Lu wrote: On Fri, Nov 20, 2015 at 2:17 PM, Jason Merrill wrote: > On 11/20/2015 01:52 PM, H.J. Lu wrote: >> >> On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener >> wrote: >>> >>> On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu wrote: Empty record should be returned and passed the same way in C and C++. This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which defaults to return false. For C++, LANG_HOOKS_EMPTY_RECORD_P is defined to is_really_empty_class, which returns true for C++ empty classes. For LTO, we stream out a bit to indicate if a record is empty and we store it in TYPE_LANG_FLAG_0 when streaming in. get_ref_base_and_extent is changed to set bitsize to 0 for empty records. Middle-end and x86 backend are updated to ignore empty records for parameter passing and function value return. Other targets may need similar changes. >>> >>> >>> Please avoid a new langhook for this and instead claim a bit in >>> tree_type_common >>> like for example restrict_flag (double-check it is unused for >>> non-pointers). >> >> >> There is no bit in tree_type_common I can overload. restrict_flag is >> checked for non-pointers to issue an error when it is used on >> non-pointers: >> >> >> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38: >> error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’ >> typedef typename T::L __restrict__ r;// { dg-error "'__restrict__' >> qualifiers cannot" "" } > > > The C++ front end only needs to check TYPE_RESTRICT for this purpose on > front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals could > handle that specifically if you change TYPE_RESTRICT to only apply to > pointers. 
> restrict_flag is also checked in this case: [hjl@gnu-6 gcc]$ cat x.i struct dummy { }; struct dummy foo (struct dummy __restrict__ i) { return i; } [hjl@gnu-6 gcc]$ gcc -S x.i -Wall x.i:4:13: error: invalid use of ‘restrict’ foo (struct dummy __restrict__ i) ^ x.i:4:13: error: invalid use of ‘restrict’ [hjl@gnu-6 gcc]$ restrict_flag can't also be used to indicate `i' is an empty record. >>> >>> I'm sure this error can be done during parsing w/o relying on TYPE_RESTRICT. >>> >>> But well, use any other free bit (but do not enlarge >>> tree_type_common). Eventually >>> you can free up a bit by putting sth into type_lang_specific currently >>> using bits >>> in tree_type_common. >> >> There are no bits in tree_type_common I can move. Instead, >> this patch overloads side_effects_flag in tree_base. Tested on >> Linux/x86-64. OK for trunk? I'm fine with using side_effects_flag for this. I miss an explanation of where this detail of the ABI is documented and wonder if the term "empty record" is also used in that document and how it is documented. Thus +/* Nonzero in a type considered an empty record. */ +#define TYPE_EMPTY_RECORD(NODE) \ + (TYPE_CHECK (NODE)->base.side_effects_flag) should refer to the ABI where is defined what an "empty record" is and how it is handled by the backend(s). +/* Return true if type T is an empty record. */ + +static inline bool +type_is_empty_record_p (const_tree t) +{ + return TYPE_EMPTY_RECORD (TYPE_MAIN_VARIANT (t)); the type checker should probably check the bit is consistent across variants so it can be tested on any of them. You fail to adjust other targets gimplification hooks which suggests this is a detail of the x86 psABI and not the C++ ABI? If so I miss a -Wpsabi warning for this change. Did you investigate the effect of this patch on a larger code base? More specifically does it only affect variadic functions? An entry for changes.html is warranted as well. Thanks, Richard. >> Thanks. >> >> -- >> H.J. > > > > -- > H.J.
Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote: This is because the available_regs = 6 and the regs_needed = 1 and new_regs = 0 and the regs_used = 10. As the reg_used that are based on the Liveness given above is greater than the available_regs, then > it's candidate of spill and the estimate_register_pressure calculates > the spill cost. This spill cost is greater than inv_cost and gain > comes to be negative. The disables the loop invariant for the above > testcase. As far as I can tell this loop does not lead to a spill. Hence, failure of this testcase would suggest there is something wrong with your idea. + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) { Formatting. +/* Loop Liveness is based on the following proprties. "properties" + we only require to calculate the set of objects that are live at + the birth or the header of the loop. + We don't need to calculate the live through the Loop considering + Live in and Live out of all the basic blocks of the Loop. This is + because the set of objects. That are live-in at the birth or header + of the loop will be live-in at every node in the Loop. + If a v live out at the header of the loop then the variable is live-in + at every node in the Loop. To prove this, Consider a Loop L with header + h such that The variable v defined at d is live-in at h. Since v is live + at h, d is not part of L. This follows from the dominance property, i.e. + h is strictly dominated by d. Furthermore, there exists a path from h to + a use of v which does not go through d. For every node of the loop, p, + since the loop is strongly connected Component of the CFG, there exists + a path, consisting only of nodes of L from p to h. Concatenating those + two paths prove that v is live-in and live-out Of p. */ Please Refrain From Randomly Capitalizing Words, and please also fix the grammar ("set of objects. That are live-in"). These problems make this comment (and also your emails) extremely hard to read. 
Partly for that reason, I can't quite make up my mind whether this patch makes things better or just different. The testcase failure makes me think it's probably not an improvement. +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_IN (loop->header)); +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_OUT + (loop->header)); Formatting. Bernd
Re: [AArch64] Emit square root using the Newton series
On 8 December 2015 at 21:35, Evandro Menezes wrote: >Emit square root using the Newton series > >2015-12-03 Evandro Menezes > >gcc/ > * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): >Declare new > function. > * config/aarch64/aarch64-simd.md (sqrt2): New >expansion and > insn definitions. > * config/aarch64/aarch64-tuning-flags.def > (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. > * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define >new function. > * config/aarch64/aarch64.md (sqrt2): New expansion >and insn > definitions. > * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): >Expand option > description. > * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. > > This patch extends the patch that added support for implementing x^-1/2 > using the Newton series by adding support for x^1/2 as well. Hi Evandro, What benchmarking have you done on this patch? /M
Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.
On 7 December 2015 at 17:10, Matthew Wahab wrote: > On 27/11/15 17:11, Matthew Wahab wrote: >> >> On 27/11/15 13:44, Christophe Lyon wrote: On 26/11/15 16:02, Matthew Wahab wrote: >> >> > This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM > tests to specify targets and to set up command line options. > It builds on the ARMv8.1 target support added for AArch64 tests, partly > reworking that support to take into account the different > configurations > that tests may be run under. >> >> >>> I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not >>> supported by aarch64-gcc. So it seems to me that >>> check_effective_target_arm_v8_1a_neon_ok_nocache will not always work >>> for aarch64 after your patch. >> >> >>> Or does it work because no option is needed and thus "" always >>> matches and thus the loop always exits after the first iteration >>> on aarch64? >> >> >> Yes, the idea is that the empty string will make the function first try >> '-march=armv8.1-a' without any other flag. That will work for AArch64 >> because it >> doesn't need any other option. >> >>> Maybe a more accurate comment would help remembering that, in case >>> -mfpu option becomes necessary for aarch64. >>> >> >> Agreed, it's worth having a comment to explain what the 'foreach' >> construct is doing. >> >> Matthew > > > I've added a comment to the foreach construct, to make it clearer what > it's doing. > I only have a minor typo to report in the other comment you added before check_effective_target_arm_v8_1a_neon_ok_nocache: "Record the command line options that needed." ("that" doesn't sound right to my non-native ears :-) Otherwise OK for me, but I can't approve. Christophe. > Matthew > > testsuite/ > 2015-12-07 Matthew Wahab > > > * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update > comment. Use check_effective_target_arm_v8_1a_neon_ok to select > the command line options. 
> (check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial > test to allow ARM targets. Select and record a working set of > command line options. > (check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM > targets. >
Re: [PATCH] Fix phiopt ICE in Factor conversion in COND_EXPR (PR tree-optimization/66949)
On Wed, Dec 09, 2015 at 11:41:44AM +0100, Richard Biener wrote: > But I don't see why we have the asserts at all - they seem to be originated > from > development. IMHO factor_out_conditonal_conversion should simply return > the new PHI it generates instead of relying on > signle_non_signelton_phi_for_edges > to return it. Ok, that seems to work. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-12-09 Marek PolacekPR tree-optimization/66949 * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Don't call single_non_singleton_phi_for_edges to get the PHI from factor_out_conditional_conversion. Use NULL_TREE instead of NULL. (factor_out_conditional_conversion): Adjust declaration. Make it return the newly-created PHI. * gcc.dg/torture/pr66949-1.c: New test. * gcc.dg/torture/pr66949-2.c: New test. diff --git gcc/testsuite/gcc.dg/torture/pr66949-1.c gcc/testsuite/gcc.dg/torture/pr66949-1.c index e69de29..1b765bc 100644 --- gcc/testsuite/gcc.dg/torture/pr66949-1.c +++ gcc/testsuite/gcc.dg/torture/pr66949-1.c @@ -0,0 +1,28 @@ +/* PR tree-optimization/66949 */ +/* { dg-do compile } */ + +int a, *b = , c; + +unsigned short +fn1 (unsigned short p1, unsigned int p2) +{ + return p2 > 1 || p1 >> p2 ? p1 : p1 << p2; +} + +void +fn2 () +{ + int *d = + for (a = 0; a < -1; a = 1) +; + if (a < 0) +c = 0; + *b = fn1 (*d || c, *d); +} + +int +main () +{ + fn2 (); + return 0; +} diff --git gcc/testsuite/gcc.dg/torture/pr66949-2.c gcc/testsuite/gcc.dg/torture/pr66949-2.c index e69de29..e6250a3 100644 --- gcc/testsuite/gcc.dg/torture/pr66949-2.c +++ gcc/testsuite/gcc.dg/torture/pr66949-2.c @@ -0,0 +1,23 @@ +/* PR tree-optimization/66949 */ +/* { dg-do compile } */ + +char a; +int b, c, d; +extern int fn2 (void); + +short +fn1 (short p1, short p2) +{ + return p2 == 0 ? 
p1 : p1 / p2; +} + +int +main (void) +{ + char e = 1; + int f = 7; + c = a >> f; + b = fn1 (c, 0 < d <= e && fn2 ()); + + return 0; +} diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c index 344cd2f..d7e8aa0 100644 --- gcc/tree-ssa-phiopt.c +++ gcc/tree-ssa-phiopt.c @@ -49,7 +49,7 @@ along with GCC; see the file COPYING3. If not see static unsigned int tree_ssa_phiopt_worker (bool, bool); static bool conditional_replacement (basic_block, basic_block, edge, edge, gphi *, tree, tree); -static bool factor_out_conditional_conversion (edge, edge, gphi *, tree, tree); +static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, tree); static int value_replacement (basic_block, basic_block, edge, edge, gimple *, tree, tree); static bool minmax_replacement (basic_block, basic_block, @@ -310,19 +310,19 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads) /* Something is wrong if we cannot find the arguments in the PHI node. */ - gcc_assert (arg0 != NULL && arg1 != NULL); + gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE); - if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1)) + gphi *newphi = factor_out_conditional_conversion (e1, e2, phi, + arg0, arg1); + if (newphi != NULL) { + phi = newphi; /* factor_out_conditional_conversion may create a new PHI in BB2 and eliminate an existing PHI in BB2. Recompute values that may be affected by that change. */ - phis = phi_nodes (bb2); - phi = single_non_singleton_phi_for_edges (phis, e1, e2); - gcc_assert (phi); arg0 = gimple_phi_arg_def (phi, e1->dest_idx); arg1 = gimple_phi_arg_def (phi, e2->dest_idx); - gcc_assert (arg0 != NULL && arg1 != NULL); + gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE); } /* Do the replacement of conditional if it can be done. */ @@ -402,9 +402,9 @@ replace_phi_edge_with_variable (basic_block cond_block, /* PR66726: Factor conversion out of COND_EXPR. 
If the arguments of the PHI stmt are CONVERT_STMT, factor out the conversion and perform the conversion - to the result of PHI stmt. */ + to the result of PHI stmt. Return the newly-created PHI, if any. */ -static bool +static gphi * factor_out_conditional_conversion (edge e0, edge e1, gphi *phi, tree arg0, tree arg1) { @@ -421,7 +421,7 @@ factor_out_conditional_conversion (edge e0, edge e1, gphi *phi, statement have the same unary operation, we can handle more than two arguments too. */ if (gimple_phi_num_args (phi) != 2) -return false; +return NULL; /* First canonicalize to simplify tests. */ if (TREE_CODE (arg0) != SSA_NAME) @@ -433,14 +433,14 @@ factor_out_conditional_conversion (edge e0, edge e1, gphi *phi,
Re: [PATCH] Make basic asm implicitly clobber memory, pr24414
Hi, On 09.12.2015 12:06 Bernd Schmidt wrote: > On 12/09/2015 03:18 AM, Bernd Edlinger wrote: >> Furthermore there is a documented use for asm(""): The empty >> assembler string is used to make a function >> volatile, thus calls can not be optimized away. But I think it is >> not necessary to make this clobber anything, >> nor should it be an instruction scheduling barrier, as it used to be >> in the past. > > Making that change seems risky; best not to make assumptions about how > these are used. In any case, > So would you agree on the general direction of the patch, if I drop the hunk in sched-deps.c? Thanks Bernd.
[PATCH] Add testcase for c++/68348
This adds a testcase for the already fixed PR68348. Tested on x86_64-linux, ok for trunk? 2015-12-09 Marek Polacek PR c++/68348 * g++.dg/cpp0x/pr68348.C: New test. diff --git gcc/testsuite/g++.dg/cpp0x/pr68348.C gcc/testsuite/g++.dg/cpp0x/pr68348.C index e69de29..9033bba 100644 --- gcc/testsuite/g++.dg/cpp0x/pr68348.C +++ gcc/testsuite/g++.dg/cpp0x/pr68348.C @@ -0,0 +1,18 @@ +// PR c++/68348 +// { dg-do compile { target c++11 } } + +struct C { + constexpr C() : w(), x(), y() {} + constexpr double fn() const noexcept; + double w; + double x; + double y; +}; + +constexpr double C::fn() const noexcept { return w; } +C foo() +{ + C c; + c.fn (); + return c; +} Marek
Re: [PATCH] Add testcase for c++/68348
On 09/12/15 15:38, Marek Polacek wrote: This adds a testcase for the already fixed PR68348. Tested on x86_64-linux, ok for trunk? 2015-12-09 Marek PolacekPR c++/68348 * g++.dg/cpp0x/pr68348.C: New test. diff --git gcc/testsuite/g++.dg/cpp0x/pr68348.C gcc/testsuite/g++.dg/cpp0x/pr68348.C index e69de29..9033bba 100644 --- gcc/testsuite/g++.dg/cpp0x/pr68348.C +++ gcc/testsuite/g++.dg/cpp0x/pr68348.C @@ -0,0 +1,18 @@ +// PR c++/68348 +// { dg-do compile { target c++11 } } + +struct C { + constexpr C() : w(), x(), y() {} + constexpr double fn() const noexcept; + double w; + double x; + double y; +}; + +constexpr double C::fn() const noexcept { return w; } +C foo() +{ + C c; + c.fn (); + return c; +} Marek Same as: https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01001.html :) Kyrill
[Patch, Fortran] PR68815 - replace '%s' quotes by %< ... %>
This patch replaces some of the '%s' quotes in diagnostic strings by the nicer %< ... %> quotes.

First, it replaces some leftovers of '%s' -> %qs in directly used error strings. It then also converts some (well: resolve.c only) '%s' to %%<%s%%>, which are used with sprintf() but which are still passed as format strings to the diagnostic functions.

There is still code using '%s' - but those bits have the form

  sprintf (buffer, " '%s' ... ");
  gfc_error ("Bla ... : %s", ..., buffer);

which prevents the use of %< ... %>. (See the last comment in the PR.)

In principle, %<%c%> and %<%d%> should be convertible to %qc and %qd (as the code is more readable), but the current function annotations prevent this, claiming that the q flag is not valid for %c and %d. As %< is fine, I didn't dig into it.

Built and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

	PR fortran/68815
	* decl.c (gfc_verify_c_interop_param, variable_decl): Use %< ... %>
	for quoting in diagnostics.
	* io.c (check_format): Ditto.
	* resolve.c (resolve_operator): Ditto.
	* symbol.c (check_conflict): Ditto.
	* trans-common.c (translate_common): Ditto.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index bff23e1..b03dadf 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -1194,7 +1194,7 @@ gfc_verify_c_interop_param (gfc_symbol *sym)
       if (sym->as != NULL && sym->as->type == AS_ASSUMED_SHAPE
 	  && !gfc_notify_std (GFC_STD_F2008_TS, "Assumed-shape array %qs "
			      "at %L as dummy argument to the BIND(C) "
-			      "procedure '%s' at %L", sym->name,
+			      "procedure %qs at %L", sym->name,
			      &(sym->declared_at), sym->ns->proc_name->name,
			      &(sym->ns->proc_name->declared_at)))
@@ -2023,9 +2023,9 @@ variable_decl (int elem)
       if (sym != NULL && (sym->attr.dummy || sym->attr.result))
	{
	  m = MATCH_ERROR;
-	  gfc_error ("'%s' at %C is a redefinition of the declaration "
+	  gfc_error ("%qs at %C is a redefinition of the declaration "
		     "in the corresponding interface for MODULE "
-		     "PROCEDURE '%s'", sym->name,
+		     "PROCEDURE %qs", sym->name,
		     gfc_current_ns->proc_name->name);
	  goto cleanup;
	}
diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
index 8cf952f..9a77234 100644
--- a/gcc/fortran/io.c
+++ b/gcc/fortran/io.c
@@ -549,7 +549,7 @@ check_format (bool is_input)
 {
   const char *posint_required = _("Positive width required");
   const char *nonneg_required = _("Nonnegative width required");
-  const char *unexpected_element = _("Unexpected element '%c' in format "
+  const char *unexpected_element = _("Unexpected element %<%c%> in format "
				     "string at %L");
   const char *unexpected_end = _("Unexpected end of format string");
   const char *zero_width = _("Zero width in format descriptor");
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 10add62..65a2b7f 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -3560,7 +3560,7 @@ resolve_operator (gfc_expr *e)
	  break;
	}
-      sprintf (msg, _("Operand of unary numeric operator '%s' at %%L is %s"),
+      sprintf (msg, _("Operand of unary numeric operator %%<%s%%> at %%L is %s"),
	       gfc_op2string (e->value.op.op), gfc_typename (&e->ts));
       goto bad_op;
@@ -3576,7 +3576,7 @@ resolve_operator (gfc_expr *e)
	}
       sprintf (msg,
-	       _("Operands of binary numeric operator '%s' at %%L are %s/%s"),
+	       _("Operands of binary numeric operator %%<%s%%> at %%L are %s/%s"),
	       gfc_op2string (e->value.op.op), gfc_typename (&op1->ts),
	       gfc_typename (&op2->ts));
       goto bad_op;
@@ -3610,7 +3610,7 @@ resolve_operator (gfc_expr *e)
	  break;
	}
-      sprintf (msg, _("Operands of logical operator '%s' at %%L are %s/%s"),
+      sprintf (msg, _("Operands of logical operator %%<%s%%> at %%L are %s/%s"),
	       gfc_op2string (e->value.op.op), gfc_typename (&op1->ts),
	       gfc_typename (&op2->ts));
@@ -3695,7 +3695,7 @@ resolve_operator (gfc_expr *e)
		   ? ".eqv." : ".neqv.", gfc_op2string (e->value.op.op));
       else
	sprintf (msg,
-		 _("Operands of comparison operator '%s' at %%L are %s/%s"),
+		 _("Operands of comparison operator %%<%s%%> at %%L are %s/%s"),
		 gfc_op2string (e->value.op.op), gfc_typename (&op1->ts),
		 gfc_typename (&op2->ts));
@@ -3703,13 +3703,14 @@ resolve_operator (gfc_expr *e)
     case INTRINSIC_USER:
       if (e->value.op.uop->op == NULL)
-	sprintf (msg, _("Unknown operator '%s' at %%L"), e->value.op.uop->name);
+	sprintf (msg, _("Unknown operator %%<%s%%> at %%L"),
+		 e->value.op.uop->name);
       else if (op2 == NULL)
-	sprintf (msg, _("Operand of user operator '%s' at %%L is %s"),
+	sprintf (msg, _("Operand of user operator %%<%s%%> at %%L is %s"),
		 e->value.op.uop->name, gfc_typename (&op1->ts));
       else
	{
-	  sprintf (msg, _("Operands of user operator '%s' at %%L are %s/%s"),
+	  sprintf (msg, _("Operands of user operator %%<%s%%> at %%L are %s/%s"),
		   e->value.op.uop->name, gfc_typename (&op1->ts),
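The double-escaping in the patch above can be puzzling at first sight: the string goes through two formatting passes, so the quote directives have to survive the first one. The sketch below (a hypothetical stand-alone example, not GCC code) shows how sprintf consumes the %s and collapses each %% to %, leaving a %<...%> quote and a %L locus directive for the later diagnostic pass.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build an intermediate diagnostic string the way resolve.c does:
   the first pass (sprintf) eats %s and turns %% into a literal %,
   so "%%<%s%%>" comes out as "%<op%>" in the buffer.  */
static void
build_msg (char *buf, const char *op)
{
  sprintf (buf, "Operand of operator %%<%s%%> at %%L is invalid", op);
}
```

The buffer produced by build_msg is then itself a valid format string for the diagnostic machinery, which understands %< ... %> and %L.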
Splitting up gcc/omp-low.c? (was: [hsa 5/10] OpenMP lowering/expansion changes (gridification))
Hi!

I've been meaning to suggest this for some time already. On Wed, 9 Dec 2015 14:19:30 +0100, Jakub Jelinek wrote:
> As for omp-low.c changes, the file is already large enough that it would be
> nice if it is easy to find out what routines are for gridification purposes
> only, use some special prefix (grid_*, ompgrid_*, ...) for all such
> functions?

In addition to that, how about we split up gcc/omp-low.c into several files? Would it make sense (I have not yet looked in detail) to do so along the borders of the several passes defined therein? Or can you tell already that there would be too many cross-references between the several files to make this infeasible?

I'd suggest to do this shortly before GCC 6 is released, so that backports from trunk to gcc-6-branch will be easy. (I assume we don't have to care for gcc-5-branch backports too much any longer.)

    $ wc -l gcc/*.c | sort -r -n | head
     879881 total
      25770 gcc/dwarf2out.c
      19834 gcc/omp-low.c
      14419 gcc/fold-const.c
      14357 gcc/combine.c
      14003 gcc/tree.c
      11622 gcc/expr.c
      11610 gcc/gimplify.c
      10417 gcc/tree-vrp.c
      10328 gcc/var-tracking.c

Regards,
Thomas
RE: [AArch64] Emit square root using the Newton series
Hi, Marcus.

I've run Geekbench, SPEC CPU2000 and synthetic benchmarks. I can share these results, iterating an array with values between 1 and 100 and taking their square root:

    Million Operations/s (Juno)   A53 @850MHz   A57 @1100MHz
    X^½   DP  Canon                    31             37
              Newton                   13             39
              %Δ                      -57%             6%
          SP  Canon                    48            144
              Newton                   18             62
              %Δ                      -63%           -57%
    X^-½  DP  Canon                    17             16
              Newton                   14             42
              %Δ                      -17%           155%
          SP  Canon                    28             70
              Newton                   20             62
              %Δ                      -30%           -11%

As you can see, it's a mixed result for A57 and a definite regression for A53. In most benchmarks overall, this is not a good optimization for A57. That's why I left it as a target-specific tuning.

Thank you,

--
Evandro Menezes
Austin, TX

> -----Original Message-----
> From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
> Sent: Wednesday, December 09, 2015 8:06
> To: Evandro Menezes
> Cc: GCC Patches; Marcus Shawcroft; James Greenhalgh; Andrew Pinski;
> Benedikt Huber; philipp.toms...@theobroma-systems.com
> Subject: Re: [AArch64] Emit square root using the Newton series
>
> On 8 December 2015 at 21:35, Evandro Menezes wrote:
> >    Emit square root using the Newton series
> >
> >    2015-12-03  Evandro Menezes
> >
> >    gcc/
> >	* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new
> >	function.
> >	* config/aarch64/aarch64-simd.md (sqrt<mode>2): New expansion and
> >	insn definitions.
> >	* config/aarch64/aarch64-tuning-flags.def
> >	(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
> >	* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function.
> >	* config/aarch64/aarch64.md (sqrt<mode>2): New expansion and insn
> >	definitions.
> >	* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option
> >	description.
> >	* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
> >
> > This patch extends the patch that added support for implementing
> > x^-1/2 using the Newton series by adding support for x^1/2 as well.
>
> Hi Evandro, What benchmarking have you done on this patch?
> /M
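For readers unfamiliar with the technique being benchmarked: the patch replaces the hardware square-root instruction with a reciprocal-square-root estimate refined by Newton iterations. The portable scalar sketch below illustrates the idea; on AArch64 the seed comes from FRSQRTE and the refinement step from FRSQRTS, while here the classic bit-level seed stands in for the hardware estimate.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of a Newton-series square root: seed an estimate of
   1/sqrt(x), refine it with Newton steps, then multiply by x.  */
static float
newton_sqrt (float x)
{
  uint32_t i;
  float e;

  memcpy (&i, &x, sizeof i);        /* reinterpret the bits without UB */
  i = 0x5f3759df - (i >> 1);        /* crude 1/sqrt(x) seed */
  memcpy (&e, &i, sizeof e);

  for (int n = 0; n < 2; n++)       /* two refinement steps for float */
    e = e * (1.5f - 0.5f * x * e * e);

  return x * e;                     /* sqrt(x) = x * (1/sqrt(x)) */
}
```

The trade-off the numbers above expose is exactly this structure: the series replaces one long-latency instruction with an estimate plus several dependent multiply-adds, which only pays off when the pipeline can overlap them.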
Re: [PATCH] Make basic asm implicitly clobber memory, pr24414
Hi,

On 09.12.2015 16:48, Bernd Schmidt wrote:
> On 12/09/2015 04:09 PM, Bernd Edlinger wrote:
>
>> So would you agree on the general direction of the patch,
>> if I drop the hunk in sched-deps.c ?
>
> I'm not sure there was any consensus in that other thread, but I think
> assuming that basic asms clobber memory and CC can be justified. That
> certainly seems like a safer default. Ideally though I think it would
> be best if we could deprecate basic asms in functions, or at least
> warn about them in -Wall.

Well no, we did not get to a consensus on the warning issue. My personal gut feeling on that warning is a bit mixed...

If we have a -Wall-enabled warning on asm("..."), people who know next to nothing about assembler will be encouraged to "fix" this warning in a part of the code which they probably do not understand at all. This frightens me a bit, because I know they will soon find out that adding a few colons fixes the warning; but asm("...":::) is not any better, IMHO.

For me, it is just very unlikely that any piece of assembler really clobbers nothing and has no inputs and no outputs at the same time, even if it looks so at first sight... It is much more likely that someone forgot to fill in the clobber section. So for me it would also be good to warn on asm("...":::) and require that, if they want to fix this warning, they must at least write something in the clobber section, like asm ("...":::"nothing"); that would be a new clobber name which can only stand alone and which could be stripped after the warning processing took place in the FE.

So I think a warning should warn on something that is so unusual that it is likely a bug.

Bernd.
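To make the distinction under discussion concrete, here is a minimal sketch (my own illustration, not code from the patch) of a basic asm versus an extended asm with an explicit clobber list:

```c
static int flag;

/* Basic asm: no operands and no clobber list.  Under the proposed
   change the compiler would conservatively assume it clobbers memory,
   so the read of 'flag' below cannot be moved across it.  */
void
with_basic_asm (void)
{
  flag = 1;
  __asm__ ("");                 /* basic asm: effects unknown */
  if (flag)                     /* must be re-read under the new model */
    flag = 2;
}

/* Extended asm: the clobber list states the effects explicitly,
   so no guessing is needed.  */
void
with_extended_asm (void)
{
  flag = 1;
  __asm__ volatile ("" : : : "memory");  /* explicit memory barrier */
  if (flag)
    flag = 2;
}
```

The "nothing" clobber proposed above would give the second form a way to state explicitly that the template has no effects at all, which a bare asm("...":::) cannot express today.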
Re: [gomp4] Fix Fortran deviceptr
Cesar,

On 12/08/2015 11:10 AM, Cesar Philippidis wrote:
> On 12/08/2015 08:22 AM, James Norris wrote:
>>> 2. It appears that deviceptr code in GOACC_parallel_keyed is mostly
>>> identical to GOACC_data_start. Can you put that duplicate code into a
>>> function? That would be easier to maintain in the long run.
>>
>> Fixed.
>
> Where? I don't see a patch.

Oops... Now attached.

> Since you're working on fortran, can you take a look at how
> gfc_match_omp_clauses is handling OMP_CLAUSE_DEVICEPTR. It seems overly
> confusing to me. Could it be done in a similar way as OMP_CLAUSE_COPYIN,
> i.e., using gfc_match_omp_map_clause?

Confusion removed and replaced with code similar to how the other clauses are handled.

Patch to be committed after testing completes.

Thanks!
Jim

diff --git a/gcc/fortran/ChangeLog.gomp b/gcc/fortran/ChangeLog.gomp
index 00e5746..d05326d 100644
--- a/gcc/fortran/ChangeLog.gomp
+++ b/gcc/fortran/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-12-09  James Norris
+
+	* openmp.c (gfc_match_omp_clauses): Re-write handling of the
+	deviceptr clause.
+
 2015-12-08  Thomas Schwinge

	* gfortran.h (symbol_attribute): Add oacc_function_nohost member.
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b59528be..78d2e1e 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -818,19 +818,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
				   OMP_MAP_ALLOC))
	continue;
       if ((mask & OMP_CLAUSE_DEVICEPTR)
-	  && gfc_match ("deviceptr ( ") == MATCH_YES)
-	{
-	  gfc_omp_namelist **list = &c->lists[OMP_LIST_MAP];
-	  gfc_omp_namelist **head = NULL;
-	  if (gfc_match_omp_variable_list ("", list, true, NULL, &head, false)
-	      == MATCH_YES)
-	    {
-	      gfc_omp_namelist *n;
-	      for (n = *head; n; n = n->next)
-		n->u.map_op = OMP_MAP_FORCE_DEVICEPTR;
-	      continue;
-	    }
-	}
+	  && gfc_match ("deviceptr ( ") == MATCH_YES
+	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
+				       OMP_MAP_FORCE_DEVICEPTR))
+	continue;
       if ((mask & OMP_CLAUSE_BIND) && c->routine_bind == NULL
	  && gfc_match ("bind ( %s )", &c->routine_bind) == MATCH_YES)
	{
diff --git a/libgomp/ChangeLog.gomp b/libgomp/ChangeLog.gomp
index d88cb94..ec133a7 100644
--- a/libgomp/ChangeLog.gomp
+++ b/libgomp/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2015-12-09  James Norris
+
+	* oacc-parallel.c (handle_ftn_pointers): New function.
+	(GOACC_parallel_keyed, GOACC_data_start): Factor out Fortran pointer
+	handling.
+	* testsuite/libgomp.oacc-fortran/declare-1.f90: Add comment.
+	* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: Fix comment.
+
 2015-12-08  James Norris

	* testsuite/libgomp.oacc-fortran/kernels-map-1.f90: Add new test.
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index a606152..f68db78 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -57,6 +57,51 @@ find_pointer (int pos, size_t mapnum, unsigned short *kinds)
   return 0;
 }

+/* Handle the mapping pair that are presented when a
+   deviceptr clause is used with Fortran.  */
+
+static void
+handle_ftn_pointers (size_t mapnum, void **hostaddrs, size_t *sizes,
+		     unsigned short *kinds)
+{
+  int i;
+
+  for (i = 0; i < mapnum; i++)
+    {
+      unsigned short kind1 = kinds[i] & 0xff;
+
+      /* Handle Fortran deviceptr clause.  */
+      if (kind1 == GOMP_MAP_FORCE_DEVICEPTR)
+	{
+	  unsigned short kind2;
+
+	  if (i < (signed)mapnum - 1)
+	    kind2 = kinds[i + 1] & 0xff;
+	  else
+	    kind2 = 0xffff;
+
+	  if (sizes[i] == sizeof (void *))
+	    continue;
+
+	  /* At this point, we're dealing with a Fortran deviceptr.
+	     If the next element is not what we're expecting, then
+	     this is an instance of where the deviceptr variable was
+	     not used within the region and the pointer was removed
+	     by the gimplifier.  */
+	  if (kind2 == GOMP_MAP_POINTER
+	      && sizes[i + 1] == 0
+	      && hostaddrs[i] == *(void **)hostaddrs[i + 1])
+	    {
+	      kinds[i+1] = kinds[i];
+	      sizes[i+1] = sizeof (void *);
+	    }
+
+	  /* Invalidate the entry.  */
+	  hostaddrs[i] = NULL;
+	}
+    }
+}
+
 static void
 goacc_wait (int async, int num_waits, va_list *ap);

@@ -99,40 +144,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
   thr = goacc_thread ();
   acc_dev = thr->dev;

-  for (i = 0; i < mapnum; i++)
-    {
-      unsigned short kind1 = kinds[i] & 0xff;
-
-      /* Handle Fortran deviceptr clause.  */
-      if (kind1 == GOMP_MAP_FORCE_DEVICEPTR)
-	{
-	  unsigned short kind2;
-
-	  if (i < (signed)mapnum - 1)
-	    kind2 = kinds[i + 1] & 0xff;
-	  else
-	    kind2 = 0xffff;
-
-	  if (sizes[i] == sizeof (void *))
-	    continue;
-
-	  /* At this point, we're dealing with a Fortran deviceptr.
-	     If the next element is not what we're expecting, then
-	     this is an instance of where the deviceptr variable was
-	     not used within the
RE: [PATCH][ARC] Refurbish emitting DWARF2 for epilogue.
> The main point of having rtl epilogues is to allow such optimizations.
> Traditionally, we have said that at -O1, it is OK to curb optimizations for the
> sake of having programs that are saner to debug, while -O2 and above should
> just optimize, and the debug info generation is just thrown in afterwards as a
> best-effort attempt.
>
> Additionally, we also have -Og now.

Well, it seems to me that we prefer to disable optimizations when talking about debug-related information (see PR target/60598, git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@208749 138bc75d-0d04-0410-961f-82ee72b054a4 commit).

> You can also consider having separate options to control optimizations that
> affect debugging.
> If leaving out epilogue cfi is what it takes to allow epilogue scheduling
> without the compiler crashing, then that is what should be done by default at
> -O2, but if someone finds that particularly vexing, they might appreciate a
> -mepilogue-cfi option to change that default, and instead disable some
> scheduling (delay slot and otherwise).

Unfortunately, dwarf2cfi checks the paths for consistency (dwarf2cfi.c:2284), throwing an error if those paths are not ok. Also, with ARC gcc we focus on Linux types of applications; hence, the unwinding info needs to be consistent.

As far as I can see, having a sort of -mno-epilogue-cfi option would just inhibit or not the emitting of the blockage instruction, and the option would be valid only when compiling without emitting dwarf information (i.e., -mno-epilogue-cfi is incompatible with -g). Personally, I do not see the benefit of having such an option, as one may lose like 1 cycle per function call (HS/EM cpus) in very particular cases.

Running dg.exp, compile.exp, execute.exp, and building Linux with some default apps, we found only 4 cases in which the branch delay slot scheduler needs the blockage mechanism. Also, adding a blockage before generating any prologue instruction seems to be a wide-spread practice in gcc backends (see SH, for example).

Best,
Claudiu
Re: [PATCH] Add testcase for c++/68348
On Wed, Dec 09, 2015 at 03:39:50PM +0000, Kyrill Tkachov wrote:
> On 09/12/15 15:38, Marek Polacek wrote:
>> This adds a testcase for the already fixed PR68348.
>>
>> Tested on x86_64-linux, ok for trunk?
>>
>> 2015-12-09  Marek Polacek
>>
>> 	PR c++/68348
>> 	* g++.dg/cpp0x/pr68348.C: New test.
>>
>> diff --git gcc/testsuite/g++.dg/cpp0x/pr68348.C
>> gcc/testsuite/g++.dg/cpp0x/pr68348.C
>> index e69de29..9033bba 100644
>> --- gcc/testsuite/g++.dg/cpp0x/pr68348.C
>> +++ gcc/testsuite/g++.dg/cpp0x/pr68348.C
>> @@ -0,0 +1,18 @@
>> +// PR c++/68348
>> +// { dg-do compile { target c++11 } }
>> +
>> +struct C {
>> +  constexpr C() : w(), x(), y() {}
>> +  constexpr double fn() const noexcept;
>> +  double w;
>> +  double x;
>> +  double y;
>> +};
>> +
>> +constexpr double C::fn() const noexcept { return w; }
>> +C foo()
>> +{
>> +  C c;
>> +  c.fn ();
>> +  return c;
>> +}
>>
>> 	Marek
>
> Same as:
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01001.html
> :)

Ah! The PR wasn't assigned to anyone. But I think my version is better because I think we prefer

  // { dg-do compile { target c++11 } }

to

  /* { dg-options "-std=c++11 -O2" } */

;)

	Marek
Re: [PATCH, testsuite] Fix sse4_1-round* inline asm statements
On Wed, Dec 9, 2015 at 8:18 AM, Uros Bizjak wrote:
> Saying that, I see we don't need to define ASM_SUFFIX anymore. I'll
> prepare the patch that removes these #defines.

2015-12-09  Uros Bizjak

	* gcc.target/i386/sse4_1-roundps-1.c: Remove ASM_SUFFIX define.
	* gcc.target/i386/sse4_1-roundps-2.c: Ditto.
	* gcc.target/i386/sse4_1-roundps-3.c: Ditto.
	* gcc.target/i386/sse4_1-roundsd-1.c: Ditto.
	* gcc.target/i386/sse4_1-roundsd-2.c: Ditto.
	* gcc.target/i386/sse4_1-roundsd-3.c: Ditto.
	* gcc.target/i386/sse4_1-roundss-1.c: Ditto.
	* gcc.target/i386/sse4_1-roundss-2.c: Ditto.
	* gcc.target/i386/sse4_1-roundss-3.c: Ditto.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN. The patch will be backported to all release branches.

Uros.

Index: gcc.target/i386/sse4_1-roundps-1.c
===================================================================
--- gcc.target/i386/sse4_1-roundps-1.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundps-1.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"

 #define ROUND_INTRIN(x, mode) _mm_ceil_ps(x)
 #define ROUND_MODE _MM_FROUND_CEIL
Index: gcc.target/i386/sse4_1-roundss-3.c
===================================================================
--- gcc.target/i386/sse4_1-roundss-3.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundss-3.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"

 #define ROUND_INTRIN(x, mode) _mm_floor_ss(x, x)
 #define ROUND_MODE _MM_FROUND_FLOOR
Index: gcc.target/i386/sse4_1-roundps-2.c
===================================================================
--- gcc.target/i386/sse4_1-roundps-2.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundps-2.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"

 #define ROUND_INTRIN _mm_round_ps
 #define ROUND_MODE _MM_FROUND_NINT
Index: gcc.target/i386/sse4_1-roundps-3.c
===================================================================
--- gcc.target/i386/sse4_1-roundps-3.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundps-3.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"

 #define ROUND_INTRIN(x, mode) _mm_floor_ps(x)
 #define ROUND_MODE _MM_FROUND_FLOOR
Index: gcc.target/i386/sse4_1-roundsd-1.c
===================================================================
--- gcc.target/i386/sse4_1-roundsd-1.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundsd-1.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128d
 #define FP_T double
-#define ASM_SUFFIX "l"

 #define ROUND_INTRIN(x, mode) _mm_ceil_sd(x, x)
 #define ROUND_MODE _MM_FROUND_CEIL
Index: gcc.target/i386/sse4_1-roundsd-2.c
===================================================================
--- gcc.target/i386/sse4_1-roundsd-2.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundsd-2.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128d
 #define FP_T double
-#define ASM_SUFFIX "l"

 #define ROUND_INTRIN(x, mode) _mm_round_sd(x, x, mode)
 #define ROUND_MODE _MM_FROUND_NINT
Index: gcc.target/i386/sse4_1-roundsd-3.c
===================================================================
--- gcc.target/i386/sse4_1-roundsd-3.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundsd-3.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128d
 #define FP_T double
-#define ASM_SUFFIX "l"

 #define ROUND_INTRIN(x, mode) _mm_floor_sd(x, x)
 #define ROUND_MODE _MM_FROUND_FLOOR
Index: gcc.target/i386/sse4_1-roundss-1.c
===================================================================
--- gcc.target/i386/sse4_1-roundss-1.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundss-1.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"

 #define ROUND_INTRIN(x, mode) _mm_ceil_ss(x, x)
 #define ROUND_MODE _MM_FROUND_CEIL
Index: gcc.target/i386/sse4_1-roundss-2.c
===================================================================
--- gcc.target/i386/sse4_1-roundss-2.c	(revision 231440)
+++ gcc.target/i386/sse4_1-roundss-2.c	(working copy)
@@ -7,7 +7,6 @@

 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"

 #define ROUND_INTRIN(x, mode) _mm_round_ss(x, x, mode)
 #define ROUND_MODE _MM_FROUND_NINT
Re: [PATCH] Add levels to -Wmisleading-indentation; add level 1 to -Wall
On 12/09/2015 04:38 PM, David Malcolm wrote:
> +/* The following function contains examples of bad indentation that's
> +   arguably not misleading, due to a blank line between the guarded and the
> +   non-guarded code.  Some of the blank lines deliberately contain
> +   redundant whitespace, to verify that this is handled.
> +   Based on examples seen in gcc/fortran/io.c, gcc/function.c, and
> +   gcc/regrename.c.
> +
> +   At -Wmisleading-indentation (implying level 1) we shouldn't emit
> +   warnings for this function.  Compare with
> +   fn_40_level_1 in Wmisleading-indentation-level-1.c and
> +   fn_40_level_2 in Wmisleading-indentation-level-2.c.  */
> +
> +void
> +fn_40_implicit_level_1 (int arg)
> +{
> +  if (flagA)
> +    foo (0);
> +
> +  foo (1);

I personally see no use for the blank-line heuristic; in fact I think it is misguided. To my eyes a warning is clearly warranted for the examples in this testcase. Do we actually have any users calling for this heuristic?

Bernd
Re: [PATCH] Make basic asm implicitly clobber memory, pr24414
On 12/09/2015 04:09 PM, Bernd Edlinger wrote:
> So would you agree on the general direction of the patch,
> if I drop the hunk in sched-deps.c ?

I'm not sure there was any consensus in that other thread, but I think assuming that basic asms clobber memory and CC can be justified. That certainly seems like a safer default. Ideally, though, I think it would be best if we could deprecate basic asms in functions, or at least warn about them in -Wall.

Bernd
[committed] Avoid undefined behavior in test.
I have committed the following testsuite patch as obvious.

The test calls a variadic function that extracts pointers with va_arg, but the terminating NULL is passed as an int, not a pointer. This wouldn't trip on 32-bit architectures, and even on 64-bit the test simply iterates until it gets a NULL stack slot (in any containing frame). Noticed while testing NVPTX with -mgomp.

2015-12-09  Alexander Monakov

	* gcc.c-torture/execute/980716-1.c: Avoid undefined behavior due to
	passing terminating NULL as int rather than pointer.

--- gcc.c-torture/execute/980716-1.c	(revision 231457)
+++ gcc.c-torture/execute/980716-1.c	(working copy)
@@ -20,7 +20,7 @@
 int
 main()
 {
-  stub(1, "ab", "bc", "cx", 0);
+  stub(1, "ab", "bc", "cx", (char *)0);
   exit (0);
 }
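The bug class fixed here is easy to reproduce. In a variadic call there is no prototype to convert the trailing arguments, so an int 0 terminator only initializes part of a 64-bit argument slot while va_arg reads a full pointer from it. A minimal sketch (hypothetical function, not the testsuite's stub):

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Sum the lengths of a NULL-terminated list of strings.
   va_arg (ap, const char *) reads a full pointer from each slot, so
   the terminator must be passed as a pointer: on LP64 targets an
   int 0 only initializes 32 bits of the slot, leaving junk in the
   high bits.  */
static size_t
total_len (const char *first, ...)
{
  va_list ap;
  size_t n = 0;

  va_start (ap, first);
  for (const char *p = first; p != NULL; p = va_arg (ap, const char *))
    n += strlen (p);
  va_end (ap);
  return n;
}
```

Correct usage is total_len ("ab", "cde", (char *) NULL); calling it with a bare 0 terminator is exactly the undefined behavior the committed patch removes.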
Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
Done. I have permission for contributing but I don't have write permission on the repository.

2015-12-09 8:23 GMT+01:00 Tobias Burnus:
> Alessandro Fanfarillo wrote:
>> in attachment the new patch. I also checked the behavior with
>> move_alloc: it synchronizes right after the deregistration of the
>> destination.
>> I also noticed that __asm__ __volatile__ ("":::"memory") is called
>> before sync all and not after. It shouldn't be a problem, right?
>
> The patch looks mostly good to me, except:
>
>> @@ -1222,6 +1230,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
>>    se->expr = res_var;
>>    if (array_expr->ts.type == BT_CHARACTER)
>>      se->string_length = argse.string_length;
>> +
>> +  /* It guarantees memory consistency within the same segment */
>> +  /* tmp = gfc_build_string_const (strlen ("memory")+1, "memory"), */
>> +  /* tmp = build5_loc (input_location, ASM_EXPR, void_type_node, */
>> +  /*		       gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE, */
>> +  /*		       tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE); */
>> +  /* ASM_VOLATILE_P (tmp) = 1; */
>> +  /* gfc_add_expr_to_block (&se->pre, tmp); */
>> +
>>  }
>
> Do not add out-commented code. Please remove.
>
>> gfc_trans_critical (gfc_code *code)
>> {
>>   stmtblock_t block;
>>   tree tmp, token = NULL_TREE;
>>
>>   gfc_start_block (&block);
>>
>>   if (flag_coarray == GFC_FCOARRAY_LIB)
>>     {
>>       token = gfc_get_symbol_decl (code->resolved_sym);
>>       token = GFC_TYPE_ARRAY_CAF_TOKEN (TREE_TYPE (token));
>>       tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_lock, 7,
>>				    token, integer_zero_node, integer_one_node,
>>				    null_pointer_node, null_pointer_node,
>>				    null_pointer_node, integer_zero_node);
>>       gfc_add_expr_to_block (&block, tmp);
>>     }
>>
>> +  /* It guarantees memory consistency within the same segment */
>> +  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
>> +  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
>> +		     gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
>> +		     tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
>> +  ASM_VOLATILE_P (tmp) = 1;
>> +
>> +  gfc_add_expr_to_block (&block, tmp);
>> +
>>   tmp = gfc_trans_code (code->block->next);
>>   gfc_add_expr_to_block (&block, tmp);
>>
>>   if (flag_coarray == GFC_FCOARRAY_LIB)
>>     {
>>       tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_unlock, 6,
>>				    token, integer_zero_node, integer_one_node,
>>				    null_pointer_node, null_pointer_node,
>>				    integer_zero_node);
>>       gfc_add_expr_to_block (&block, tmp);
>>     }
>>
>> +  /* It guarantees memory consistency within the same segment */
>> +  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
>> +  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
>> +		     gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
>> +		     tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
>> +  ASM_VOLATILE_P (tmp) = 1;
>> +
>> +  gfc_add_expr_to_block (&block, tmp);
>>
>>   return gfc_finish_block (&block);
>> }
>
> Please move the two new code additions into the associated
> "if (flag_coarray == GFC_FCOARRAY_LIB)" blocks - and keep an eye on the
> indentation: for the current code location, the second block is wrongly
> indented.
>
> With the two issues fixed: LGTM.
>
> Cheers,
> Tobias
>
> PS: I assume you can still commit yourself. I am asking because Steve did
> commit the recent EVENT patch for you.

2015-12-09  Tobias Burnus
	    Alessandro Fanfarillo

	* trans.c (gfc_allocate_using_lib, gfc_deallocate_with_status):
	Introduce __asm__ __volatile__ ("":::"memory") after image control
	statements.
	* trans-stmt.c (gfc_trans_sync, gfc_trans_event_post_wait,
	gfc_trans_lock_unlock, gfc_trans_critical): Ditto.
	* trans-intrinsic.c (gfc_conv_intrinsic_caf_get, conv_caf_send):
	Introduce __asm__ __volatile__ ("":::"memory") after send, before
	get and around sendget.

2015-12-09  Tobias Burnus
	    Alessandro Fanfarillo

	* gfortran.dg/coarray_40.f90: New.

commit 96975d401c4b66c0362d853bf6b604cda171552b
Author: Alessandro Fanfarillo
Date:   Wed Dec 9 17:05:28 2015 +0100

    Introducing __asm__ __volatile__ ("":::"memory") after image control
    statements, send, get and sendget

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..31bad35 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1211,6 +1211,14
[PATCH] Better error recovery for merge-conflict markers (v4)
On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:
> On 10/30/2015 04:16 PM, David Malcolm wrote:
>> The idea is to more gracefully handle merger conflict markers
>> in the source code being compiled. Specifically, in the C and
>> C++ frontends, if we're about to emit an error, see if the
>> source code is at a merger conflict marker, and if so, emit
>> a more specific message, so that the user gets this:
>>
>>   foo.c:1:1: error: source file contains patch conflict marker
>>    <<<<<<< HEAD
>>    ^
>>
>> It's something of a "fit and finish" cosmetic item, but these
>> things add up.
>
> This seems like fairly low impact but also low cost, so I'm fine with it
> in principle. I wonder whether the length of the marker is the same
> across all versions of patch (and VC tools)?

It's hardcoded for GNU patch:
  http://git.savannah.gnu.org/cgit/patch.git/tree/src/merge.c
which hardcodes e.g.:
  fputs (outstate->after_newline + "\n<<<<<<<\n", fp);

I don't know if it's hardcoded for CVS or Subversion, but both have documentation showing that format:
  ftp://ftp.gnu.org/old-gnu/Manuals/cvs/html_node/cvs_38.html
  http://svnbook.red-bean.com/en/1.7/svn.tour.cycle.html#svn.tour.cycle.resolve

It's the default of git:
  http://git.kernel.org/cgit/git/git.git/tree/Documentation/merge-config.txt
(config option "merge.conflictStyle")

This git commit (to git) seems to have generalized it to support a "conflict-marker-size" attribute:
  https://github.com/git/git/commit/8588567c96490b8d236b1bc13f9bcb0dfa118efe

Mercurial uses them; the format appears to be a keyword argument in:
  https://selenic.com/hg/file/tip/mercurial/simplemerge.py#l91
but it's hardcoded in this regex in filemerge.py:
  if re.search("^(<<<<<<< .*|=======|>>>>>>> .*)$", fcd.data(),

Bazaar uses them; see e.g.:
  http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/view/head:/bzrlib/tests/test_merge3.py
(I couldn't easily tell if they're configurable)

FWIW, Perforce appears to use a different format;
  http://www.perforce.com/perforce/doc.current/manuals/p4guide/chapter.resolve.html
has an example showing:

  ORIGINAL file#n
  (text from the original version)
  THEIR file#m
  (text from their file)
  YOURS file
  (text from your file)

From what I can tell, Perforce is the outlier here.

>> +static bool
>> +c_parser_peek_conflict_marker (c_parser *parser, enum cpp_ttype tok1_kind)
>> +{
>> +  c_token *token2 = c_parser_peek_2nd_token (parser);
>> +  if (token2->type != tok1_kind)
>> +    return false;
>> +  c_token *token3 = c_parser_peek_nth_token (parser, 3);
>> +  if (token3->type != tok1_kind)
>> +    return false;
>> +  c_token *token4 = c_parser_peek_nth_token (parser, 4);
>> +  if (token4->type != conflict_marker_get_final_tok_kind (tok1_kind))
>> +    return false;
>> +  return true;
>> +}
>
> Just thinking out loud - I guess it would be too much to hope for to
> share lexers between frontends so that we need only one copy of this?

Probably :(

>> +extern short some_var; /* this line would lead to a warning */
>
> Would or does? I don't see anything suppressing it?

It's skipped in error-handling. c_parser_declaration_or_fndef has:

  1794	  declarator = c_parser_declarator (parser,
  1795					    specs->typespec_kind != ctsk_none,
  1796					    C_DTR_NORMAL, &dummy);
  1797	  if (declarator == NULL)
  1798	    {
  [...snip...]
  1807	      c_parser_skip_to_end_of_block_or_statement (parser);

The call to c_parser_declarator fails, and when issuing:

  3465	      c_parser_error (parser, "expected identifier or %<(%>");

we emit the "conflict marker" wording error for the error. Then at line 1807 we skip, discarding everything up to the ";" in that decl.

Would a better wording be:

  extern short some_var; /* This line would lead to a warning due to the
			    duplicate name, but it is skipped when handling
			    the conflict marker.  */

> There seems to be no testcase verifying what happens if the marker is
> not at the start of the line (IMO it should not be interpreted as a marker).

The v3 patch actually reported them as markers regardless of location. The v4 patch now verifies that they are at the start of the line; I've added test coverage for this (patch-conflict-markers-11.c).

That said, it's not clear they're always at the beginning of a line; this bazaar bug indicates that CVS (and bazaar) can emit them mid-line:
  https://bugs.launchpad.net/bzr/+bug/36399

I noticed a visual glitch with the v3 patch now that we have range information for tokens: with caret-printing, we get just the first token within the marker underlined:

  <<<<<<< HEAD
  ^~

which looks strange (especially with the underlined chars colorized). Hence in the v4 patch I've added a location tweak so that it underlines/colorizes *all* of the marker:

  <<<<<<< HEAD
  ^~~~~~~

Wording-wise, should it be
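The start-of-line check discussed in this thread can be sketched in isolation. The helper below is a hypothetical illustration (not the actual c_parser code, which works on token kinds and locations rather than raw text): a run of seven identical '<', '=' or '>' characters counts as a conflict marker only when it begins at column 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the text at LINE_START begins with one of the
   seven-character conflict markers emitted by patch/git/svn/cvs.
   Callers pass the first column of a line, so leading whitespace
   (a mid-line marker) is rejected by construction.  */
static bool
is_conflict_marker (const char *line_start)
{
  static const char *const markers[] = { "<<<<<<<", "=======", ">>>>>>>" };
  for (unsigned i = 0; i < sizeof markers / sizeof markers[0]; i++)
    if (strncmp (line_start, markers[i], 7) == 0)
      return true;
  return false;
}
```

The "conflict-marker-size" git attribute mentioned above would change the hardcoded 7 here, which is why the real patch keys off four consecutive tokens instead of a fixed string length.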
[PATCH v2] Do not sanitize left shifts for -fwrapv (PR68418)
A left shift into the sign bit is a kind of overflow, and the standard chooses to treat left shifts of negative values the same way. However, the -fwrapv option modifies the language to one where integers are defined as two's complement---which also entirely defines the behavior of shifts. Disable sanitization of left shifts when -fwrapv is in effect, using the same logic as instrument_si_overflow. The same change was proposed for LLVM at https://llvm.org/bugs/show_bug.cgi?id=25552. Bootstrapped/regtested x86_64-pc-linux-gnu. Ok for trunk, and for GCC 5 branch after 5.3 is released? Thanks, Paolo gcc: PR sanitizer/68418 * c-family/c-ubsan.c (ubsan_instrument_shift): Disable sanitization of left shifts for wrapping signed types as well. gcc/testsuite: PR sanitizer/68418 * gcc.dg/ubsan/c99-wrapv-shift-1.c, gcc.dg/ubsan/c99-wrapv-shift-2.c: New testcases. Index: c-family/c-ubsan.c === --- c-family/c-ubsan.c (revision 231289) +++ c-family/c-ubsan.c (working copy) @@ -124,12 +124,17 @@ ubsan_instrument_shift (location_t loc, enum tree_ t = fold_convert_loc (loc, op1_utype, op1); t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1); + /* If this is not a signed operation, don't perform overflow checks. + Also punt on bit-fields. */ + if (!INTEGRAL_TYPE_P (type0) + || TYPE_OVERFLOW_WRAPS (type0) + || GET_MODE_BITSIZE (TYPE_MODE (type0)) != TYPE_PRECISION (type0)) +; + /* For signed x << y, in C99/C11, the following: (unsigned) x >> (uprecm1 - y) if non-zero, is undefined. */ - if (code == LSHIFT_EXPR - && !TYPE_UNSIGNED (type0) - && flag_isoc99) + else if (code == LSHIFT_EXPR && flag_isoc99 && cxx_dialect < cxx11) { tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1, fold_convert (op1_utype, unshare_expr (op1))); @@ -142,9 +147,7 @@ ubsan_instrument_shift (location_t loc, enum tree_ /* For signed x << y, in C++11 and later, the following: x < 0 || ((unsigned) x >> (uprecm1 - y)) if > 1, is undefined. 
*/ - if (code == LSHIFT_EXPR - && !TYPE_UNSIGNED (type0) - && (cxx_dialect >= cxx11)) + else if (code == LSHIFT_EXPR && cxx_dialect >= cxx11) { tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1, fold_convert (op1_utype, unshare_expr (op1))); Index: testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c === --- testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c (revision 0) +++ testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c (working copy) @@ -0,0 +1,9 @@ +/* { dg-do run } */ +/* { dg-options "-fsanitize=shift -fwrapv -w -std=c99" } */ + +int +main (void) +{ + int a = -42; + a << 1; +} Index: testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c === --- testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c (revision 0) +++ testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c (working copy) @@ -0,0 +1,9 @@ +/* { dg-do run } */ +/* { dg-options "-fsanitize=shift -fwrapv -w -std=c99" } */ + +int +main (void) +{ + int a = 1; + a <<= 31; +}
Re: Splitting up gcc/omp-low.c?
On 12/09/2015 05:24 PM, Thomas Schwinge wrote: In addition to that, how about we split up gcc/omp-low.c into several files? Would it make sense (I have not yet looked in detail) to do so along the borders of the several passes defined therein? Or, can you tell already that there would be too many cross-references between the several files to make this infeasible? It would be nice to get rid of all the code duplication in that file. That alone could reduce the size by quite a bit, and hopefully make it easier to read. I suspect a split along the ompexp/omplow boundary would be quite easy to achieve. I'd suggest to do this shortly before GCC 6 is released, so that backports from trunk to gcc-6-branch will be easy. (I assume we don't have to care for gcc-5-branch backports too much any longer.) I'll declare myself agnostic as to whether such a change is appropriate for gcc-6 at this stage. I guess it kind of depends on the specifics. Bernd
Re: [PATCH] Better error recovery for merge-conflict markers (v4)
On 12/09/2015 05:58 PM, David Malcolm wrote: On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote: This seems like fairly low impact but also low cost, so I'm fine with it in principle. I wonder whether the length of the marker is the same across all versions of patch (and VC tools)? It's hardcoded for GNU patch: [...] >From what I can tell, Perforce is the outlier here. Thanks for checking all that. Just thinking out loud - I guess it would be too much to hope for to share lexers between frontends so that we need only one copy of this? Probably :( Someone slap sense into me, I just thought of deriving C and C++ parsers from a common base class... (no this is not a suggestion for this patch). Would a better wording be: extern short some_var; /* This line would lead to a warning due to the duplicate name, but it is skipped when handling the conflict marker. */ I think so, yes. That said, it's not clear they're always at the beginning of a line; this bazaar bug indicates that CVS (and bazaar) can emit them mid-line: https://bugs.launchpad.net/bzr/+bug/36399 Ok. CVS I think we shouldn't worry about, and it looks like this is one particular bug/corner case where the conflict end marker is the last thing in the file. I think on the whole it's best to check for beginning of the line as you've done. Wording-wise, should it be "merge conflict marker", rather than "patch conflict marker"? Clang spells it: "error: version control conflict marker in file" http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts Yeah, if another compiler has a similar/identical diagnostic I think we should just copy that unless there's a very good reason not to. Rebased on top of r231445 (from yesterday). Successfully bootstrapped on x86_64-pc-linux-gnu. Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum. OK for trunk? 
I'm inclined to say yes since it was originally submitted in time and it's hard to imagine how the change could be risky (I'll point out right away that there are one or two other patches in the queue that were also submitted in time which I feel should not be considered for gcc-6 at this point due to risk). Let's wait until the end of the week for objections, commit then. Bernd
Re: [PATCH] enable loop fusion on isl-15
Richard Biener wrote: > On Fri, Dec 4, 2015 at 8:59 PM, Sebastian Paul Pop wrote: > > I would highly recommend updating the required version of ISL to isl-0.15: > > that would simplify the existing code, removing a lot of code under "#ifdef > > old ISL", > > and allow us to fully transition to schedule_trees instead of dealing with > > the > > old antiquated union_maps in the scheduler. The result is faster > > compilation time. > > Hmm. I think we agreed to raise the requirement to ISL 0.14. OTOH the plan > was to make graphite enabled by -O3 [-fprofile-use] by default which would > mean making ISL a hard host requirement. That raises the barrier on making > the version requirement stricter ... > > Sebastian, we're quite into stage3 already - what are your plans / progress with > the defaulting of GRAPHITE? (compile-time / performance numbers though > I see ICEs still popping up - a good thing in some sense as it looks like > GRAPHITE gets testing). Richard, to enable Graphite by default in -O3 we had two requirements: - reduce compilation time when compiling with Graphite, - show performance improvements when using Graphite. We did good work in reducing the overall compilation time used by Graphite with the rewrite of the SCoP detection algorithm. Overall compilation time spent in SCoP detection on tramp3d-v4 was reduced from 7% to 0.3%. During a bootstrap of GCC the slowest SCoP detection was 0.3%, and the overall percentage overhead in terms of number of instructions executed by the SCoP detection was reduced from 0.3% to 0.01% (measured by counting the number of instructions with valgrind.) 
Details of algorithmic changes and experiments are in a paper that we submitted (and just got accepted) at the polyhedral workshop IMPACT'16: https://gcc.gnu.org/wiki/Graphite?action=AttachFile=view=scop-detection-submitted-for-review.pdf On the code generation side, we made sure to keep the representation in SSA all the time, such that when Graphite does not transform the code, the original code is left in place with no change. When Graphite does a transform, the generated code is also in SSA form, and we fixed several scalar performance issues linked to the loop level where scalar definitions were inserted: now we compute the insertion place such that there is no need for other code motion passes like LICM to optimize the code generated by Graphite. We are still working on tuning Graphite's fusion, tiling, and interchange and we will report improvements on the Polybench loop kernels soon. Thanks, Sebastian > > Thanks, > Richard. > > > Thanks, > > Sebastian > > > > -----Original Message----- > > From: Mike Stump [mailto:mikest...@comcast.net] > > Sent: Friday, December 04, 2015 12:03 PM > > To: Alan Lawrence > > Cc: Sebastian Pop; seb...@gmail.com; gcc-patches@gcc.gnu.org; > > hiradi...@msn.com > > Subject: Re: [PATCH] enable loop fusion on isl-15 > > > > On Dec 4, 2015, at 5:13 AM, Alan Lawrence wrote: > >> On 05/11/15 21:43, Sebastian Pop wrote: > >>>* graphite-optimize-isl.c (optimize_isl): Call > >>>isl_options_set_schedule_maximize_band_depth. > >>> > >>>* gcc.dg/graphite/fuse-1.c: New. > >>>* gcc.dg/graphite/fuse-2.c: New. > >>>* gcc.dg/graphite/interchange-13.c: Remove bogus check. > >> > >> I note that the test > >> > >> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1 > >> > >> FAILs under isl-0.14, with which GCC can still be built and generally > > claims to work. > >> > >> Is it worth trying to detect this in the testsuite, so we can XFAIL it? By > > which I mean, is there a reasonable testsuite mechanism by which we could do > > that? 
> > > > You can permanently ignore it by updating to 0.15? I don't see the > > advantage of bothering to finesse this too much. I don't know of a way to > > detect 14 v 15 other than this test case, but, if you do that, you can't use > > that result to gate this test case. If one wanted to engineer in a way, one > > would expose the isl version via a preprocessor symbol (built in), and then > > the test case would use that to gate it. If we had to fix it, I think I'd > > prefer we just raise the isl version to 15 or later and be done with it. > >
Re: [PATCH] Add levels to -Wmisleading-indentation; add level 1 to -Wall
On Wed, 2015-12-09 at 16:40 +0100, Bernd Schmidt wrote: > On 12/09/2015 04:38 PM, David Malcolm wrote: > > +/* The following function contains examples of bad indentation that's > > + arguably not misleading, due to a blank line between the guarded and the > > + non-guarded code. Some of the blank lines deliberately contain > > + redundant whitespace, to verify that this is handled. > > + Based on examples seen in gcc/fortran/io.c, gcc/function.c, and > > + gcc/regrename.c. > > + > > + At -Wmisleading-indentation (implying level 1) we shouldn't emit > > + warnings for this function. Compare with > > + fn_40_level_1 in Wmisleading-indentation-level-1.c and > > + fn_40_level_2 in Wmisleading-indentation-level-2.c. */ > > + > > +void > > +fn_40_implicit_level_1 (int arg) > > +{ > > +if (flagA) > > + foo (0); > > + > > + foo (1); > > + > > I personally see no use for the blank line heuristic, in fact I think it > is misguided. To my eyes a warning is clearly warranted for the examples > in this testcase. This goes back to the examples given in: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03225.html This is about managing the signal:noise ratio for something in -Wall. The distinction I want to make here is between badly indented code vs misleadingly indented code. Yes, the code is badly indented, but to my eyes the code is sufficiently badly indented that it has become *not* misleading: I can immediately see that something is awry. I want to preserve the -Wall level of the warning for the cases when it's not immediately clear that there's a problem. The levels idea means that you can turn off the blank-lines heuristic by setting -Wmisleading-indentation=2. > Do we actually have any users calling for this heuristic? FWIW, gcc trunk is clean with the blank-lines heuristic, but not without it (see the URL above for the 3 places I know of that fire without the heuristic). Dave
Re: [RFA] Transparent alias suport part 10: Fix base+offset alias analysis oracle WRT aliases
> > + bool in_symtab1 = decl_in_symtab_p (base1); > > + bool in_symtab2 = decl_in_symtab_p (base2); > > + > > + /* Declarations of non-automatic variables may have aliases. All other > > + decls are unique. */ > > + if (in_symtab1 != in_symtab2 || !in_symtab1) > > +return 0; > > + ret = symtab_node::get_create (base1)->equal_address_to > > +(symtab_node::get_create (base2), true); > > And I don't like ::get_create too much, why's ::get not enough here? > Predicates really shouldn't end up changing global state. I will re-test with ::get here. I think I only copied the code from folding. One case where we can get decls w/o corresponding symbol table entries during late compilation is when we introduce a new builtin call. We may just revisit all those places and make sure we insert them into the symbol table, but that is for next stage1. > > please refactor to avoid doing this twice WRT !x_decl / !y_decl > > Ok with these changes. Thanks! I think I also need to update IPA-PTA code to work on alias targets the same way as I did for ipa-reference (need to construct a testcase) and we should be down to missed optimizations and not wrong-code bugs with extra aliases (GVN/CSE/operand_equal_p should know about aliases to make them completely no-op) Honza > > Thanks, > Richard. > > > + /* If decls are different or we know by offsets that there is no > > overlap, > > +we win. */ > > + if (!cmp || !offset_overlap_p (c, xsize, ysize)) > > + return 0; > > + /* Decls may or may not be different and offsets overlap. */ > > + return -1; > > +} > > + else if (rtx_equal_for_memref_p (x, y)) > > { > >return offset_overlap_p (c, xsize, ysize); > > } > > > >/* This code used to check for conflicts involving stack references and > > globals but the base address alias code now handles these cases. */ > > > > @@ -2636,7 +2703,7 @@ nonoverlapping_memrefs_p (const_rtx x, c > > are constants or if one is a constant and the other a pointer into the > > stack frame. 
Otherwise a different base means we can't tell if they > > overlap or not. */ > > - if (! rtx_equal_p (basex, basey)) > > + if (! compare_base_decls (exprx, expry)) > > return ((CONSTANT_P (basex) && CONSTANT_P (basey)) > > || (CONSTANT_P (basex) && REG_P (basey) > > && REGNO_PTR_FRAME_P (REGNO (basey))) > > @@ -2647,6 +2714,10 @@ nonoverlapping_memrefs_p (const_rtx x, c > >if (loop_invariant) > > return 0; > > > > + /* Offset based disambiguation is OK even if we do not know that the > > + declarations are necessarily different > > +(i.e. compare_base_decls (exprx, expry) == 2) */ > > + > >sizex = (!MEM_P (rtlx) ? (int) GET_MODE_SIZE (GET_MODE (rtlx)) > >: MEM_SIZE_KNOWN_P (rtlx) ? MEM_SIZE (rtlx) > >: -1); > > Index: alias.h > > === > > --- alias.h (revision 231438) > > +++ alias.h (working copy) > > @@ -36,6 +36,7 @@ extern int nonoverlapping_memrefs_p (con > > extern void dump_alias_stats_in_alias_c (FILE *s); > > tree reference_alias_ptr_type (tree); > > bool alias_ptr_types_compatible_p (tree, tree); > > +int compare_base_decls (tree, tree); > > > > /* This alias set can be used to force a memory to conflict with all > > other memories, creating a barrier across which no memory reference > > > > > > -- > Richard Biener> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)
Re: [AArch64] Emit square root using the Newton series
Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezes  gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt<mode>2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt<mode>2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Thanks, Kyrill
Re: [AArch64] Emit square root using the Newton series
On 09/12/15 17:02, Kyrill Tkachov wrote: On 09/12/15 16:59, Evandro Menezes wrote: On 12/09/2015 10:52 AM, Kyrill Tkachov wrote: Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezes  gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt<mode>2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt<mode>2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, for other targets where this is also true, using the "fast_sqrt" option might make sense. 
Sure, but the way your patch is written, we will emit the series when "fast_sqrt" is set, rather than the other way around, unless I'm misreading the logic in: Sorry, what I meant to say is it would be clearer, IMO, to describe the compiler action that is being taken (e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using a series. Kyrill diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 030a101..f6d2da4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4280,7 +4280,23 @@ ;; sqrt -(define_insn "sqrt<mode>2" +(define_expand "sqrt<mode>2" + [(set (match_operand:VDQF 0 "register_operand") +(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + "TARGET_SIMD" +{ + if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags) + && !optimize_function_for_size_p (cfun) + && flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations) +{ + aarch64_emit_swsqrt (operands[0], operands[1]); + DONE; +} +}) Thanks, Kyrill
Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
Hi Christian, On 08/12/15 12:53, Christian Bruel wrote: Hi, The order of the NEON builtins construction has led to complications since the attribute target support. This was not a problem when driven from the command line, but was causing various issues when the builtins were mixed between fpu configurations or when used with LTO. First, the builtin functions were not initialized before the parsing of functions, leading to wrong type initializations. Second, the error-catching code for a builtin used without the proper fpu flags was incomprehensible for the user; for instance #include "arm_neon.h" int8x8_t a, b; int16x8_t e; void main() { e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b); } compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages of /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name '__simd64_int8_t' typedef __simd64_int8_t int8x8_t; ... ... arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka __vector(4) int}' to type 'int' which has different size return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c); ^~ ... ... and one for each arm_neon.h line. By postponing the check into arm_expand_builtin, we now emit something more useful: testo.c: In function 'main': testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported in this configuration. e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b); ^~~ One small side effect to note: the total memory allocated is 370k bigger when neon is not used, so this support will have a follow-up to make their initialization lazy. But I'd like first to stabilize the stuff for stage3 (or get it pre-approved if the memory is an issue) tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\} (a few tests that were FAILing are now UNSUPPORTED) I agree, the vector types (re)initialisation is a tricky part. 
I've seen similar issues in the aarch64 work for target attributes bool arm_vector_mode_supported_p (machine_mode mode) { - /* Neon also supports V2SImode, etc. listed in the clause below. */ - if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode + if (mode == V2SFmode || mode == V4SImode || mode == V8HImode || mode == V4HFmode || mode == V16QImode || mode == V4SFmode - || mode == V2DImode || mode == V8HFmode)) -return true; - - if ((TARGET_NEON || TARGET_IWMMXT) - && ((mode == V2SImode) - || (mode == V4HImode) - || (mode == V8QImode))) + || mode == V2DImode || mode == V8HFmode + || mode == V2SImode || mode == V4HImode || mode == V8QImode) return true; So this allows vector modes unconditionally for all targets/fpu configurations? I was tempted to do that in aarch64 when I was encountering similar issues. In the end what worked for me was re-laying out the vector types in SET_CURRENT_FUNCTION if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html) Kyrill
[gomp4, committed] backport: "Fix oacc kernels default mapping for scalars"
Hi, I've backported the patch "Fix oacc kernels default mapping for scalars" from trunk to gomp-4_0-branch. Committed to gomp-4_0-branch. Thanks, - Tom backport: "Fix oacc kernels default mapping for scalars" 2015-12-02 Tom de Vries  backport from trunk: * gimplify.c (enum gimplify_omp_var_data): Add enum value GOVD_MAP_FORCE. (oacc_default_clause): Fix default for scalars in oacc kernels. (gimplify_adjust_omp_clauses_1): Handle GOVD_MAP_FORCE. * c-c++-common/goacc/kernels-default-2.c: New test. * c-c++-common/goacc/kernels-default.c: New test. --- gcc/gimplify.c | 21 +++-- .../c-c++-common/goacc/kernels-default-2.c | 17 + gcc/testsuite/c-c++-common/goacc/kernels-default.c | 14 ++ gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 | 2 +- 4 files changed, 47 insertions(+), 7 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index e8964c6..d305165 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -90,8 +90,11 @@ enum gimplify_omp_var_data /* Flag for shared vars that are or might be stored to in the region. */ GOVD_WRITTEN = 131072, + /* Flag for GOVD_MAP, if it is a forced mapping. */ + GOVD_MAP_FORCE = 262144, + /* OpenACC deviceptr clause. */ - GOVD_USE_DEVPTR = 1 << 18, + GOVD_USE_DEVPTR = 1 << 19, GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR @@ -5988,8 +5991,12 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags) gcc_unreachable (); case ORT_ACC_KERNELS: - /* Everything under kernels are default 'present_or_copy'. */ + /* Scalars are default 'copy' under kernels, non-scalars are default + 'present_or_copy'. */ flags |= GOVD_MAP; + if (!AGGREGATE_TYPE_P (TREE_TYPE (decl))) + flags |= GOVD_MAP_FORCE; + rkind = "kernels"; break; @@ -7700,10 +7707,12 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) } else if (code == OMP_CLAUSE_MAP) { - OMP_CLAUSE_SET_MAP_KIND (clause, - flags & GOVD_MAP_TO_ONLY - ? 
GOMP_MAP_TO - : GOMP_MAP_TOFROM); + int kind = (flags & GOVD_MAP_TO_ONLY + ? GOMP_MAP_TO + : GOMP_MAP_TOFROM); + if (flags & GOVD_MAP_FORCE) + kind |= GOMP_MAP_FLAG_FORCE; + OMP_CLAUSE_SET_MAP_KIND (clause, kind); if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST) { diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c new file mode 100644 index 000..232b123 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c @@ -0,0 +1,17 @@ +/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fdump-tree-gimple" } */ + +#define N 2 + +void +foo (void) +{ + unsigned int a[N]; + +#pragma acc kernels + { +a[0]++; + } +} + +/* { dg-final { scan-tree-dump-times "map\\(tofrom" 1 "gimple" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default.c b/gcc/testsuite/c-c++-common/goacc/kernels-default.c new file mode 100644 index 000..58cd5e1 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-default.c @@ -0,0 +1,14 @@ +/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-fdump-tree-gimple" } */ + +void +foo (void) +{ + unsigned int i; +#pragma acc kernels + { +i++; + } +} + +/* { dg-final { scan-tree-dump-times "map\\(force_tofrom" 1 "gimple" } } */ diff --git a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 index 3f46eac..9d34cbd 100644 --- a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 @@ -17,6 +17,6 @@ end subroutine ! { dg-final { scan-tree-dump-times "target oacc_parallel firstprivate.a." 1 "gimple" } } ! { dg-final { scan-tree-dump-times "acc loop reduction..:a. private.p." 1 "gimple" } } -! { dg-final { scan-tree-dump-times "target oacc_kernels map.tofrom:a .len: 4.." 1 "gimple" } } +! { dg-final { scan-tree-dump-times "target oacc_kernels map.force_tofrom:a .len: 4.." 1 "gimple" } } ! 
{ dg-final { scan-tree-dump-times "acc loop reduction..:a. private.k." 1 "gimple" } }
Re: [patch] remove WCHAR_TYPE definition for FreeBSD PowerPC64
On 06.12.15 23:54, Andreas Tobler wrote: Hi, I'm going to commit this patch to trunk, 5.4 and 4.9 branch if there are no objections. The redefinition of WCHAR_TYPE for PowerPC64 has been wrong since its beginning. My fault. We use the definition from freebsd.h. Committed to trunk. But I left the #undef WCHAR_TYPE. I deleted too fast.. For the diff see: https://gcc.gnu.org/ml/gcc-cvs/2015-12/msg00379.html I'll apply the same fix later to the gcc-5 and gcc-4.9 trees. Andreas 2015-12-06 Andreas Tobler  * config/rs6000/freebsd64.h: Remove the redefinition of WCHAR_TYPE. Index: freebsd64.h === --- freebsd64.h (revision 231129) +++ freebsd64.h (working copy) @@ -316,8 +316,6 @@ #define PTRDIFF_TYPE (TARGET_64BIT ? "long int" : "int") /* rs6000.h gets this wrong for FreeBSD. We use the GCC defaults instead. */ -#undef WCHAR_TYPE -#define WCHAR_TYPE (TARGET_64BIT ? "int" : "long int") #undef WCHAR_TYPE_SIZE #define WCHAR_TYPE_SIZE 32
Re: [PATCH] enable loop fusion on isl-15
On December 9, 2015 6:45:42 PM GMT+01:00, Sebastian Pop wrote: >Richard Biener wrote: >> On Fri, Dec 4, 2015 at 8:59 PM, Sebastian Paul Pop > wrote: >> > I would highly recommend updating the required version of ISL to >isl-0.15: >> > that would simplify the existing code, removing a lot of code under >"#ifdef >> > old ISL", >> > and allow us to fully transition to schedule_trees instead of >dealing with >> > the >> > old antiquated union_maps in the scheduler. The result is faster >> > compilation time. >> >> Hmm. I think we agreed to raise the requirement to ISL 0.14. OTOH >the plan >> was to make graphite enabled by -O3 [-fprofile-use] by default which >would >> mean making ISL a hard host requirement. That raises the barrier on >making >> the version requirement stricter ... >> >> Sebastian, we're quite into stage3 already - what are your plans / >progress with >> the defaulting of GRAPHITE? (compile-time / performance numbers >though >> I see ICEs still popping up - a good thing in some sense as it looks >like >> GRAPHITE gets testing). > >Richard, > >to enable Graphite by default in -O3 we had two requirements: >- reduce compilation time when compiling with Graphite, >- show performance improvements when using Graphite. > >We did good work in reducing the overall compilation time used by >Graphite >with the rewrite of the SCoP detection algorithm. Overall compilation >time >spent in SCoP detection on tramp3d-v4 was reduced from 7% to 0.3%. >During a >bootstrap of GCC the slowest SCoP detection was 0.3%, and the overall >percentage >overhead in terms of number of instructions executed by the SCoP >detection was >reduced from 0.3% to 0.01% (measured by counting the number of >instructions >with valgrind.) 
Details of algorithmic changes and experiments are in >a paper >that we submitted (and just got accepted) in the polyhedral workshop >IMPACT'16: >https://gcc.gnu.org/wiki/Graphite?action=AttachFile=view=scop-detection-submitted-for-review.pdf > >On the code generation side, we made sure to keep the representation in >SSA all >the time, such that when Graphite does not transform the code, the >original code >is left in place with no change. When Graphite does a transform, the >generated >code is also in SSA form, and we fixed several scalar performance >issues linked >to the loop level where scalar definitions were inserted: now we >compute the >insertion place such that there is no need for other code motion passes >like >LICM to optimize the code generated by Graphite. > >We are still working on tuning Graphite's fusion, tiling, and >interchange and we >will report improvements on the Polybench loop kernels soon. Thanks for the progress report! Let's raise the ISL requirement to 0.14 now to see whether it imposes problems on some hosts. Richard. >Thanks, >Sebastian > >> >> Thanks, >> Richard. >> >> > Thanks, >> > Sebastian >> > >> > -Original Message- >> > From: Mike Stump [mailto:mikest...@comcast.net] >> > Sent: Friday, December 04, 2015 12:03 PM >> > To: Alan Lawrence >> > Cc: Sebastian Pop; seb...@gmail.com; gcc-patches@gcc.gnu.org; >> > hiradi...@msn.com >> > Subject: Re: [PATCH] enable loop fusion on isl-15 >> > >> > On Dec 4, 2015, at 5:13 AM, Alan Lawrence>wrote: >> >> On 05/11/15 21:43, Sebastian Pop wrote: >> >>>* graphite-optimize-isl.c (optimize_isl): Call >> >>>isl_options_set_schedule_maximize_band_depth. >> >>> >> >>>* gcc.dg/graphite/fuse-1.c: New. >> >>>* gcc.dg/graphite/fuse-2.c: New. >> >>>* gcc.dg/graphite/interchange-13.c: Remove bogus check. 
>> >> >> >> I note that the test >> >> >> >> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" >1 >> >> >> >> FAILs under isl-0.14, with which GCC can still be built and >generally >> > claims to work. >> >> >> >> Is it worth trying to detect this in the testsuite, so we can >XFAIL it? By >> > which I mean, is there a reasonable testsuite mechanism by which we >could do >> > that? >> > >> > You can permanently ignore it by updating to 0.15? I don't see the >> > advantage of bothering to finesse this too much. I don't know of a >way to >> > detect 14 v 15 other than this test case, but, if you do that, you >can't use >> > that result to gate this test case. If one wanted to engineer in a >way, one >> > would expose the isl version via a preprocessor symbol (built in), >and then >> > the test case would use that to gate it. If we had to fix it, I >think I'd >> > prefer we just raise the isl version to 15 or later and be done >with it. >> >
[PATCH 2/3] [graphite] add array access function in the right order
We used to add the access functions in the wrong order, Fortran style, leading to unprofitable interchanges. --- gcc/graphite-sese-to-poly.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index 887c212..480c552 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -912,7 +912,7 @@ pdr_add_memory_accesses (isl_map *acc, dr_info ) for (i = 0; i < nb_subscripts; i++) { isl_pw_aff *aff; - tree afn = DR_ACCESS_FN (dr, nb_subscripts - 1 - i); + tree afn = DR_ACCESS_FN (dr, i); aff = extract_affine (scop, afn, isl_space_domain (isl_map_get_space (acc))); -- 1.9.1
[Patch] Fix for MIPS PR target/65604
This is a MIPS patch to make mips_output_division obey the -fno-delayed-branch flag. Right now, with mips1 and -mcheck-zero-division, the division instruction is put into the bne delay slot even when -fno-delayed-branch is specified. This change uses a similar strategy to MIPS16 where we do the division first and then do the zero test while the division is being calculated. Tested with mips1 runs and by inspecting the code that is output. OK to check in? Steve Ellcey sell...@imgtec.com 2015-12-09 Steve Ellcey PR target/65604 * config/mips/mips.c (mips_output_division): Check flag_delayed_branch. diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 6145944..8444a91 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -13687,9 +13687,17 @@ mips_output_division (const char *division, rtx *operands) } else { - output_asm_insn ("%(bne\t%2,%.,1f", operands); - output_asm_insn (s, operands); - s = "break\t7%)\n1:"; + if (flag_delayed_branch) + { + output_asm_insn ("%(bne\t%2,%.,1f", operands); + output_asm_insn (s, operands); + s = "break\t7%)\n1:"; + } + else + { + output_asm_insn (s, operands); + s = "bne\t%2,%.,1f\n\tnop\n\tbreak\t7\n1:"; + } } } return s;
Re: PATCH to shorten_compare -Wtype-limits handling
On 20 November 2015 at 22:28, Jeff Law wrote: >>> That was the overall plan and he posted a patch for that. But that patch >>> didn't do the due diligence to verify that once the shortening code was >>> made >>> "pure" that we didn't regress on the quality of the code we generated. >> >> >> I thought that the original plan was to make the warning code also use >> match.pd. That is, that all folding, including FE folding, will be >> match.pd-based. Has this changed? > > I don't think that's changed. > > Detangling the two is the first step. Once detangled we can then look to > move the warning to a more suitable location -- right now it's in the C/C++ > front-ends, firing way too early. Instead it ought to be checked in gimple > form, after match.pd canonicalization and simplifications. Perhaps worth noting is that the warnings in shorten_compare are basically trying to figure out if X CMP_OP Y can be folded to true/false. They do not care about shortening/optimizing the types and conversions. Perhaps an intermediate step would be to duplicate the function into a warning version and an optimization version, making the warning version pure as you say, call the warning version first, then the optimizing one. Note that this is not exactly what you want to have at the end and it adds a bit of duplication, but it gives a clear separation between what is needed for warning and what is needed for optimizing. This is step one. Step two makes the warning version fold using match.pd, fixing any warning regressions. Note that the warning version does not really need to be pure. If the comparison can be folded to true/false, there is not much more optimization to be done. But I'm not sure how the FEs are handling results from folding for warnings. Otherwise, one is probably duplicating a bit of analysis between the warning and optimizing functions. Not sure if the overhead will be measurable at all once the warning version uses match.pd. 
Step three moves the optimizing version to match.pd. Identifying regressions here may be harder, but no harder than any other move of optimizations to match.pd. The attached patch implements step 1. Feel free to do as you please with it. Cheers, Manuel. Index: c-common.c === --- c-common.c (revision 230753) +++ c-common.c (working copy) @@ -4419,10 +4419,251 @@ expr_original_type (tree expr) { STRIP_SIGN_NOPS (expr); return TREE_TYPE (expr); } +static void +warn_shorten_compare (location_t loc, tree op0, tree op1, + tree restype, enum tree_code rescode) +{ + if (!warn_type_limits) +return; + + switch (rescode) +{ +case NE_EXPR: +case EQ_EXPR: +case LT_EXPR: +case GT_EXPR: +case LE_EXPR: +case GE_EXPR: + break; + +default: + return; +} + + /* Throw away any conversions to wider types + already present in the operands. */ + int unsignedp0, unsignedp1; + tree primop0 = c_common_get_narrower (op0, ); + tree primop1 = c_common_get_narrower (op1, ); + + /* If primopN is first sign-extended from primopN's precision to opN's + precision, then zero-extended from opN's precision to + restype precision, shortenings might be invalid. */ + if (TYPE_PRECISION (TREE_TYPE (primop0)) < TYPE_PRECISION (TREE_TYPE (op0)) + && TYPE_PRECISION (TREE_TYPE (op0)) < TYPE_PRECISION (restype) + && !unsignedp0 + && TYPE_UNSIGNED (TREE_TYPE (op0))) +primop0 = op0; + if (TYPE_PRECISION (TREE_TYPE (primop1)) < TYPE_PRECISION (TREE_TYPE (op1)) + && TYPE_PRECISION (TREE_TYPE (op1)) < TYPE_PRECISION (restype) + && !unsignedp1 + && TYPE_UNSIGNED (TREE_TYPE (op1))) +primop1 = op1; + + /* Handle the case that OP0 does not *contain* a conversion + but it *requires* conversion to FINAL_TYPE. */ + if (op0 == primop0 && TREE_TYPE (op0) != restype) +unsignedp0 = TYPE_UNSIGNED (TREE_TYPE (op0)); + if (op1 == primop1 && TREE_TYPE (op1) != restype) +unsignedp1 = TYPE_UNSIGNED (TREE_TYPE (op1)); + + /* If one of the operands must be float, we do not warn. 
*/ + if (TREE_CODE (TREE_TYPE (primop0)) == REAL_TYPE + || TREE_CODE (TREE_TYPE (primop1)) == REAL_TYPE) +return; + + /* If first arg is constant, swap the args (changing operation + so value is preserved), for canonicalization. Don't do this if + the second arg is 0. */ + + if (TREE_CONSTANT (primop0) + && !integer_zerop (primop1) && !real_zerop (primop1) + && !fixed_zerop (primop1)) +{ + std::swap (primop0, primop1); + std::swap (op0, op1); + std::swap (unsignedp0, unsignedp1); + + switch (rescode) + { + case LT_EXPR: + rescode = GT_EXPR; + break; + case GT_EXPR: + rescode = LT_EXPR; + break; + case LE_EXPR: + rescode = GE_EXPR; + break; + case GE_EXPR: + rescode = LE_EXPR; + break; +
Re: [AArch64] Emit square root using the Newton series
On 12/09/2015 11:16 AM, Kyrill Tkachov wrote: On 09/12/15 17:02, Kyrill Tkachov wrote: On 09/12/15 16:59, Evandro Menezes wrote: On 12/09/2015 10:52 AM, Kyrill Tkachov wrote: Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezes gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, for other targets where this is also true, using the "fast_sqrt" option might make sense. 
Sure, but the way your patch is written, we will emit the series when "fast_sqrt" is set, rather than the other way around, unless I'm misreading the logic in: Sorry, what I meant to say is it would be clearer, IMO, to describe the compiler action that is being taken (e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using a series. Kyrill diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 030a101..f6d2da4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4280,7 +4280,23 @@ ;; sqrt -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:VDQF 0 "register_operand") +(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + "TARGET_SIMD" +{ + if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags) + && !optimize_function_for_size_p (cfun) + && flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations) +{ + aarch64_emit_swsqrt (operands[0], operands[1]); + DONE; +} +}) Kyrill, How about "approx_sqrt" for, you guessed it, approximate square root? The same adjective would perhaps describe "recip_sqrt" better too. Thanks, -- Evandro Menezes
[gomp-nvptx] git branch created
Hello, I have created a git-only branch "gomp-nvptx" for development of OpenMP offloading on NVPTX. Changelogs on the branch have the corresponding suffix, and I'll use it to tag emails with patches for the branch. The branch currently has 41 patches on top of yesterday's trunk. Some of those have not been posted yet (for various reasons like being temporary or incomplete), but the most important stuff has been posted. Building a correct -mgomp multilib for Newlib requires a patch, which has been requested for inclusion here: https://github.com/MentorEmbedded/nvptx-newlib/pull/2 The libgomp c/c++ testsuite has 6 failures, of those 2 due to missing alloca, 2 due to missing usleep, and 2 due to stubbed-out GOMP_OFFLOAD_async_run. The Fortran testsuite has several failures due to the incomplete libgomp port. Thanks. Alexander
Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class
On Wed, Dec 9, 2015 at 6:05 AM, Richard Biener wrote: > On Tue, Dec 8, 2015 at 5:22 PM, H.J. Lu wrote: >> On Mon, Nov 23, 2015 at 12:53 PM, H.J. Lu wrote: >>> On Mon, Nov 23, 2015 at 1:57 AM, Richard Biener >>> wrote: On Sat, Nov 21, 2015 at 12:46 AM, H.J. Lu wrote: > On Fri, Nov 20, 2015 at 2:17 PM, Jason Merrill wrote: >> On 11/20/2015 01:52 PM, H.J. Lu wrote: >>> >>> On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener >>> wrote: On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu wrote: > > Empty record should be returned and passed the same way in C and C++. > This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which > defaults to return false. For C++, LANG_HOOKS_EMPTY_RECORD_P is > defined > to is_really_empty_class, which returns true for C++ empty classes. > For > LTO, we stream out a bit to indicate if a record is empty and we store > it in TYPE_LANG_FLAG_0 when streaming in. get_ref_base_and_extent is > changed to set bitsize to 0 for empty records. Middle-end and x86 > backend are updated to ignore empty records for parameter passing and > function value return. Other targets may need similar changes. Please avoid a new langhook for this and instead claim a bit in tree_type_common like for example restrict_flag (double-check it is unused for non-pointers). >>> >>> >>> There is no bit in tree_type_common I can overload. restrict_flag is >>> checked for non-pointers to issue an error when it is used on >>> non-pointers: >>> >>> >>> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38: >>> error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’ >>> typedef typename T::L __restrict__ r;// { dg-error "'__restrict__' >>> qualifiers cannot" "" } >> >> >> The C++ front end only needs to check TYPE_RESTRICT for this purpose on >> front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals >> could >> handle that specifically if you change TYPE_RESTRICT to only apply to >> pointers. 
>> > > restrict_flag is also checked in this case: > > [hjl@gnu-6 gcc]$ cat x.i > struct dummy { }; > > struct dummy > foo (struct dummy __restrict__ i) > { > return i; > } > [hjl@gnu-6 gcc]$ gcc -S x.i -Wall > x.i:4:13: error: invalid use of ‘restrict’ > foo (struct dummy __restrict__ i) > ^ > x.i:4:13: error: invalid use of ‘restrict’ > [hjl@gnu-6 gcc]$ > > restrict_flag can't also be used to indicate `i' is an empty record. I'm sure this error can be done during parsing w/o relying on TYPE_RESTRICT. But well, use any other free bit (but do not enlarge tree_type_common). Eventually you can free up a bit by putting sth into type_lang_specific currently using bits in tree_type_common. >>> >>> There are no bits in tree_type_common I can move. Instead, >>> this patch overloads side_effects_flag in tree_base. Tested on >>> Linux/x86-64. OK for trunk? > > I'm fine with using side_effects_flag for this. > > I miss an explanation of where this detail of the ABI is documented and wonder > if the term "empty record" is also used in that document and how it is > documented. > > Thus > > +/* Nonzero in a type considered an empty record. */ > +#define TYPE_EMPTY_RECORD(NODE) \ > + (TYPE_CHECK (NODE)->base.side_effects_flag) > > should refer to the ABI where is defined what an "empty record" is and how > it is handled by the backend(s). Empty C++ class is a corner case which isn't covered in psABI nor C++ ABI. There is no mention of "empty record" in GCC documentation. But there are plenty of "empty class" in gcc/cp. This change affects all targets. C++ ABI should specify how it should be passed. > +/* Return true if type T is an empty record. */ > + > +static inline bool > +type_is_empty_record_p (const_tree t) > +{ > + return TYPE_EMPTY_RECORD (TYPE_MAIN_VARIANT (t)); > > the type checker should probably check the bit is consistent across > variants so it can be tested on any of them. TYPE_EMPTY_RECORD is only relevant for parameter passing. 
For struct foo; typedef foo bar; TYPE_EMPTY_RECORD has no impact. Since call only uses the main variant type, checking TYPE_MAIN_VARIANT is sufficient. > You fail to adjust other targets gimplification hooks which suggests > this is a detail of the x86 psABI and not the C++ ABI? If so I miss > a -Wpsabi warning for this change. My change is generic. The only x86 specific change is diff --git a/gcc/config/i386/i386.c
Re: [Patch, Fortran] PR68815 - replace '%s' quotes by %< ... %>
On Wed, Dec 09, 2015 at 04:53:37PM +0100, Tobias Burnus wrote: > > Build and regtested on x86-64-gnu-linux. > OK for the trunk? > OK. -- Steve
[PATCH 3/3] [graphite] specify more isl codegen options
--- gcc/graphite-optimize-isl.c | 8 1 file changed, 8 insertions(+) diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c index f90fcfd..50f2b3c 100644 --- a/gcc/graphite-optimize-isl.c +++ b/gcc/graphite-optimize-isl.c @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3. If not see #include #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS #include +#include #endif #include "graphite.h" @@ -405,7 +406,14 @@ optimize_isl (scop_p scop) isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1); #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS /* ISL-0.15 or later. */ + isl_options_set_schedule_serialize_sccs (scop->isl_context, 0); isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1); + isl_options_set_schedule_max_constant_term (scop->isl_context, 20); + isl_options_set_schedule_max_coefficient (scop->isl_context, 20); + isl_options_set_tile_scale_tile_loops (scop->isl_context, 0); + isl_options_set_coalesce_bounded_wrapping (scop->isl_context, 1); + isl_options_set_ast_build_exploit_nested_bounds (scop->isl_context, 1); + isl_options_set_ast_build_atomic_upper_bound (scop->isl_context, 1); #else isl_options_set_schedule_fuse (scop->isl_context, ISL_SCHEDULE_FUSE_MIN); #endif -- 1.9.1
[PATCH 1/3] [graphite] dump param name when timing out
--- gcc/graphite-optimize-isl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c index 8727e39..f90fcfd 100644 --- a/gcc/graphite-optimize-isl.c +++ b/gcc/graphite-optimize-isl.c @@ -426,7 +426,7 @@ optimize_isl (scop_p scop) if (!schedule || isl_ctx_last_error (scop->isl_context) == isl_error_quota) { if (dump_file && dump_flags) - fprintf (dump_file, "ISL timed out at %d operations\n", + fprintf (dump_file, "ISL timed out --param max-isl-operations=%d\n", max_operations); if (schedule) isl_schedule_free (schedule); -- 1.9.1
Re: [PATCH][ARC] Refurbish emitting DWARF2 for epilogue.
On 09/12/15 15:34, Claudiu Zissulescu wrote: Well, it seems to me that we prefer to disable optimizations when talking about debug related information (see PR target/60598 git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@208749 138bc75d-0d04-0410-961f-82ee72b054a4 commit). Actually, unwind information might also be needed for exception handling, depending on the target. (sjlj exception handling or dwarf2) In which case it becomes a matter of program correctness if any exception might be raised that could lead to incorrect unwinding due to incorrect/incomplete unwind info. So we were both wrong: this is not merely about debugging. OTOH, the example you give also shows a much more nuanced approach to throttling optimization: the patch doesn't disable all epilogue scheduling, but specifically tests for the presence of a frame related insn at the point where it could cause trouble. The equivalent would be to reject a frame related insn in the delay slot of a conditional branch. For arc.md, that would likely involve splitting four of the define_delay statements into two each. If that seems a bit much, a compromise would be to patch in_delay_slot, thus affecting unconditional branches too. More ambitious approaches would be to move the note to a (zero-sized, pseudo-)placeholder insn, or to find a way to make dwarf2cfi cope with speculated frame related insns. E.g. even if the delay slot is not annulled, could we mark the note as annulled? Another approach would be to fix fill_eager_delay_slots not to stuff unconditional frame related insns into non-annulled delay slots of conditional branches. Or have some target hook to make it not even bother filling delay slots speculatively; for targets that can fully unexpose the delay slot, like SH and ARC >= ARC700, this aspect of fill_eager_delay_slots only mucks up schedules and increases code size. Unfortunately, dwarf2cfi checks the paths for consistency (dwarf2cfi.c:2284), throwing an error if those paths are not ok. 
Also, with ARC gcc we focus on Linux type of application, hence, the unwinding info needs to be consistent. As far as I can see, having a sort of mno-epilogue-cfi option will just inhibit or not the emitting of the blockage instruction, and the option will be valid only when compiling without emitting dwarf information (i.e., the mno-epilogue-cfi is incompatible with -g). Personally, I do not see the benefit of having such an option, as one may lose like 1 cycle per function call (HS/EM cpus) in very particular cases. Running the dg.exp, compile.exp, execute.exp, and builing Linux with some default apps, we found only 4 cases in which the branch scheduler slot needs the blockage mechanism. The number of problems you found needs not bear any relation to the number of optimizations suppressed by the blockage instruction. Consider the case when there are high-latency unconditional instructions just before a lengthy epilogue. sched2 scheduling some epilogue instructions into the scheduling bubbles can hide the latencies. More relevant ways to get data would be comparing the object files (from a whole toolchain library set and/or one or more big application(s)) built with/without the blockage insn emitted, or to benchmark it. Also, adding blockage before generating any prologue instruction seems to be a wide spread practice in gcc backends (see SH for example). It is a quick way to 'fix' (actually, hide) bugs. At the expense of optimization. It can be justified if the resources to keep a port in working order are limited, and so are the forgone optimization benefits. But at a minimum, you should add a comment to explain what problem you are papering over. That works best if you actually file a bug report in bugzilla first (about the interaction of fill_eager_delay_slots and dwarf2cfi) so that you can name the bug.
Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment
Alessandro Fanfarillo wrote: Done. Thanks. Committed as r231476. Do we need to do anything about GCC 5 or is this only a GCC 6 issue? I have permission for contributing but I don't have write permission on the repository. That can be changed: Simply fill out the form and list me (burnus (at) gcc.gnu.org) as sponsor: https://sourceware.org/cgi-bin/pdw/ps_form.cgi – and see https://gcc.gnu.org/svnwrite.html Tobias
Re: [PATCH] rtlanal: Fix bits/bytes confusion in set_noop_p (PR68814)
On Wed, Dec 09, 2015 at 10:13:34PM +0100, Eric Botcazou wrote: > > The meaning of ZERO_EXTRACT depends on BITS_BIG_ENDIAN, not on > > BYTES_BIG_ENDIAN. > > That's correct. > > --- a/gcc/rtlanal.c > > +++ b/gcc/rtlanal.c > > @@ -1534,7 +1534,7 @@ set_noop_p (const_rtx set) > > > >if (GET_CODE (dst) == ZERO_EXTRACT) > > return rtx_equal_p (XEXP (dst, 0), src) > > - && ! BYTES_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx > > + && !BITS_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx > >&& !side_effects_p (src); > > > >if (GET_CODE (dst) == STRICT_LOW_PART) > > I have a hard time figuring out what a SET verifying the condition would > really mean. Could you post it and explain where it comes from? Sure! The condition is meant to say "if you set the low part of a reg to be equal to that reg". And it only handles little-endian numbering (many things with zero_extract do). My case comes from the gcc.dg/pr65492-2.c, the "test1int2" case. combine has made an insn modifying insn i321: zero_extract(%3:DI,0x20,0)=%3:DI which is splatting the SImode parameter to both the high and low halves of the dest reg. Then, it tries to combine it with the USE of that reg at the end of the function, giving Failed to match this instruction: (parallel [ (use (ior:DI (and:DI (reg/i:DI 3 3) (const_int 4294967295 [0x])) (ashift:DI (reg:DI 3 3 [ c ]) (const_int 32 [0x20] (set (zero_extract:DI (reg/i:DI 3 3) (const_int 32 [0x20]) (const_int 0 [0])) (reg:DI 3 3 [ c ])) ]) and if it has a parallel of two which doesn't match, it sees if it just needs one arm because the other is a noop set, and that ends up with deleting noop move 21 because of the wrong test, making the testcase fail. (powerpc64le has BITS_BIG_ENDIAN set, a bit unusual). Segher
[PATCH] Fix up noce_try_abs again (PR rtl-optimization/68670)
Hi! Not sure what I've been thinking when writing the previous noce_try_abs fix. I thought that the optimization can be applied for all the conditions, and whether it can be applied depends on if it is cond ? ~x : x or cond ? x : ~x. But that is not the case, the optimization can be only applied to a subset of conditions, and when it can be applied, it can be applied to both the cond ? ~x : x and cond ? x : ~x cases (depending on the condition one is one_cmpl_abs (x) and the other ~one_cmpl_abs (x)). So, the previous patch fixed some cases (the ones triggered in the testcases), kept other cases broken (as can be seen in the new testcases), pessimized some cases (stopped applying one_cmpl_abs when it could be applied) and kept optimizing other cases. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5.4? 2015-12-09 Jakub JelinekPR rtl-optimization/68376 PR rtl-optimization/68670 * ifcvt.c (noce_try_abs): For one_cmpl allow < 0, >= 0 or > -1 conditions regardless of negate, and disallow all other conditions. * gcc.c-torture/execute/pr68376-2.c (f5, f6, f7, f8): New tests. (main): Call them. * gcc.dg/pr68670-1.c: New test. * gcc.dg/pr68670-2.c: New test. --- gcc/ifcvt.c.jj 2015-12-03 16:39:36.0 +0100 +++ gcc/ifcvt.c 2015-12-09 17:18:35.418052077 +0100 @@ -2601,17 +2601,14 @@ noce_try_abs (struct noce_if_info *if_in Note that these rtx constants are known to be CONST_INT, and therefore imply integer comparisons. The one_cmpl case is more complicated, as we want to handle - only x < 0 ? ~x : x or x >= 0 ? ~x : x but not - x <= 0 ? ~x : x or x > 0 ? ~x : x, as the latter two - have different result for x == 0. */ + only x < 0 ? ~x : x or x >= 0 ? x : ~x to one_cmpl_abs (x) + and x < 0 ? x : ~x or x >= 0 ? ~x : x to ~one_cmpl_abs (x), + but not other cases (x > -1 is equivalent of x >= 0). 
*/ if (c == constm1_rtx && GET_CODE (cond) == GT) -{ - if (one_cmpl && negate) - return FALSE; -} +; else if (c == const1_rtx && GET_CODE (cond) == LT) { - if (one_cmpl && !negate) + if (one_cmpl) return FALSE; } else if (c == CONST0_RTX (GET_MODE (b))) @@ -2619,23 +2616,8 @@ noce_try_abs (struct noce_if_info *if_in if (one_cmpl) switch (GET_CODE (cond)) { - case GT: - if (!negate) - return FALSE; - break; case GE: - /* >= 0 is the same case as above > -1. */ - if (negate) - return FALSE; - break; case LT: - if (negate) - return FALSE; - break; - case LE: - /* <= 0 is the same case as above < 1. */ - if (!negate) - return FALSE; break; default: return FALSE; --- gcc/testsuite/gcc.c-torture/execute/pr68376-2.c.jj 2015-11-19 09:48:05.0 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr68376-2.c 2015-12-09 17:58:57.0 +0100 @@ -26,6 +26,30 @@ f4 (int x) return x <= 0 ? x : ~x; } +__attribute__((noinline, noclone)) int +f5 (int x) +{ + return x >= 0 ? ~x : x; +} + +__attribute__((noinline, noclone)) int +f6 (int x) +{ + return x >= 0 ? x : ~x; +} + +__attribute__((noinline, noclone)) int +f7 (int x) +{ + return x > 0 ? ~x : x; +} + +__attribute__((noinline, noclone)) int +f8 (int x) +{ + return x > 0 ? 
x : ~x; +} + int main () { @@ -37,5 +61,13 @@ main () abort (); if (f4 (5) != -6 || f4 (-5) != -5 || f4 (0) != 0) abort (); + if (f5 (5) != -6 || f5 (-5) != -5 || f5 (0) != -1) +abort (); + if (f6 (5) != 5 || f6 (-5) != 4 || f6 (0) != 0) +abort (); + if (f7 (5) != -6 || f7 (-5) != -5 || f7 (0) != 0) +abort (); + if (f8 (5) != 5 || f8 (-5) != 4 || f8 (0) != -1) +abort (); return 0; } --- gcc/testsuite/gcc.dg/pr68670-1.c.jj 2015-12-09 17:22:41.465621589 +0100 +++ gcc/testsuite/gcc.dg/pr68670-1.c2015-12-09 17:22:05.251126555 +0100 @@ -0,0 +1,5 @@ +/* PR rtl-optimization/68670 */ +/* { dg-do run } */ +/* { dg-options "-O2 -ftracer" } */ + +#include "../gcc.c-torture/execute/pr68376-1.c" --- gcc/testsuite/gcc.dg/pr68670-2.c.jj 2015-12-09 17:22:44.528578868 +0100 +++ gcc/testsuite/gcc.dg/pr68670-2.c2015-12-09 17:22:31.641758606 +0100 @@ -0,0 +1,5 @@ +/* PR rtl-optimization/68670 */ +/* { dg-do run } */ +/* { dg-options "-O2 -ftracer" } */ + +#include "../gcc.c-torture/execute/pr68376-2.c" Jakub
Re: [PATCH] rtlanal: Fix bits/bytes confusion in set_noop_p (PR68814)
> The meaning of ZERO_EXTRACT depends on BITS_BIG_ENDIAN, not on > BYTES_BIG_ENDIAN. That's correct. > Testing in progress on powerpc64le-linux; if it passes, is this > okay for trunk? > > > Segher > > > 2015-12-09 Segher Boessenkool> > PR rtl-optimization/68814 > * rtlanal.c (set_noop_p): Use BITS_BIG_ENDIAN instead of > BYTES_BIG_ENDIAN. > --- > gcc/rtlanal.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c > index 67098e5..f893bca 100644 > --- a/gcc/rtlanal.c > +++ b/gcc/rtlanal.c > @@ -1534,7 +1534,7 @@ set_noop_p (const_rtx set) > >if (GET_CODE (dst) == ZERO_EXTRACT) > return rtx_equal_p (XEXP (dst, 0), src) > -&& ! BYTES_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx > +&& !BITS_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx > && !side_effects_p (src); > >if (GET_CODE (dst) == STRICT_LOW_PART) I have a hard time figuring out what a SET verifying the condition would really mean. Could you post it and explain where it comes from? -- Eric Botcazou
[PATCH] Fix up fold_ctor_reference and fully_constant_vn_reference_p (PR tree-optimization/68785)
Hi! On a testcase like below which would trigger UB at runtime we trigger UB in the compiler, by reading uninitialized bytes. The VCE folding for which native_{encode,interpret}_expr has been originally written passes the length from the first one to the second one, so that the latter can return NULL_TREE (not fold) if not enough bytes in the buffer were filled. I believe this is the shortest fix for this issue and makes the code consistent with what is used in VCE folding. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2015-12-09 Jakub JelinekPR tree-optimization/68785 * gimple-fold.c (fold_ctor_reference): Pass return value from native_encode_expr to native_interpret_expr. * tree-ssa-sccvn.c (fully_constant_vn_reference_p): Likewise. * gcc.dg/pr68785.c: New test. --- gcc/gimple-fold.c.jj2015-11-24 11:43:35.0 +0100 +++ gcc/gimple-fold.c 2015-12-09 10:48:06.824975709 +0100 @@ -5495,9 +5495,10 @@ fold_ctor_reference (tree type, tree cto && size <= MAX_BITSIZE_MODE_ANY_MODE) { unsigned char buf[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT]; - if (native_encode_expr (ctor, buf, size / BITS_PER_UNIT, - offset / BITS_PER_UNIT) > 0) - return native_interpret_expr (type, buf, size / BITS_PER_UNIT); + int len = native_encode_expr (ctor, buf, size / BITS_PER_UNIT, + offset / BITS_PER_UNIT); + if (len > 0) + return native_interpret_expr (type, buf, len); } if (TREE_CODE (ctor) == CONSTRUCTOR) { --- gcc/tree-ssa-sccvn.c.jj 2015-12-04 17:19:12.0 +0100 +++ gcc/tree-ssa-sccvn.c2015-12-09 10:50:30.329960789 +0100 @@ -1370,8 +1370,9 @@ fully_constant_vn_reference_p (vn_refere else { unsigned char buf[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT]; - if (native_encode_expr (ctor, buf, size, off) > 0) - return native_interpret_expr (ref->type, buf, size); + int len = native_encode_expr (ctor, buf, size, off); + if (len > 0) + return native_interpret_expr (ref->type, buf, len); } } } --- gcc/testsuite/gcc.dg/pr68785.c.jj 2015-12-09 10:52:00.232698487 +0100 +++ 
gcc/testsuite/gcc.dg/pr68785.c 2015-12-09 10:50:54.0 +0100 @@ -0,0 +1,9 @@ +/* PR tree-optimization/68785 */ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +int +foo (void) +{ + return *(int *) ""; +} Jakub
Re: [PATCH] Better error recovery for merge-conflict markers (v4)
On 12/09/2015 10:44 AM, Bernd Schmidt wrote: Just thinking out loud - I guess it would be too much to hope for to share lexers between frontends so that we need only one copy of this? Probably :( Someone slap sense into me, I just thought of deriving C and C++ parsers from a common base class... (no this is not a suggestion for this patch). It'd be nice. But I don't even think it's on anyone's TODO list. Wording-wise, should it be "merge conflict marker", rather than "patch conflict marker"? Clang spells it: "error: version control conflict marker in file" http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts Yeah, if another compiler has a similar/identical diagnostic I think we should just copy that unless there's a very good reason not to. Agreed. Rebased on top of r231445 (from yesterday). Successfully bootstrapped on x86_64-pc-linux-gnu. Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum. OK for trunk? I'm inclined to say yes since it was originally submitted in time and it's hard to imagine how the change could be risky (I'll point out right away that there are one or two other patches in the queue that were also submitted in time which I feel should not be considered for gcc-6 at this point due to risk). Let's wait until the end of the week for objections, commit then. As the person who has come the closest to objecting, I'll go on the record as "not objecting". jeff